Open baltam opened 1 year ago
The demo runs on CPU, but I can't get it to run on GPU. Help!!!
1. Machine configuration

OS: Windows 10

Hardware:
GPU: 3070 Ti, 8 GB VRAM
CPU: i9-7980XE
CUDA 10.0, cuDNN 7.6.5
RAM: 64 GB
Software dependencies:

pandas==0.24.2
regex==2019.4.14
h5py==2.9.0
numpy==1.16.2
tensorboard==1.13.1
tensorflow-gpu==1.13.1
tqdm==4.31.1
requests==2.22.0
protobuf==3.19.0
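As a quick sanity check on this environment (my own addition, not part of the original issue), one can confirm that the installed tensorflow-gpu 1.13.1 build is CUDA-enabled and can see a GPU at all; a minimal sketch:

```python
# Quick environment sanity check (not from the issue): confirm this is the GPU
# build of TensorFlow and that it can initialize a GPU device.
import tensorflow as tf

print(tf.__version__)                # expect 1.13.1
print(tf.test.is_built_with_cuda())  # True for tensorflow-gpu builds
print(tf.test.is_gpu_available())    # True only if TF can actually initialize the GPU
```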
2. Error, troubleshooting, and workaround

Console output and traceback:

模型加载好啦!🍺Bilibili干杯🍺
现在将你的作文题精简为一个句子,粘贴到这里:⬇️,然后回车
**********************************************作文题目**********************************************
苦练本手,方能妙手随成
**********************************************作文题目**********************************************
正在生成第 1 of 1 篇文章 ......
EssayKiller正在飞速写作中,请稍后......
2022-11-27 19:19:37.206277: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-11-27 19:19:37.206746: E tensorflow/stream_executor/cuda/cuda_blas.cc:2301] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[24,11,64], b.shape=[24,11,64], m=11, n=11, k=64, batch_size=24
         [[{{node sample_sequence/newslm/layer00/MatMul}}]]
         [[sample_sequence/while/Identity/_1594]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[24,11,64], b.shape=[24,11,64], m=11, n=11, k=64, batch_size=24
         [[{{node sample_sequence/newslm/layer00/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:/ly/EssayKiller_V2-master/LanguageNetwork/GPT2/scripts/demo.py", line 220, in <module>
    p_for_topp: top_p[chunk_i]})
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[24,11,64], b.shape=[24,11,64], m=11, n=11, k=64, batch_size=24
         [[node sample_sequence/newslm/layer00/MatMul (defined at C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
         [[sample_sequence/while/Identity/_1594]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[24,11,64], b.shape=[24,11,64], m=11, n=11, k=64, batch_size=24
         [[node sample_sequence/newslm/layer00/MatMul (defined at C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'sample_sequence/newslm/layer00/MatMul':
  File "d:/ly/EssayKiller_V2-master/LanguageNetwork/GPT2/scripts/demo.py", line 188, in <module>
    do_topk=False)
  File "d:\ly\EssayKiller_V2-master\LanguageNetwork\GPT2\scripts\modeling.py", line 768, in sample
    do_topk=do_topk)
  File "d:\ly\EssayKiller_V2-master\LanguageNetwork\GPT2\scripts\modeling.py", line 740, in initialize_from_context
    batch_size=batch_size, p_for_topp=p_for_topp, cache=None, do_topk=do_topk)
  File "d:\ly\EssayKiller_V2-master\LanguageNetwork\GPT2\scripts\modeling.py", line 714, in sample_step
    cache=cache,
  File "d:\ly\EssayKiller_V2-master\LanguageNetwork\GPT2\scripts\modeling.py", line 499, in __init__
    cache=layer_cache,
  File "d:\ly\EssayKiller_V2-master\LanguageNetwork\GPT2\scripts\modeling.py", line 198, in attention_layer
    attention_scores = tf.matmul(query, key, transpose_b=True)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 2716, in matmul
    return batch_mat_mul_fn(a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 1712, in batch_mat_mul_v2
    "BatchMatMulV2", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "C:\Users\ly1995\AppData\Local\conda\conda\envs\zuowen1\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
2.1 Key information extraction

The key error line is:

(0) Internal: Blas xGEMMBatched launch failed : a.shape=[24,11,64], b.shape=[24,11,64], m=11, n=11, k=64, batch_size=24
2.2 Problem analysis

Searching the error message on Bing suggests that the failure is mainly caused by running out of GPU memory.
2.3 Idea 1

If GPU memory is short, use less of it and let the program allocate it on demand instead of all at once; that should solve the problem. So I added the following statements:
os.environ["CUDA_VISIBLE_DEVICES"] = "0" tf_config = tf.compat.v1.ConfigProto(allow_soft_placement=True) tf_config.gpu_options.allow_growth=True # tf_config.gpu_options.per_process_gpu_memory_fraction = 0.6
...but it still fails with the same error. Is it really a lack of GPU memory?...
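One way to test the out-of-memory theory (my own suggestion, not something from the repo) is to watch actual VRAM usage in a second terminal while demo.py runs. A rough sketch, assuming nvidia-smi is on PATH:

```python
# Rough sketch: poll nvidia-smi once per second and print used/total VRAM.
# If the GPU run errors out while memory.used is nowhere near 8 GB, insufficient
# VRAM is probably not the real cause.
import subprocess
import time

while True:
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader"],
        encoding="utf-8",
    )
    print(out.strip())  # e.g. "7342 MiB, 8192 MiB"
    time.sleep(1)
```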
Alternative 1

Since the GPU run keeps failing, skip the GPU entirely and try the CPU. So I modified the statement as follows:
os.environ["CUDA_VISIBLE_DEVICES"] = " " #将0改为none
Result: the program now runs to completion, but the CPU is of course much slower than the GPU: generating one essay takes roughly 10 minutes, with CPU utilization around 40-50%.