TachibanaYoshino / AnimeGANv2

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

Unable to train on an NVIDIA A4000 GPU #51

Open kawais opened 2 years ago

kawais commented 2 years ago

Running `python train.py --dataset Hayao --epoch 101 --init_epoch 10` fails during training with the error `failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED`. How can this be fixed?

Software versions:

    # packages in environment at C:\ProgramData\Anaconda3\envs\py36:
    #
    # Name                    Version                   Build  Channel
    absl-py                   1.0.0                    pypi_0    pypi
    astor                     0.8.1                    pypi_0    pypi
    cached-property           1.5.2                    pypi_0    pypi
    certifi                   2021.5.30        py36haa95532_0
    colorama                  0.4.4                    pypi_0    pypi
    cudatoolkit               10.0.130                      0
    cudnn                     7.6.0                cuda10.0_0
    dataclasses               0.8                      pypi_0    pypi
    gast                      0.2.2                    pypi_0    pypi
    google-pasta              0.2.0                    pypi_0    pypi
    grpcio                    1.44.0                   pypi_0    pypi
    h5py                      3.1.0                    pypi_0    pypi
    importlib-metadata        4.8.3                    pypi_0    pypi
    keras-applications        1.0.8                    pypi_0    pypi
    keras-preprocessing       1.1.2                    pypi_0    pypi
    markdown                  3.3.6                    pypi_0    pypi
    numpy                     1.19.5                   pypi_0    pypi
    opencv-python             4.5.5.62                 pypi_0    pypi
    opt-einsum                3.3.0                    pypi_0    pypi
    pip                       21.2.2           py36haa95532_0
    protobuf                  3.19.4                   pypi_0    pypi
    python                    3.6.2               h09676a0_15
    setuptools                58.0.4           py36haa95532_0
    six                       1.16.0                   pypi_0    pypi
    tensorboard               1.15.0                   pypi_0    pypi
    tensorflow-estimator      1.15.1                   pypi_0    pypi
    tensorflow-gpu            1.15.0                   pypi_0    pypi
    termcolor                 1.1.0                    pypi_0    pypi
    tqdm                      4.62.3                   pypi_0    pypi
    typing-extensions         4.1.1                    pypi_0    pypi
    vc                        14.2                 h21ff451_1
    vs2015_runtime            14.27.29016          h5e58377_2
    werkzeug                  2.0.3                    pypi_0    pypi
    wheel                     0.37.1             pyhd3eb1b0_0
    wincertstore              0.2              py36h7fe50ca_0
    wrapt                     1.13.3                   pypi_0    pypi
    zipp                      3.6.0                    pypi_0    pypi
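
The list above pins cudatoolkit 10.0.130 and tensorflow-gpu 1.15.0, while the card named in the title is an NVIDIA A4000. One thing that may be worth checking is whether that toolkit generation knows about the GPU's architecture. The sketch below is only an illustration: the helper names are hypothetical, and both the A4000's compute capability (8.6) and the minimum-toolkit table are assumptions taken from NVIDIA's public documentation, not from this report. A mismatch would be a possible cause, not a confirmed one.

```python
# Hypothetical helper, not part of AnimeGANv2 or TensorFlow.
# Assumption: first CUDA toolkit release with native support for each
# compute capability, as listed in NVIDIA's CUDA release notes.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (GA100)
    (8, 6): (11, 1),  # Ampere (GA10x; assumed to include the RTX A4000)
}


def parse_version(text):
    """Turn a version string such as '10.0.130' into (10, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, compute_capability):
    """Return True if the toolkit is at least the release that added the architecture."""
    required = MIN_CUDA_FOR_CC[compute_capability]
    return parse_version(cuda_version) >= required


# cudatoolkit version copied from the package list; (8, 6) is the assumed A4000 value.
print(toolkit_supports("10.0.130", (8, 6)))  # False
print(toolkit_supports("11.1.0", (8, 6)))    # True
```

A `False` result would point toward the environment rather than `train.py`, but it does not by itself prove that this is what triggers `CUBLAS_STATUS_EXECUTION_FAILED`.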

Error log:

    2022-02-23 11:58:11.480753: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
    2022-02-23 11:58:11.483857: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
    2022-02-23 11:58:11.483905: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A7DCBCE8; size: 8388608; pattern: ffffffff
    2022-02-23 11:58:11.486156: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C038; size: 8388608; pattern: ffffffff
    2022-02-23 11:58:11.490415: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C058; size: 8388608; pattern: ffffffff
    2022-02-23 11:58:11.488390: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A7DCBD08; size: 8388608; pattern: ffffffff
    2022-02-23 11:58:11.493417: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
    2022-02-23 11:58:11.496159: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
    2022-02-23 11:58:11.497784: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C038; size: 8388608; pattern: ffffffff
    2022-02-23 11:58:11.503459: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C058; size: 8388608; pattern: ffffffff
    2022-02-23 11:58:11.505625: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
    2022-02-23 11:58:11.844846: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
        return fn(*args)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
             [[{{node Tensordot/MatMul}}]]
             [[mul_10/_893]]
      (1) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
             [[{{node Tensordot/MatMul}}]]
    0 successful operations.
    0 derived errors ignored.

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 101, in <module>
        main()
      File "train.py", line 96, in main
        gan.train()
      File "E:\AnimeGANv2-master\AnimeGANv2.py", line 248, in train
        self.Generator_loss, self.G_loss_merge], feed_dict = train_feed_dict)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
        run_metadata_ptr)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
        run_metadata)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
             [[node Tensordot/MatMul (defined at C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
             [[mul_10/_893]]
      (1) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
             [[node Tensordot/MatMul (defined at C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
    0 successful operations.
    0 derived errors ignored.

    Original stack trace for 'Tensordot/MatMul':
      File "train.py", line 101, in <module>
        main()
      File "train.py", line 91, in main
        gan.build_model()
      File "E:\AnimeGANv2-master\AnimeGANv2.py", line 155, in build_model
        t_loss = self.con_weight * c_loss + self.sty_weight * s_loss + color_loss(self.real,self.generated) * self.color_weight + tv_loss
      File "E:\AnimeGANv2-master\tools\ops.py", line 280, in color_loss
        con = rgb2yuv(con)
      File "E:\AnimeGANv2-master\tools\ops.py", line 309, in rgb2yuv
        return tf.image.rgb_to_yuv(rgb)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\image_ops_impl.py", line 2930, in rgb_to_yuv
        return math_ops.tensordot(images, kernel, axes=[[ndims - 1], [0]])
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 4071, in tensordot
        ab_matmul = matmul(a_reshape, b_reshape)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\util\dispatch.py", line 180, in wrapper
        return target(*args, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 2754, in matmul
        a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 6136, in mat_mul
        name=name)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
        self._traceback = tf_stack.extract_stack()