deepfakes / faceswap-playground

User dedicated repo for the faceswap project

failed run the training with latest pull #227

Closed by ruah1984 5 years ago

ruah1984 commented 5 years ago

Here is the error message:

[[{{node model_1/conv2d_1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@train...propFilter"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/model_1/conv2d_1/convolution_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d_1/kernel/read)]]
[[{{node loss/mul/_379}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1678_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

torzdf commented 5 years ago

Crash report please.

ruah1984 commented 5 years ago

Loading...
12/17/2018 19:36:54 INFO Log level set to: INFO
Using TensorFlow backend.
12/17/2018 19:36:56 INFO Model A Directory: D:\DeepFaceLabTorrent_build_26_05_2018\DeepFaceLabTorrent\workspace\33\face
12/17/2018 19:36:56 INFO Model B Directory: D:\DeepFaceLabTorrent_build_26_05_2018\DeepFaceLabTorrent\workspace\dly\aligned
12/17/2018 19:36:56 INFO Training data directory: D:\models\New128\yf
12/17/2018 19:36:56 INFO Loading data, this may take a while...
12/17/2018 19:36:56 INFO Using live preview.\nPress 'ENTER' on the preview window to save and quit.\nPress 'S' on the preview window to save model weights immediately
2018-12-17 19:36:56.157724: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-12-17 19:36:56.356415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-12-17 19:36:56.356995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-17 19:36:57.147550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-17 19:36:57.147915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
12/17/2018 19:36:57 INFO Loading Model from Model_Originalhighres plugin...
2018-12-17 19:36:57.148081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-17 19:36:57.148364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8788 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
12/17/2018 19:36:58 INFO loaded model weights
12/17/2018 19:36:58 INFO Loading Trainer from Model_Originalhighres plugin...
2018-12-17 19:37:11.643892: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.2.1. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2018-12-17 19:37:11.647565: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.2.1. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ruah1\Downloads\faceswap-master (4)\faceswap-master\scripts\train.py", line 109, in process_thread
    raise err
  File "C:\Users\ruah1\Downloads\faceswap-master (4)\faceswap-master\scripts\train.py", line 101, in process_thread
    self.run_training_cycle(model, trainer)
  File "C:\Users\ruah1\Downloads\faceswap-master (4)\faceswap-master\scripts\train.py", line 139, in run_training_cycle
    trainer.train_one_step(iteration, viewer)
  File "C:\Users\ruah1\Downloads\faceswap-master (4)\faceswap-master\plugins\model\Model_OriginalHighRes\Trainer.py", line 38, in train_one_step
    loss_A = self.model.autoencoder_A.train_on_batch(warped_A, target_A)
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(array_vals)
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "C:\Users\ruah1\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node model_1/conv2d_1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@train...propFilter"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/model_1/conv2d_1/convolution_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d_1/kernel/read)]]
[[{{node loss/mul/_379}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1678_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
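
The final error says to look for a warning or error message printed earlier in the output. As a rough sketch of how that search could be automated, the Python snippet below scans console output for TensorFlow lines marked with severity `W`, `E`, or `F`. It assumes the output has been saved with one log entry per line and that the prefix looks like the lines in this thread (timestamp, severity letter, `file:line]`). The names `TF_LOG_LINE` and `find_tf_problems` are made up for this illustration and are not part of faceswap or TensorFlow, and the sample messages are shortened to their first sentence.

```python
import re

# Prefix format as it appears in this thread, for example:
# "2018-12-17 19:37:11.643892: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] message"
# (Assumption: other TensorFlow versions may format the prefix differently.)
TF_LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) "
    r"(?P<source>\S+:\d+)\] "
    r"(?P<message>.*)$"
)


def find_tf_problems(log_text):
    """Return (severity, source, message) for each W/E/F TensorFlow log line."""
    problems = []
    for line in log_text.splitlines():
        match = TF_LOG_LINE.match(line.strip())
        # Skip non-TensorFlow lines and purely informational ("I") entries.
        if match and match.group("severity") in ("W", "E", "F"):
            problems.append(
                (match.group("severity"), match.group("source"), match.group("message"))
            )
    return problems


# Two entries taken from the log above, each shortened to its first sentence.
sample = (
    "2018-12-17 19:36:56.356995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] "
    "Adding visible gpu devices: 0\n"
    "2018-12-17 19:37:11.643892: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] "
    "Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.2.1.\n"
)

for severity, source, message in find_tf_problems(sample):
    print(severity, source, message)
```

On the sample, only the `cuda_dnn.cc` entry is reported, which is the line that explains the later "Failed to get convolution algorithm" failure.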

torzdf commented 5 years ago

In your faceswap folder there will be a file called crash_report.2018.xx.xx.xxxxxxx.log. Please provide that.

Thanks

torzdf commented 5 years ago

If it's not there, upgrade your TensorFlow to 1.11+.

torzdf commented 5 years ago

Also, your error is here:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.2.1. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

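Upgrade your cuDNN. The runtime library being loaded is 7.0.5, but your TensorFlow build was compiled against 7.2.1, so you need cuDNN 7.2 or later.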
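
The compatibility rule quoted in that message can be written out as a small check. The Python sketch below is one reading of the wording, not TensorFlow's actual implementation: for cuDNN 7.0 or later the major version must match and the runtime minor version must be equal or higher, and for older versions major and minor must both match. The function names and the regular expression are invented for this example.

```python
import re

# Pulls both version numbers out of the message text quoted above.
LOADED_VS_COMPILED = re.compile(
    r"Loaded runtime CuDNN library: (?P<loaded>\d+\.\d+\.\d+) "
    r"but source was compiled with: (?P<compiled>\d+\.\d+\.\d+)"
)


def parse_version(text):
    """Turn '7.2.1' into (7, 2, 1)."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule as worded in the error message (an interpretation)."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    if loaded_major != compiled_major:
        return False
    if compiled_major >= 7:
        # cuDNN 7.0 or later: an equal or higher minor version is accepted.
        return loaded_minor >= compiled_minor
    # Before cuDNN 7.0: the minor version has to match as well.
    return loaded_minor == compiled_minor


def check_message(message):
    """Return (loaded, compiled, ok), or None if the message has no versions."""
    match = LOADED_VS_COMPILED.search(message)
    if match is None:
        return None
    loaded, compiled = match.group("loaded"), match.group("compiled")
    return loaded, compiled, cudnn_runtime_ok(loaded, compiled)


message = "Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.2.1."
print(check_message(message))
```

With the versions from this thread, the check fails because the runtime minor version (0) is lower than the compiled one (2). The patch level is ignored, since the message only mentions major and minor versions.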

ruah1984 commented 5 years ago

What are the required CUDA and cuDNN versions for this latest pull?

torzdf commented 5 years ago

https://github.com/deepfakes/faceswap/blob/master/INSTALL.md#cuda

ruah1984 commented 5 years ago

After updating cuDNN, everything is solved.