Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

WARNING:root:NaN or Inf found in input tensor. #17

Closed youngstu closed 1 year ago

youngstu commented 1 year ago

sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1
sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

When training on vocaset, NaN losses appear in both the first and second stages.

[2023-04-21 18:27:21,120 INFO train_vq.py line 189 19610]=>Epoch: [1/200][60/314] Data: 0.027 (0.038) Batch: 0.076 (0.141) Remain: 02:27:53 Loss: 0.1405 
[2023-04-21 18:27:22,283 INFO train_vq.py line 189 19610]=>Epoch: [1/200][70/314] Data: 0.028 (0.037) Batch: 0.077 (0.138) Remain: 02:24:02 Loss: 0.1339 
[2023-04-21 18:27:23,436 INFO train_vq.py line 189 19610]=>Epoch: [1/200][80/314] Data: 0.025 (0.036) Batch: 0.070 (0.135) Remain: 02:21:09 Loss: 0.1392 
[2023-04-21 18:27:24,593 INFO train_vq.py line 189 19610]=>Epoch: [1/200][90/314] Data: 0.027 (0.035) Batch: 0.144 (0.133) Remain: 02:18:51 Loss: 0.1353 
[2023-04-21 18:27:25,681 INFO train_vq.py line 189 19610]=>Epoch: [1/200][100/314] Data: 0.025 (0.034) Batch: 0.068 (0.130) Remain: 02:16:20 Loss: 0.1325 
[2023-04-21 18:27:26,705 INFO train_vq.py line 189 19610]=>Epoch: [1/200][110/314] Data: 0.027 (0.033) Batch: 0.072 (0.128) Remain: 02:13:38 Loss: 0.1300 
[2023-04-21 18:27:27,809 INFO train_vq.py line 189 19610]=>Epoch: [1/200][120/314] Data: 0.027 (0.033) Batch: 0.075 (0.126) Remain: 02:12:05 Loss: 0.1290 
[2023-04-21 18:27:28,815 INFO train_vq.py line 189 19610]=>Epoch: [1/200][130/314] Data: 0.027 (0.032) Batch: 0.139 (0.124) Remain: 02:10:00 Loss: nan 
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
[2023-04-21 18:27:29,784 INFO train_vq.py line 189 19610]=>Epoch: [1/200][140/314] Data: 0.026 (0.032) Batch: 0.072 (0.122) Remain: 02:07:55 Loss: nan 
INFO:main-logger:Epoch: [1/200][140/314] Data: 0.026 (0.032) Batch: 0.072 (0.122) Remain: 02:07:55 Loss: nan 
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.

Doubiiu commented 1 year ago

Sorry, I am not sure what is causing this. Did you use the same environment as the one we specified?

youngstu commented 1 year ago

Yes, my environment configuration is the same. Would it be possible for you to pull and test the code again? Other users have also run into similar problems, so there may be a small problem in the code.

Many thanks.

youngstu commented 1 year ago

https://github.com/Doubiiu/CodeTalker/issues/10 looks like the same problem.

Doubiiu commented 1 year ago

OK, I will pull and test the code again. Did you use the provided dataset preprocessing script? It should be the same as the one in FaceFormer.

youngstu commented 1 year ago

Many thanks.

Yes, I am using the preprocessing script provided with CodeTalker.

FaceFormer training runs correctly.

youngstu commented 1 year ago

Found the cause of the problem: it is the data_verts.npy file. The earlier version of the file (md5: 01bbeb78742afd15efe88ca318c7932d) can produce NaN during training, while the newer data_verts.npy (md5: 2242f58f4697b046db7b3fd0cc7add59) trains correctly.

Many thanks.
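
For anyone who wants to check which copy of the file they have, one option is to compare its MD5 with the two checksums quoted above and, optionally, count the non-finite values in the array. The following is only a minimal sketch: the helper names (`md5_of_file`, `classify`, `count_non_finite`) are hypothetical and not part of CodeTalker, and the NaN scan assumes NumPy is installed and that the file holds a plain numeric array with at least one dimension.

```python
import hashlib

# Checksums quoted in this thread.
OLD_MD5 = "01bbeb78742afd15efe88ca318c7932d"  # earlier file, reported to give NaN
NEW_MD5 = "2242f58f4697b046db7b3fd0cc7add59"  # newer file, reported to train correctly


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(md5_hex):
    """Map a digest to 'old', 'new', or 'unknown' using the checksums from this thread."""
    md5_hex = md5_hex.lower()
    if md5_hex == OLD_MD5:
        return "old"
    if md5_hex == NEW_MD5:
        return "new"
    return "unknown"


def count_non_finite(path, rows_per_step=1000):
    """Count NaN/Inf entries in a .npy array, scanning slices along the first axis.

    Assumption: the file is a non-pickled numeric array with at least one
    dimension, so it can be memory-mapped instead of loaded all at once.
    """
    import numpy as np  # imported lazily so the checksum helpers need only the stdlib

    data = np.load(path, mmap_mode="r")
    bad = 0
    for start in range(0, data.shape[0], rows_per_step):
        block = np.asarray(data[start:start + rows_per_step])
        bad += int(np.count_nonzero(~np.isfinite(block)))
    return bad
```

Calling `classify(md5_of_file(path))` on your local copy of data_verts.npy tells you whether it matches either checksum; an "unknown" result just means the file is neither of the two versions mentioned here. `count_non_finite(path)` is a separate sanity check and does not by itself prove which version you have.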

Doubiiu commented 1 year ago

OK, I will close this issue then.