[Open] FFY0207 opened this issue 4 months ago
This is my training log. Why did the loss and accuracy suddenly become very poor from the 21st epoch? How should I handle it?
The first error sounds like some sort of hardware, driver, or PyTorch error. It is probably unrelated to the code in this repository; check your CUDA and PyTorch installations. As for the loss and accuracy suddenly getting worse: are you using the same batch size as the original code, or a smaller one? A batch size that is too small is the most likely cause.
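If it helps, here is a minimal sanity check of the CUDA/PyTorch setup that is independent of this repository's code (a sketch, assuming a standard PyTorch CUDA build); if even this fails, the problem is the installation or driver rather than the training script:

```python
import torch

# Report the installed PyTorch build and whether a CUDA device is visible.
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Exercise the same forward/backward path that crashed, on a toy tensor.
    x = torch.randn(256, 256, device="cuda", requires_grad=True)
    loss = (x @ x.T).sum()
    loss.backward()
    torch.cuda.synchronize()  # surfaces asynchronous CUDA errors right here
    print("GPU forward/backward OK, grad shape:", tuple(x.grad.shape))
```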
Why does evaluation.py run normally with the transduction model .pt file you provided, but with the model I trained myself it produces the following error? Can you help me?
It seems you loaded the wrong model; the output dimension should be 80, which matches num_speech_features.
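A quick way to check this yourself is to print the parameter shapes in the checkpoint and confirm the final projection has an output dimension of 80. This is only a sketch: the checkpoint path is a placeholder and the real layout depends on how the model was saved.

```python
import torch

# "my_model.pt" is a placeholder for your trained checkpoint.
ckpt = torch.load("my_model.pt", map_location="cpu")

# Some checkpoints are raw state_dicts, others wrap the whole model object.
state_dict = ckpt.state_dict() if hasattr(ckpt, "state_dict") else ckpt

for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# The last projection layer's weight should have 80 output rows (num_speech_features).
```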
I encountered a problem when reproducing normalizers.pkl by running make_normalizers() in read_emg.py: the resulting .pkl is different from the original file in the repository. Do you know why this is? Thanks for your contribution! @dgaddy
It's been quite a while so I don't really remember, but I may have manually adjusted the normalizers to scale down the size of the inputs or outputs. Larger values for inputs or outputs can sometimes make training less stable. You could try adjusting them and see if that helps. (Adjusting the inputs seems more likely to help: increase the normalizer feature_stddevs values to decrease the feature scales; multiplying by something like 2 or 5 seems reasonable. It might also help to compare the values in your normalizers file against the one in the repository.)
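For reference, a minimal sketch of that adjustment, assuming normalizers.pkl unpickles to a pair of normalizer objects that expose a feature_stddevs array (check read_emg.py for the actual layout, which may differ):

```python
import pickle

# Assumption: the pickle holds (emg_normalizer, audio_normalizer), each with
# feature_means / feature_stddevs arrays as produced by make_normalizers().
with open("normalizers.pkl", "rb") as f:
    emg_norm, audio_norm = pickle.load(f)

# Larger stddevs -> smaller normalized features, which can stabilize training.
emg_norm.feature_stddevs = emg_norm.feature_stddevs * 2

with open("normalizers_scaled.pkl", "wb") as f:
    pickle.dump((emg_norm, audio_norm), f)
```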
Thanks! I solved this problem by increasing the feature_stddevs of the mel features and by abandoning the last batch in every epoch. By the way, can you share more details about fine-tuning the vocoder? For example, are all the predicted mels used for fine-tuning? I can't find that in this repository. At present, the sound I generate contains a lot of noise, and this is very important for me as a beginner. Thank you again.
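For the "abandoning the last batch" part, a minimal sketch using PyTorch's standard DataLoader flag (the dummy dataset here only stands in for the real trainset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real trainset.
dataset = TensorDataset(torch.randn(100, 8))

# drop_last=True discards the incomplete final batch of each epoch, so every
# optimization step sees a full-size batch.
loader = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)
print(sum(1 for _ in loader))  # 3 full batches; the 4-sample remainder is dropped
```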
Epoch 1, Batch 3, Loss: 7.225614070892334
Train step: 2it [00:05, 2.95s/it]
Traceback (most recent call last):
  File "/mnt/e/code/silent_speech/transduction_model.py", line 365, in <module>
    main()
  File "/mnt/e/code/silent_speech/transduction_model.py", line 361, in main
    model = train_model(trainset, devset, device, save_sound_outputs=save_sound_outputs)
  File "/mnt/e/code/silent_speech/transduction_model.py", line 260, in train_model
    loss.backward()  # backpropagation
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: unknown error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

What problem did I encounter? I lowered the batch size, but that didn't help and the error still occurred.
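One general PyTorch debugging step for errors like this (not specific to this repository): force synchronous CUDA kernel launches so the RuntimeError is raised at the operation that actually failed, instead of surfacing later as "unknown error" during loss.backward(). A sketch:

```python
import os

# Must be set before CUDA is initialized, i.e. in practice before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402

# Run the training code as usual after this point; with blocking launches the
# traceback will point at the kernel that actually failed.
```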