dgaddy / silent_speech

Code for voicing silent speech from EMG. Official repository for the papers "Digital Voicing of Silent Speech" at EMNLP 2020 and "An Improved Model for Voicing Silent Speech" at ACL 2021. Also includes code for converting silent speech to text.
MIT License

Some questions about training #8

Open FFY0207 opened 4 months ago

FFY0207 commented 4 months ago

Epoch 1, Batch 3, Loss: 7.225614070892334
Train step: 2it [00:05, 2.95s/it]
Traceback (most recent call last):
  File "/mnt/e/code/silent_speech/transduction_model.py", line 365, in <module>
    main()
  File "/mnt/e/code/silent_speech/transduction_model.py", line 361, in main
    model = train_model(trainset, devset, device, save_sound_outputs=save_sound_outputs)
  File "/mnt/e/code/silent_speech/transduction_model.py", line 260, in train_model
    loss.backward()  # backward pass
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: unknown error
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

What is causing this problem? I lowered the batch size, but it didn't help and the error still occurred.

FFY0207 commented 4 months ago

(screenshot of training log)

This is my training log. Why did the loss and accuracy suddenly become much worse starting from epoch 21? How should I handle this?

dgaddy commented 4 months ago

The first error sounds like some sort of hardware, driver, or PyTorch error. It is probably unrelated to the code of this repository; maybe check your CUDA and PyTorch installations. About the loss and accuracy suddenly getting worse: are you using the same batch size as the original code, or is this with a smaller batch? A batch size that is too small is the most likely issue.
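As a quick sanity check before blaming the training code (this is a sketch, not part of the repository), something like the following can confirm that PyTorch imports and that CUDA is actually usable:

```python
# Minimal CUDA/PyTorch environment check (hypothetical helper, not from the repo).
import importlib.util


def cuda_diagnostics():
    """Return a small report on the local torch/CUDA setup."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "device_count": 0,
    }
    if report["torch_installed"]:
        import torch
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    print(cuda_diagnostics())
```

If `cuda_available` is False here while `nvidia-smi` shows a working GPU, the PyTorch build and the installed CUDA driver are likely mismatched.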

FFY0207 commented 4 months ago

(screenshot of the error)

Why does evaluation.py run normally with the transduction_model.pt you provided, but the model I trained myself runs into the error shown above? Can you help me?

Gray-ly commented 3 months ago

It seems you loaded the wrong model; the output dimension should be 80, which matches num_speech_features.
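To make that failure mode explicit, a minimal check could compare the checkpoint's output width against the expected feature count before evaluation (hypothetical helper; only the value 80 for num_speech_features comes from this codebase):

```python
# Hypothetical sanity check: the transduction model's final projection
# should output num_speech_features (80 mel features in this codebase).
def check_output_dim(out_features, num_speech_features=80):
    """Raise early if a loaded checkpoint has the wrong output width."""
    if out_features != num_speech_features:
        raise ValueError(
            f"model outputs {out_features} features, expected "
            f"{num_speech_features}; you may have loaded the wrong checkpoint"
        )
    return True
```

With PyTorch, `out_features` would come from the final linear layer of the loaded model (the exact attribute name depends on the model definition).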

Gray-ly commented 3 months ago

> (screenshot of training log)
>
> This is my training log. Why did the loss and accuracy suddenly become much worse starting from epoch 21? How should I handle this?

I encountered a problem when I reproduced normalizers.pkl by running make_normalizers() in read_emg.py: the resulting pkl is different from the original file in the repository. Do you know why this is? Thanks for your contribution! @dgaddy

dgaddy commented 2 months ago

> I encountered a problem when I reproduced normalizers.pkl by running make_normalizers() in read_emg.py: the resulting pkl is different from the original file in the repository. Do you know why this is? Thanks for your contribution! @dgaddy

It's been quite a while, so I don't really remember, but it's possible I may have manually adjusted the normalizers to scale down the size of the inputs or outputs. Sometimes larger values for inputs or outputs can make training less stable. You could try adjusting them and see if that helps. (Adjusting the inputs seems more likely to help. You would want to increase the normalizer feature_stddevs values to decrease the feature scales. Multiplying by something like 2 or 5 seems reasonable. It might also help to compare the values in your normalizers file against the one in the repository.)
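A sketch of that adjustment, assuming the pickled normalizer exposes a `feature_stddevs` attribute (the attribute name is taken from the comment above; the exact object layout inside normalizers.pkl may differ):

```python
# Sketch: load a pickled normalizer, scale its feature_stddevs, save it back.
# Increasing the stddevs shrinks the normalized feature values, which can
# make training more stable (per the suggestion above).
import pickle


def scale_stddevs(path, factor=2.0):
    """Multiply the normalizer's feature_stddevs by `factor` in place on disk."""
    with open(path, "rb") as f:
        norm = pickle.load(f)
    try:
        # numpy array case: elementwise multiply
        norm.feature_stddevs = norm.feature_stddevs * factor
    except TypeError:
        # plain-sequence case
        norm.feature_stddevs = [s * factor for s in norm.feature_stddevs]
    with open(path, "wb") as f:
        pickle.dump(norm, f)
    return norm
```

With the repository's file this would be something like `scale_stddevs("normalizers.pkl", 2.0)`; factors of 2 to 5 are what is suggested above. Back up the original pickle first, since the function overwrites it.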

Gray-ly commented 1 month ago

> I encountered a problem when I reproduced normalizers.pkl by running make_normalizers() in read_emg.py: the resulting pkl is different from the original file in the repository. Do you know why this is? Thanks for your contribution! @dgaddy

> It's been quite a while, so I don't really remember, but it's possible I may have manually adjusted the normalizers to scale down the size of the inputs or outputs. Sometimes larger values for inputs or outputs can make training less stable. You could try adjusting them and see if that helps. (Adjusting the inputs seems more likely to help. You would want to increase the normalizer feature_stddevs values to decrease the feature scales. Multiplying by something like 2 or 5 seems reasonable. It might also help to compare the values in your normalizers file against the one in the repository.)

Thanks! I solved this problem by increasing the feature_stddevs of the mel features and by dropping the last (incomplete) batch in every epoch. By the way, can you share more details about fine-tuning the vocoder? For example, are all predicted mels used in fine-tuning? I can't find that in this repository. At present, the sound I generate contains a lot of noise, and this is very important for me as a beginner. Thank you again.