Closed Rakshasv18 closed 6 months ago
Describe the bug
I have been training Coqui TTS models for the past 10 months, trying different architectures on the datasets I had collected, and I was finally able to generate deepfakes.
Below are my findings, along with most of the issues I faced while training the models.
To Reproduce
Here are some of the issues that I faced, along with the solutions that helped me fix the bugs.
Solution: Make sure your TTS installation uses compatible versions of PyTorch, torchaudio, and Python (2.0.0, 2.0.0, and >=3.8, <=3.11 respectively). Refer to the compatibility matrix at https://pytorch.org/audio/stable/installation.html#compatibility-matrix for version compatibility.
Solution: Refer to the workaround provided at https://github.com/coqui-ai/TTS/issues/2075. Additionally, a YouTube link was used to resolve the error.
Ensure the audio has a sample rate of 22050 Hz to avoid errors from mismatched sampling rates. Use the librosa library to convert audio samples to the required 22050 Hz.
Assertion error: mismatched sampling rates (required = 22050, our data = 48000)
Error:
  File "/home/rakshav/tts/TTS/TTS/utils/audio/processor.py", line 683, in load_wav
    assert self.sample_rate == sr, "%s vs %s" % (self.sample_rate, sr)
AssertionError: 22050 vs 48000
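The resampling step mentioned above can be sketched roughly as follows. This is a minimal example, not the exact script I used: the helper names (`resample_wav`, `resample_dir`, `resampled_path`) and the folder layout are hypothetical, and it assumes `librosa` and `soundfile` are installed. `librosa.load` resamples to the requested `sr` while loading.

```python
from pathlib import Path

TARGET_SR = 22050  # sample rate the assertion in load_wav expects


def resampled_path(src: Path, out_dir: Path) -> Path:
    """Keep the file name and place the converted copy in out_dir."""
    return out_dir / src.name


def resample_wav(src: Path, out_dir: Path, target_sr: int = TARGET_SR) -> Path:
    """Load one wav, resample it to target_sr, and write it to out_dir."""
    # Imported here so the path helper works without the audio stack installed.
    import librosa
    import soundfile as sf

    # librosa.load resamples to `sr` on load and returns mono float audio.
    audio, sr = librosa.load(str(src), sr=target_sr)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = resampled_path(src, out_dir)
    sf.write(str(dst), audio, sr)
    return dst


def resample_dir(in_dir: Path, out_dir: Path) -> list:
    """Convert every .wav file in in_dir (non-recursive)."""
    return [resample_wav(p, out_dir) for p in sorted(in_dir.glob("*.wav"))]
```

For example, `resample_dir(Path("wavs_48k"), Path("wavs"))` would write 22050 Hz copies into `wavs/` (both folder names are placeholders). Writing to a separate folder keeps the original 48000 Hz recordings intact.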
Spend sufficient time ensuring the correctness of audio data and transcripts. A column mismatch error occurred because the model requires 3 or more columns as input, but only 1 column was provided. Check and align the data format according to the model's requirements to avoid time-consuming issues.
I got a column mismatch error: the model takes 3 or more columns as input, but our data had only 1. The exception handling in the code only skips a row if it is empty.
If you look carefully at the LJSpeech dataset's metadata.csv, each row has three fields separated by '|': the file ID, the transcript, and a normalized transcript (often just a copy of the transcript).
Make sure your metadata has the same format, or you will end up spending a lot of time on this.
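A small sketch of how the metadata can be checked before training. The function names here are my own for illustration, not part of Coqui TTS; the only thing taken from above is the LJSpeech-style layout of three '|'-separated columns.

```python
def to_ljspeech_line(file_id, text, normalized=None):
    """Build one 'id|text|normalized text' row.

    If no normalized text is given, the transcript is repeated in the
    third column. Assumes the text itself contains no '|' character.
    """
    text = text.strip()
    norm = text if normalized is None else normalized.strip()
    return "|".join([file_id, text, norm])


def check_metadata_lines(lines, min_cols=3):
    """Return (line_number, column_count) for each non-empty line
    that has fewer than min_cols '|'-separated columns."""
    bad = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # empty lines are skipped, not reported
        cols = line.split("|")
        if len(cols) < min_cols:
            bad.append((n, len(cols)))
    return bad
```

Running `check_metadata_lines(open("metadata.csv", encoding="utf-8"))` on your own file should return an empty list before you start training.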
5. Installation of additional packages.
Install the espeak-ng, py-espeak-ng, and espeak packages if you see errors about missing packages during installation.
The bug was in the code that filters samples by minimum and maximum audio length. The original check

if length < min_len or length > max_len:
    ignore_idx.append(idx)

was changed to

if length > min_len and length < max_len:
    keep_idx.append(idx)

Variables: min_audio_len: int = 0, max_audio_len: int = float("inf")
Bug raised: https://github.com/coqui-ai/TTS/issues/2942
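To make the length filter easier to reason about, here is a standalone sketch of the "keep" version of the check. It is not the actual Coqui TTS code; the function name and the idea of returning both lists are only for illustration.

```python
def filter_by_length(lengths, min_len=0, max_len=float("inf")):
    """Split sample indices into kept and ignored ones.

    A sample is kept only if min_len < length < max_len. Both
    comparisons are strict, so a length exactly equal to a bound is ignored.
    """
    keep_idx, ignore_idx = [], []
    for idx, length in enumerate(lengths):
        if length > min_len and length < max_len:
            keep_idx.append(idx)
        else:
            ignore_idx.append(idx)
    return keep_idx, ignore_idx
```

Calling it on your own audio lengths with your configured min_audio_len and max_audio_len shows how many samples survive. If the kept list is empty or very small, that is one plausible way to end up with a "No samples left" error.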
OSError: FLAC conversion utility not available - consider installing the FLAC command line application by running
apt-get install flac
or your operating system's equivalent
Training ends without displaying errors or warnings: https://github.com/coqui-ai/TTS/discussions/1702
Solution: If you are training on GPUs, set up a notification for whenever the training stops; it helps you monitor the run better.
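One way to get such a notification is to launch training through a small wrapper that reports the exit code when the process ends. This is a minimal sketch: `run_and_notify` is a hypothetical helper, and the default `notify` just prints to stderr. In practice you would pass your own function that sends an email or chat message.

```python
import subprocess
import sys
import time


def run_and_notify(cmd, notify=None):
    """Run cmd, then call notify(message) once it exits, however it exits."""
    if notify is None:
        # Default: print to stderr; swap in an email or chat sender here.
        notify = lambda msg: print(msg, file=sys.stderr)

    start = time.time()
    result = subprocess.run(cmd)  # blocks until the process ends
    elapsed = time.time() - start

    status = "finished" if result.returncode == 0 else "stopped with an error"
    notify(f"Training {status} (exit code {result.returncode}) after {elapsed:.0f}s")
    return result.returncode
```

For example, `run_and_notify([sys.executable, "train.py"])`, where `train.py` stands in for your own training script. Because the message is sent even when the exit code is 0, a run that ends silently is still reported.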
Make sure sys.path points to your local TTS checkout; otherwise you will get a col[2] error.
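A quick way to check which copy of a package Python would pick up, assuming "sys" above refers to Python's `sys.path`. `module_origin` is a hypothetical helper; it only looks the module up and does not import it.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from,
    or None if Python cannot find it on sys.path."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

`print(module_origin("TTS"))` should show a path inside your local clone. If it points into site-packages instead, Python is using a different installed copy than the code you edited.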
RuntimeError: [!] No samples left. Solution: Ensure that the dataset is not too small, as the TTS model needs enough data for both training and evaluation. Understanding how the TTS model uses the data is crucial for resolving this issue.
Expected behavior
No response
Logs
No response
Environment
Additional context
Please feel free to reach out to me with any questions about the above issues.