[Bug] Issues I faced and the solutions I used so far to generate deepfakes

Describe the bug

I have been training Coqui TTS models for the past 10 months, trying different architectures on the datasets I collected, and I was finally able to generate deepfakes.

Below are my findings, including most of the issues I faced while training the models.

To Reproduce

Here are some of the issues that I faced, along with the solutions that helped me fix the bugs.

1. Installation error while running the TTS model.

Solution: Ensure that the TTS installation uses compatible versions of PyTorch, torchaudio, and Python (here PyTorch 2.0.0, torchaudio 2.0.0, and Python >=3.8, <=3.11). Refer to the compatibility matrix at https://pytorch.org/audio/stable/installation.html#compatibility-matrix for other version combinations.
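To confirm which versions are actually installed before debugging further, a small standard-library script can help. This is a sketch: the helper names are my own, and the Python bounds simply repeat the range quoted above, so treat them as an assumption and check the matrix for the PyTorch version you target.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple

# Python range quoted above (>=3.8, <=3.11); an assumption, adjust to your matrix entry.
MIN_PYTHON: Tuple[int, int] = (3, 8)
MAX_PYTHON: Tuple[int, int] = (3, 11)


def python_in_range(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) lies within the supported range."""
    return MIN_PYTHON <= version <= MAX_PYTHON


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report() -> dict:
    """Collect the versions relevant to the compatibility check."""
    current = (sys.version_info.major, sys.version_info.minor)
    return {
        "python": ".".join(map(str, current)),
        "python_ok": python_in_range(current),
        "torch": installed_version("torch"),
        "torchaudio": installed_version("torchaudio"),
        "TTS": installed_version("TTS"),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

Run it inside the same environment you train in; a CUDA build of PyTorch may report a local version suffix such as `2.0.0+cu117`, as in the environment listed at the end of this issue.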

2. pyworld library installation error

Solution: Refer to the workaround provided at https://github.com/coqui-ai/TTS/issues/2075. I also used a YouTube video to resolve the error.

3. Audio data issues.

Ensure the audio has a sample rate of 22050 Hz to avoid errors caused by mismatched sampling rates. Use the librosa library to resample the audio to 22050 Hz.

AssertionError caused by mismatched sampling rates (required: 22050, our data: 48000):

    File "/home/rakshav/tts/TTS/TTS/utils/audio/processor.py", line 683, in load_wav
        assert self.sample_rate == sr, "%s vs %s" % (self.sample_rate, sr)
    AssertionError: 22050 vs 48000
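To find the offending files before training, the sketch below reads WAV headers with Python's standard `wave` module and resamples with librosa (`librosa.load(..., sr=22050)` resamples while loading) plus `soundfile` for writing. The helper names are my own, not part of Coqui TTS, and `wave` only understands plain PCM WAV files, so float or compressed audio needs a different reader.

```python
import wave
from pathlib import Path
from typing import List, Tuple

TARGET_SR = 22050  # sample rate the TTS config expects


def wav_sample_rate(path: Path) -> int:
    """Read the sample rate from a PCM WAV header using the standard library."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def find_mismatched(wav_dir: Path, target_sr: int = TARGET_SR) -> List[Tuple[Path, int]]:
    """List (file, sample_rate) pairs whose rate differs from target_sr."""
    mismatched = []
    for path in sorted(wav_dir.glob("*.wav")):
        sr = wav_sample_rate(path)
        if sr != target_sr:
            mismatched.append((path, sr))
    return mismatched


def resample_file(src: Path, dst: Path, target_sr: int = TARGET_SR) -> None:
    """Resample one file with librosa and write it as 16-bit PCM WAV."""
    # Imported lazily so the checker above works without these packages installed.
    import librosa
    import soundfile as sf

    audio, _ = librosa.load(str(src), sr=target_sr, mono=True)  # load + resample
    sf.write(str(dst), audio, target_sr, subtype="PCM_16")
```

For example, `for src, sr in find_mismatched(Path("wavs")): resample_file(src, Path("wavs_22k") / src.name)` would write 22050 Hz copies into a hypothetical `wavs_22k` folder, which has to exist beforehand.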

4. Metadata format error

Spend sufficient time ensuring the correctness of audio data and transcripts. A column mismatch error occurred because the model requires 3 or more columns as input, but only 1 column was provided. Check and align the data format according to the model's requirements to avoid time-consuming issues.

I got a column mismatch error because the model takes 3 or more columns as input and our metadata had only 1 column. There is exception handling in the code, but it only skips a row if the row is empty.

If you look carefully at the LJSpeech dataset's metadata.csv, each line has three columns separated by '|': the file ID, the transcript, and the normalized transcript (the last two are often identical).

Make sure your metadata uses the same format; otherwise you will end up spending a lot of time on this.
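A small validator and converter can catch this before training. As far as I can tell, Coqui's `ljspeech` formatter reads the text from the third column (`cols[2]`), so every row needs at least three '|'-separated fields. The helper names below are hypothetical, and reusing the raw transcript as the normalized column is an assumption that only holds if your text needs no further normalization.

```python
from pathlib import Path
from typing import List


def validate_ljspeech_metadata(path: Path) -> List[str]:
    """Return a list of problems; an empty list means every row has 3+ columns."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            cols = line.rstrip("\r\n").split("|")
            # A blank line splits into [""] and is reported as 1 column.
            if len(cols) < 3:
                problems.append(f"line {line_no}: expected 3 columns, found {len(cols)}")
    return problems


def to_ljspeech_row(file_id: str, text: str) -> str:
    """Build 'id|text|text', reusing the transcript as the normalized column."""
    text = text.replace("|", " ").strip()  # a stray '|' would shift the columns
    return f"{file_id}|{text}|{text}"
```

Running `validate_ljspeech_metadata(Path("metadata.csv"))` on your file and fixing every reported line before launching a run is much cheaper than debugging the formatter error mid-training.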

5. Installation of additional packages.

Install the espeak-ng, py-espeak-ng, and espeak packages if you get errors about missing packages during installation.
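A quick way to check whether the espeak executables are actually visible is `shutil.which`. This is a sketch with a hypothetical helper name; my assumption is that the espeak-based phonemizer needs at least one of `espeak-ng` or `espeak` on PATH, whereas `py-espeak-ng` is only the Python wrapper installed with pip.

```python
import shutil
from typing import Dict

# Executables provided by the espeak-ng / espeak system packages.
ESPEAK_BINARIES = ("espeak-ng", "espeak")


def find_espeak() -> Dict[str, str]:
    """Map each espeak executable name to its path, or '' if it is not on PATH."""
    return {name: shutil.which(name) or "" for name in ESPEAK_BINARIES}


if __name__ == "__main__":
    found = find_espeak()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if not any(found.values()):
        print("No espeak executable found; install espeak-ng (e.g. apt-get install espeak-ng).")
```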

I also found a bug in the code that filters samples by minimum and maximum audio length. The fix was to change the filtering logic as follows.

Original condition:

    if length < min_len or length > max_len:
        ignore_idx.append(idx)

Changed to:

    if length > min_len and length < max_len:
        keep_idx.append(idx)

Relevant config variables: min_audio_len: int = 0, max_audio_len: int = float("inf")

Bug report: https://github.com/coqui-ai/TTS/issues/2942
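To see exactly how the two conditions behave, here are both versions as standalone functions. These are hypothetical helpers that mirror the snippets quoted above, not the actual Coqui TTS code. They only disagree at the boundaries: a sample whose length equals `min_len` or `max_len` is kept by the original condition and dropped by the modified one.

```python
from typing import List, Sequence, Tuple


def split_by_length_original(
    lengths: Sequence[float], min_len: float, max_len: float
) -> Tuple[List[int], List[int]]:
    """Original logic: ignore samples outside the inclusive range [min_len, max_len]."""
    keep_idx, ignore_idx = [], []
    for idx, length in enumerate(lengths):
        if length < min_len or length > max_len:
            ignore_idx.append(idx)
        else:
            keep_idx.append(idx)
    return keep_idx, ignore_idx


def split_by_length_modified(
    lengths: Sequence[float], min_len: float, max_len: float
) -> Tuple[List[int], List[int]]:
    """Modified logic: keep only samples strictly inside (min_len, max_len)."""
    keep_idx, ignore_idx = [], []
    for idx, length in enumerate(lengths):
        if length > min_len and length < max_len:
            keep_idx.append(idx)
        else:
            ignore_idx.append(idx)
    return keep_idx, ignore_idx
```

With the default min_audio_len = 0, for instance, a zero-length sample is kept by the first version and dropped by the second.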

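OSError: FLAC conversion utility not available - consider installing the FLAC command line application by running apt-get install flac or your operating system's equivalent
Solution: install the FLAC command-line application as the message suggests (for example, apt-get install flac on Debian/Ubuntu).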
Training ends without displaying any errors or warnings: https://github.com/coqui-ai/TTS/discussions/1702

Solution: If you are training on GPUs, set up a notification that fires whenever training stops; it makes monitoring the runs much easier.
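One simple way to get that notification is to launch training through a wrapper that always reports how the process ended. This is a sketch: `run_and_notify` is a hypothetical helper and the `notify` callback is a placeholder you would swap for email, Slack, or similar. On POSIX, a process killed by a signal (for example by the OOM killer) shows up as a negative return code, so silent stops are still reported.

```python
import subprocess
import sys
import time
from typing import Callable, List


def run_and_notify(cmd: List[str], notify: Callable[[str], None]) -> int:
    """Run a training command, then call notify() however the process ends."""
    start = time.time()
    try:
        code = subprocess.run(cmd).returncode
    except Exception as exc:  # e.g. the executable was not found
        notify(f"Training failed to start: {exc}")
        raise
    minutes = (time.time() - start) / 60
    status = "finished" if code == 0 else f"stopped with exit code {code}"
    notify(f"Training {status} after {minutes:.1f} min: {' '.join(cmd)}")
    return code


if __name__ == "__main__":
    # Placeholder notifier: print to stdout; replace with a real channel.
    sys.exit(run_and_notify(sys.argv[1:], notify=print))
```

Saved as a hypothetical `notify_wrapper.py`, it would be used as `python notify_wrapper.py python train.py ...`, prefixing whatever training command you already run.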

Make sure sys.path points to your local TTS clone (the one containing your code changes); otherwise you will get a col[2] error.
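To verify which copy of TTS Python actually imports, the sketch below puts a local clone first on `sys.path` and returns the file the package was loaded from. The helper name and the clone path are hypothetical; point it at the directory that contains the `TTS` package folder.

```python
import importlib
import sys
from pathlib import Path


def use_local_checkout(repo_root: str, package: str = "TTS") -> str:
    """Put a local clone first on sys.path and return where `package` loads from."""
    root = str(Path(repo_root).resolve())
    if root in sys.path:
        sys.path.remove(root)
    sys.path.insert(0, root)
    # Forget any copy that was already imported (e.g. the pip-installed one).
    for name in list(sys.modules):
        if name == package or name.startswith(package + "."):
            del sys.modules[name]
    importlib.invalidate_caches()
    module = importlib.import_module(package)
    return str(Path(module.__file__).resolve())
```

For example, `print(use_local_checkout("/path/to/your/TTS"))` should print a path inside your clone; if it points into site-packages, the installed copy is still being used. An editable install (`pip install -e .` run inside the clone) achieves the same thing more permanently.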
RuntimeError: [!] No samples left. Solution: Ensure that the dataset is not too small, since the TTS model needs enough data for both training and evaluation, and check that the min/max length filters described above are not discarding every sample. Understanding how the TTS model loads and filters the data is crucial for resolving this issue.
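A quick pre-flight count can show whether anything survives before launching a run. This is a hypothetical helper, not Coqui's code: it applies an inclusive min/max length filter (mirroring the original condition quoted above) and then takes an evaluation split by fraction, loosely modeled on the `eval_split_size` setting in Coqui TTS configs. The `lengths` must be computed by you beforehand, in the same unit as your min/max settings.

```python
from typing import Sequence, Tuple


def preflight_counts(
    lengths: Sequence[float],
    min_len: float = 0.0,
    max_len: float = float("inf"),
    eval_fraction: float = 0.01,
) -> Tuple[int, int]:
    """Estimate (train, eval) sample counts after a length filter and eval split."""
    kept = [n for n in lengths if min_len <= n <= max_len]
    if not kept:
        raise RuntimeError("No samples left after length filtering")
    n_eval = int(len(kept) * eval_fraction)
    if n_eval == 0:
        raise RuntimeError(
            f"Only {len(kept)} samples kept: eval_fraction={eval_fraction} "
            "gives an empty evaluation split"
        )
    return len(kept) - n_eval, n_eval
```

With the 1% fraction used here, fewer than 100 surviving samples already leaves the evaluation split empty, which is one concrete way a "too small" dataset fails.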

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA A100-PCIE-40GB",
            "NVIDIA A100-PCIE-40GB",
            "NVIDIA A100-PCIE-40GB"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.0+cu117",
        "TTS": "0.16.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.0",
        "version": "#1 SMP Thu Nov 16 10:29:04 EST 2023"
    }
}

Additional context

Please feel free to reach out to me if you have any questions about the issues above.
