coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

Question: Why is the model size different when trained using train_gpt_xtts.py in xtts_v2 compared to the baseline model? #3640

Closed: MPQZF closed this issue 6 months ago

MPQZF commented 6 months ago

Describe the bug

When I train with train_gpt_xtts.py in the recipes/ljspeech/xtts_v2 folder, the resulting checkpoint is 5.3 GB, whereas the baseline XTTS v2 model is only 1.8 GB.
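
To check what the trained file actually contains compared to the baseline, one way is to compare the top-level structure of both checkpoints. Below is a minimal sketch; the file paths are placeholders for my setup, and the key names "model" and "optimizer" are assumptions about the trainer's checkpoint layout rather than something I have verified:

import torch

# Load both checkpoints on CPU and compare their top-level structure.
# Paths are placeholders; adjust them to the actual run output directory
# and the downloaded baseline model file.
trained = torch.load("run/training/GPT_XTTS_LJSpeech_FT/best_model.pth", map_location="cpu")
baseline = torch.load("XTTS_v2.0_original_model_files/model.pth", map_location="cpu")

print("trained keys: ", list(trained.keys()))
print("baseline keys:", list(baseline.keys()))

# If the trained checkpoint bundles training state (e.g. an "optimizer" entry)
# on top of the weights, saving only the "model" entry should shrink the file
# to roughly the baseline size.
if "model" in trained and "optimizer" in trained:
    torch.save({"model": trained["model"]}, "model_weights_only.pth")

If the extra size turns out to be optimizer state and other training bookkeeping, that could account for the roughly threefold difference, but I have not confirmed this.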

To Reproduce

To be able to run recipes/ljspeech/xtts_v2/train_gpt_xtts.py, I only changed the "path" and "meta_file_train" arguments of BaseDatasetConfig and the path to LJ001-0002.wav in "SPEAKER_REFERENCE" (sketched below). I don't think these changes are related to the size difference.
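
For reference, the edits were of this form; the paths are placeholders for my local copy of LJSpeech, and the field names are the ones the recipe script already uses:

from TTS.config.shared_configs import BaseDatasetConfig

# Dataset paths pointed at the local LJSpeech copy (placeholder paths).
config_dataset = BaseDatasetConfig(
    formatter="ljspeech",
    dataset_name="ljspeech",
    path="/data/LJSpeech-1.1/",
    meta_file_train="/data/LJSpeech-1.1/metadata.csv",
    language="en",
)

# Reference clip used for speaker conditioning in the training test sentences.
SPEAKER_REFERENCE = ["/data/LJSpeech-1.1/wavs/LJ001-0002.wav"]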

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla T4"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.2",
        "TTS": "0.22.0",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.11.7",
        "version": "#60~20.04.1-Ubuntu SMP Thu Feb 22 15:49:52 UTC 2024"
    }
}

Additional context

No response