SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Finetune Gradio: Exceptions caused by wrong file paths #480

Closed mame82 closed 1 week ago

mame82 commented 1 week ago

Environment Details

All

Steps to Reproduce

Create a training dataset whose samples are stored as mp3 (instead of wav). metadata.csv must also list the files with the mp3 extension.

While training goes well, testing the model with samples from the dataset raises an exception. This is because the .wav extension is appended to the mp3 file name, ending up as sample.mp3.wav (which leads to a file-not-found exception).

Code responsible for the issue: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/train/finetune_gradio.py#L1180

✔️ Expected Behavior

If a random sample from the training dataset gets loaded in the test tab of the gradio finetuning interface, the file extension from the dataset should be used (instead of assuming .wav).
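The expected behavior can be sketched as follows. This is only an illustration, not the code in finetune_gradio.py: the helper name `sample_filename` and the `AUDIO_EXTS` set are hypothetical, and the sketch assumes that a metadata entry either carries a known audio extension or none at all.

```python
from pathlib import Path

# Hypothetical set of extensions treated as "already has an audio extension".
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def sample_filename(name: str, default_ext: str = ".wav") -> str:
    """Keep the extension listed in metadata.csv; append default_ext only if none is present."""
    if Path(name).suffix.lower() in AUDIO_EXTS:
        return name
    return name + default_ext
```

With this check, `sample.mp3` stays `sample.mp3` instead of becoming `sample.mp3.wav`, while an entry without an extension still falls back to `.wav`.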

Note: The conversion to wav is done in the Transcription step, which is not required (if metadata.csv and samples in another valid format already exist).

Additional note (not worth a dedicated issue): Although the pip installation sets up an f5-tts_finetune-gradio command that can be called from any directory, it has to be invoked from the F5-TTS working directory. Otherwise, the script throws an additional error when opening subprocesses, because a path assumed to be relative to the F5-TTS directory is actually resolved relative to the cwd from which f5-tts_finetune-gradio is invoked (the path has to be built relative to __file__, not to the current working directory, like here: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/train/finetune_gradio.py#L47)
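A minimal sketch of building the subprocess path relative to the script file rather than the cwd. The helper names `script_relative` and `build_launch_command` are hypothetical, and the assumption that the training script sits next to the Gradio script under the name finetune_cli.py is mine; only `accelerate launch` itself comes from the report.

```python
from pathlib import Path
from typing import List


def script_relative(script_file: str, *parts: str) -> Path:
    """Resolve parts against the directory containing script_file, not the cwd."""
    return Path(script_file).resolve().parent.joinpath(*parts)


def build_launch_command(script_file: str, target: str = "finetune_cli.py") -> List[str]:
    """Build an 'accelerate launch' command with an absolute path to the target script.

    target is an assumed file name located next to script_file.
    """
    return ["accelerate", "launch", str(script_relative(script_file, target))]
```

Inside the Gradio script this would be called as `build_launch_command(__file__)`, so the resulting command no longer depends on the directory the user started the tool from.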

❌ Actual Behavior

Use the file extension of the audio files in the actual dataset, instead of assuming *.wav

SWivid commented 1 week ago

@lpscr might help

atlonxp commented 1 week ago

@mame82 error happened at this function, which is already fixed and merged just now -- try to git pull for the updates

def get_random_sample_transcribe(project_name):
    name_project = project_name
    path_project = os.path.join(path_data, name_project)
    file_metadata = os.path.join(path_project, "metadata.csv")
    if not os.path.isfile(file_metadata):
        return "", None

    data = ""
    with open(file_metadata, "r", encoding="utf-8-sig") as f:
        data = f.read()

    list_data = []
    for item in data.split("\n"):
        sp = item.split("|")
        if len(sp) != 2:
            continue

        # fixed audio when it is absolute
        file_audio = get_correct_audio_path(sp[0], path_project)
        list_data.append([file_audio, sp[1]])

    if list_data == []:
        return "", None

    random_item = random.choice(list_data)

    return random_item[1], random_item[0]

SWivid commented 1 week ago

will close this issue as fixed with 058b4461be3c80a11ae036ee87d4a6b06a9863e4

mame82 commented 1 week ago

Thx. Looks good codewise for the first issue (gonna pull later to test it).

I see no change addressing the second issue, which I put in the "additional" note (the call to accelerate launch ... is relative to the current working directory instead of the F5-TTS repo directory).

As a result, training only works if f5-tts_finetune-gradio is invoked directly from the F5-TTS directory.

The link to responsible code is in my issue description. Here's a user complaint due to this behavior: https://github.com/SWivid/F5-TTS/discussions/143#discussioncomment-11224644

Sorry for not filing a PR myself, currently lacking the time to do so.

mame82 commented 1 week ago

@SWivid Off-topic question (asking upfront due to time constraints):

Would you accept a PR for train_cli which changes the bnb_optimizer option from bool to str with choices ["none", "AdamW8", "AdamW32Paged", "AdamW8Paged"], to allow paged versions for further memory footprint reduction on low-spec machines? (It would have to be adapted in finetune gradio as well, as it breaks backwards compatibility.)

An alternative would be an additional argument, as in --bnb_optimizer True --bnb_mode AdamW8Paged, which defaults to AdamW8 if bnb is enabled and to torch.AdamW otherwise (a bit clunky, but backwards compatible).
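The backwards-compatible alternative could look roughly like this. It is a sketch under assumptions: the `str_to_bool` helper, the parser layout, and `resolve_optimizer_name` are hypothetical and not taken from train_cli, and the function only returns the mode name rather than instantiating any bitsandbytes optimizer.

```python
import argparse

# Mode names as proposed in the comment above.
BNB_MODES = ["AdamW8", "AdamW32Paged", "AdamW8Paged"]


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings (assumed CLI convention)."""
    return value.strip().lower() in ("true", "1", "yes")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Existing boolean switch keeps its meaning, so old invocations still work.
    parser.add_argument("--bnb_optimizer", type=str_to_bool, default=False)
    # New optional argument; only consulted when --bnb_optimizer is true.
    parser.add_argument("--bnb_mode", choices=BNB_MODES, default="AdamW8")
    return parser


def resolve_optimizer_name(args: argparse.Namespace) -> str:
    """Return the selected bnb mode, or 'torch.AdamW' when bnb is disabled."""
    return args.bnb_mode if args.bnb_optimizer else "torch.AdamW"
```

Existing invocations with only `--bnb_optimizer True` keep resolving to AdamW8, which is the point of the second proposal.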

mame82 commented 1 week ago

Not fixed!

The applied patch corrects the file extension, but the lookup path to the dataset sample is wrong now (it searches for the random audio in data/{project name}/{sample}.mp3 when the sample path is data/{project name}/wavs/{sample}.mp3).

Additional info:

Folder structure for samples:

project_name
|-wavs
|   |-sample1.mp3
|   |-sample2.mp3
|-metadata.csv

metadata.csv (holds only the file name with extension, not the sample path):

sample1.mp3|the transcription of sample1
sample2.mp3|the transcription of sample2

In contrast to the "Test Model" tab of the gradio interface, the training has no issues with this data structure. It might therefore be a good idea to fetch the path for random sample selection from the training set in raw.arrow, instead of doing manual magic to construct the path.
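For the folder layout above, a manual lookup would at least have to check the wavs subfolder. The sketch below shows that fallback search only. It does not read raw.arrow, and `find_sample`, its `subdirs` parameter, and `demo` are hypothetical names used for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_sample(project_dir: Path, entry: str,
                subdirs: Sequence[str] = ("", "wavs")) -> Optional[Path]:
    """Locate a metadata.csv entry, trying the project dir and then its subfolders."""
    entry_path = Path(entry)
    if entry_path.is_absolute():
        return entry_path if entry_path.is_file() else None
    for sub in subdirs:
        candidate = Path(project_dir) / sub / entry_path
        if candidate.is_file():
            return candidate
    return None


def demo() -> str:
    """Recreate the layout from the comment in a temp dir and resolve sample1.mp3."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project_name"
        (project / "wavs").mkdir(parents=True)
        (project / "wavs" / "sample1.mp3").touch()
        found = find_sample(project, "sample1.mp3")
        return found.relative_to(project).as_posix()
```

Reading the paths from the prepared training set, as suggested above, would avoid this guessing entirely; the fallback search is just the smallest change that matches the reported layout.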

mame82 commented 1 week ago

will close this issue as fixed with 058b446

not fixed, see last comment

mame82 commented 1 week ago

Found the time to file a PR myself. Thanks for the possible review and all your efforts. I would be happy if you could also answer the "off-topic" question from above (regarding the "bnb_optimizer" paged modes).

Thx in advance

mame82 commented 1 week ago

Also filed a PR for bnb_optimizer_modes (keeps backward compatibility of the bnb_optimizer argument): https://github.com/SWivid/F5-TTS/pull/489