Closed mame82 closed 1 week ago
@lpscr might help
@mame82 error happened at this function, which is already fixed and merged just now -- try to git pull for the updates
def get_random_sample_transcribe(project_name):
    name_project = project_name
    path_project = os.path.join(path_data, name_project)
    file_metadata = os.path.join(path_project, "metadata.csv")
    if not os.path.isfile(file_metadata):
        return "", None
    data = ""
    with open(file_metadata, "r", encoding="utf-8-sig") as f:
        data = f.read()
    list_data = []
    for item in data.split("\n"):
        sp = item.split("|")
        if len(sp) != 2:
            continue
        # fixed audio when it is absolute
        file_audio = get_correct_audio_path(sp[0], path_project)
        list_data.append([file_audio, sp[1]])
    if list_data == []:
        return "", None
    random_item = random.choice(list_data)
    return random_item[1], random_item[0]
will close this issue as fixed with 058b4461be3c80a11ae036ee87d4a6b06a9863e4
Thx. Looks good codewise for the first issue (gonna pull later to test it).
I see no change addressing the second issue, which I put in the "additional" note (the call to accelerate launch ... is relative to the current working directory, instead of relative to the F5-TTS repo directory). As a result, training only works if f5-tts_finetune-gradio is invoked directly from the F5-TTS directory.
The link to responsible code is in my issue description. Here's a user complaint due to this behavior: https://github.com/SWivid/F5-TTS/discussions/143#discussioncomment-11224644
Sorry for not filing a PR myself, currently lacking the time to do so.
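For illustration, the working-directory problem described above can be avoided by resolving the training script against the location of the calling module rather than against cwd. This is only a minimal sketch: the script name finetune_cli.py and the example module path are hypothetical placeholders, and the command contains nothing beyond the bare accelerate launch call.

```python
import os
from pathlib import Path


def script_dir(module_file: str) -> Path:
    """Directory containing the given module file, independent of cwd."""
    return Path(module_file).resolve().parent


def build_launch_cmd(module_file: str, script_name: str = "finetune_cli.py") -> list:
    # script_name is a hypothetical placeholder for the training entry point.
    script_path = script_dir(module_file) / script_name
    # An absolute script path keeps the command valid from any working directory.
    return ["accelerate", "launch", str(script_path)]


# Inside a real module this would be build_launch_cmd(__file__);
# the path below is an assumed example location.
cmd = build_launch_cmd(os.path.join("src", "train", "gradio_app.py"))
print(cmd)
```

The resulting list can be passed to subprocess.Popen unchanged, so the command no longer depends on where f5-tts_finetune-gradio was started.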
@SWivid Off-topic question (asking upfront due to time constraints):
Would you accept a PR for train_cli which changes the bnb_optimizer option from bool to str with choices ["none", "AdamW8", "AdamW32Paged", "AdamW8Paged"], to allow paged versions for further memory footprint reduction on low-spec machines? (It would also have to be adapted in the finetune gradio, as it breaks backwards compatibility.)
The alternative would be an additional argument, e.g. --bnb_optimizer True --bnb_mode AdamW8Paged, which defaults to AdamW8 if bnb is enabled and to torch.AdamW otherwise (a bit clunky, but backwards compatible).
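A rough sketch of the backwards-compatible alternative, using only argparse. The mode names come from the proposal above; the str2bool helper and the function names are assumptions made for this example, and the mapping from mode names to concrete optimizer classes is deliberately left out.

```python
import argparse

# Mode names as proposed above; how they map to optimizer classes is not shown.
BNB_MODES = ["AdamW8", "AdamW32Paged", "AdamW8Paged"]


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings (type=bool would treat any non-empty string as True)."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Existing switch stays a boolean, so old command lines keep working.
    parser.add_argument("--bnb_optimizer", type=str2bool, default=False)
    # New optional argument; only consulted when bnb is enabled.
    parser.add_argument("--bnb_mode", choices=BNB_MODES, default="AdamW8")
    return parser


def select_optimizer_name(args: argparse.Namespace) -> str:
    # Fall back to the regular torch AdamW when bnb is disabled.
    return args.bnb_mode if args.bnb_optimizer else "torch.AdamW"
```

With this layout, --bnb_optimizer True on its own still selects AdamW8, so existing invocations would not need to change.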
Won't fix!
The applied patch corrects the file extension, but the lookup path to the dataset sample is wrong now (it searches for the random audio in data/{project name}/{sample}.mp3 when the sample path was data/{project name}/wavs/{sample}.mp3).
Additional info:
Folder structure for samples:
project_name
|-wavs
| |-sample1.mp3
| |-sample2.mp3
|-metadata.csv
metadata.csv (not holding sample path, but extension):
sample1.mp3|the transcription of sample1
sample2.mp3|the transcription of sample2
In contrast to the "Test Model" tab of the gradio interface, the training has no issues with this data structure. It might therefore be a good idea to fetch the path for the random sample selection from the training set in raw.arrow instead of doing manual magic to construct the path.
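As a stopgap short of reading raw.arrow, the lookup could try the locations from the folder structure above and keep whatever extension the metadata entry already carries. This is a minimal sketch under those assumptions; resolve_audio_path is a hypothetical helper, not the project's get_correct_audio_path.

```python
import os
from typing import Optional


def resolve_audio_path(entry: str, project_dir: str, subdir: str = "wavs") -> Optional[str]:
    """Return an existing audio file for a metadata entry, or None if not found."""
    # Only fall back to ".wav" when the entry carries no extension at all,
    # so "sample1.mp3" never becomes "sample1.mp3.wav".
    name = entry if os.path.splitext(entry)[1] else entry + ".wav"
    if os.path.isabs(name):
        candidates = [name]
    else:
        candidates = [
            os.path.join(project_dir, subdir, name),  # data/{project}/wavs/{sample}
            os.path.join(project_dir, name),          # data/{project}/{sample}
        ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

Returning None for a missing file lets the caller skip the entry instead of failing later with a file-not-found exception.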
Found the time to file a PR myself; thanks for the possible review and all your efforts. I would be happy if you could also answer the "off-topic" question from above (regarding the "bnb_optimizer" paged modes).
Thx in advance
Also filed a PR for bnb_optimizer_modes (keeps backward compatibility of the bnb_optimizer argument): https://github.com/SWivid/F5-TTS/pull/489
Checks
Environment Details
All
Steps to Reproduce
Create a training dataset where samples are stored as mp3 (instead of wav) ... metadata.csv also has to list the files with the mp3 extension.
While training goes well, testing the model with samples from the dataset leads to an exception. This is because the .wav extension is appended to the mp3 filename, ending up as sample.mp3.wav (--> file not found exception).
Code responsible for the issue: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/train/finetune_gradio.py#L1180
✔️ Expected Behavior
If a random sample from the training dataset gets loaded in the test tab of the gradio finetuning interface, the file extension from the dataset should be used (instead of assuming .wav).
Note: The conversion to wav is done in the Transcription step, which is not required (if metadata.csv and samples in another valid format exist).
Additional note (not worth a dedicated issue): While the pip installation sets a macro for f5-tts_finetune-gradio which could be called from any directory, it has to be invoked from the working directory of F5-TTS. Otherwise, the script throws an additional error when opening subprocesses with a path assumed to be relative to the F5-TTS directory, while it is actually relative to cwd when f5-tts_finetune-gradio is invoked (the path has to be built relative to __file__, not to the current working directory, like here: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/train/finetune_gradio.py#L47)
❌ Actual Behavior
Use the file extension of the audio files in the actual dataset, instead of assuming *.wav