Closed NataliaShmueli closed 1 year ago
Yeah, so Windows has a maximum path length of 260 characters (https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#maximum-path-length-limitation), so if you have Common Voice nested in some deep folder structure, you'll hit this. You can move the directory somewhere closer to the drive root (e.g. C:/common_voice_jp) and it should work. I'll think about ways that MFA could get around it, but it is ultimately a Windows issue.
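For anyone who wants to check whether a corpus is close to that limit before moving it, here is a minimal sketch in Python. The function name find_long_paths and its headroom parameter are made up for illustration and are not part of MFA; the 260 figure is the classic Windows MAX_PATH value, which counts the terminating null character, so a path of 260 visible characters is already too long.

```python
import os

# Classic Windows MAX_PATH. It includes the terminating null character,
# so the longest usable path is 259 visible characters.
WINDOWS_MAX_PATH = 260


def find_long_paths(root, limit=WINDOWS_MAX_PATH, headroom=0):
    """Return (length, path) pairs under root whose length + headroom reaches limit.

    headroom is a hypothetical safety margin for anything a tool might
    append to these paths; how much MFA itself adds is not known here.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) + headroom >= limit:
                hits.append((len(full), full))
    # Longest paths first, so the worst offenders are at the top.
    return sorted(hits, reverse=True)
```

Calling find_long_paths on the corpus root and getting an empty list back means no file or folder in the corpus reaches the limit on its own. It says nothing about temporary or output paths a tool builds elsewhere, which is why the headroom argument is there.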
For reference, the path I use for it is D:\Data\speech\model_training_corpora\japanese\common_voice_ja
Strangely enough, this has never been an issue for training/aligning, I don't think? I checked online for the length and it was only 181 characters at max.
K:\Training_Models\Spoken\Japanese\CommonVoice\cv\ja\1af9f4b197c3b75b95b91661651d490a1ce31d182b462702bc7613842a00146835a16b7d7d28c1e0e8e366c41216e786cf8c155fcbdcaab3f8f7d99b4a9c09fe
Adding one more thing: it's also refusing to tokenize corpora whose folder names are written in Japanese script. I had a dataset folder named in Katakana, and renaming it to Romaji made it work. Not a major issue though!
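A quick way to spot which part of a path might trigger this is to list the components that contain non-ASCII characters. This is only a sketch: the helper name non_ascii_components and the example folder name are hypothetical, and it assumes the problem is tied to non-ASCII characters in the path, which this thread has not confirmed.

```python
from pathlib import PureWindowsPath


def non_ascii_components(path):
    """Return the components of a Windows-style path that contain non-ASCII characters."""
    # PureWindowsPath splits on both backslashes and forward slashes
    # and works on any operating system.
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# Hypothetical example path with a Katakana folder name.
print(non_ascii_components(r"K:\Data\コモンボイス\ja"))  # ['コモンボイス']
```

An empty list means every folder in the path is plain ASCII, as with a Romaji name.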
Debugging checklist
[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue
A clear and concise description of what the bug is.
The tokenizer failed on Japanese CommonVoice. When I tried it on just a single speaker, it also failed. When I finally moved the single-speaker test recordings to a folder that I named JaTest, it worked. This issue only happens with CommonVoice, so it might be related to the length of the folder name, which was originally dbc3652a5a930b462947cfb0c88dd9ddb3ebe1c0cde73e7a020831c266f57ae464867e65ee452b1dbf2d034a39db03bab2773545ad809e2a2d209ed613492af8

For Reproducing your issue
Please fill out the following:
Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).
ja.log
Desktop (please complete the following information):

Additional context
Add any other context about the problem here.
TL;DR might be an issue with the length or naming scheme of folders.