Bugs with German language

Schokodrache commented 2 years ago

Voice Cloning works flawlessly using the English language – really powerful tool, thanks a lot!

But there are several bugs for the German language.

- German.txt must be manually put in alphabets subfolder, otherwise an error message appears while building the dataset.
- Matching the segments does not work properly, neither for the built-in version of German language, nor for a newly created language version with coqui files. Only two minutes of text are identified from a source file of 30 minutes (source quality is good).
- German.txt contains umlauts and special characters, but these are missing / cut out in the final metadata.csv

The automation of segmenting is one of the most helpful features of your app, so it would be extremely helpful if this would also work for non-English languages. Thanks!

SirBitesalot commented 2 years ago

What error are you getting that prevents you from generating the dataset with the included alphabet file?
For the segment matching: are you using "combine clips" in the advanced settings during dataset creation? The built-in version is pretty good in my experience. At least half of the source is detected most of the time, so 2 minutes suggests there is some issue.
For me, umlauts work fine. Do you have an example text and audio where it doesn't work?

Schokodrache commented 2 years ago

I receive the following error using language "German":

Type: FileNotFoundError
Text: [Errno 2] No such file or directory: 'alphabets\German.txt'
Full: Traceback (most recent call last):
  File "flask\app.py", line 1950, in full_dispatch_request
  File "flask\app.py", line 1936, in dispatch_request
  File "application\views.py", line 121, in create_dataset_post
  File "application\views.py", line 87, in get_symbols
  File "training\utils.py", line 189, in loadsymbols
FileNotFoundError: [Errno 2] No such file or directory: 'alphabets\German.txt'

I created this directory with the German.txt file from the source code and used the "combine clips" option. Building the dataset delivers:

Matched 458 segments
Combining clips
Produced 4 final clips

Here's the audio file: https://drive.google.com/file/d/1FIc8vdIBoE8aXneKrouYGJwnGj94qS_S/view?usp=sharing
Original text file: Haushaltsgesetz_2022.txt
Text file in dataset folder after processing, without umlauts: text.txt

Thanks a lot for looking into this problem!

SirBitesalot commented 2 years ago

I just ran a test and on my end I get:

Size: 0 hours, 31 minutes
Total clips: 255

Maybe something went wrong during setup? Normally it should not be necessary to add the German language manually.

SirBitesalot commented 2 years ago

@Schokodrache As for the error with the disappearing umlauts: please convert the file to UTF-8 encoding and try again.

@BenAAndrew Maybe this should be added as a hint somewhere. If a source text contains special characters, the file needs to be UTF-8; otherwise the characters will just disappear.
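
For anyone hitting the same umlaut problem, the conversion suggested above can be sketched in a few lines of Python. This is only an illustration, not part of the app: the function name `convert_to_utf8` is hypothetical, and the fallback encoding `cp1252` is an assumption (a common default for text files saved on Western-European Windows systems); adjust it if your file uses something else.

```python
from pathlib import Path


def convert_to_utf8(src, dst, fallback_encoding="cp1252"):
    """Re-encode a text file as UTF-8 (hypothetical helper, not from the app).

    fallback_encoding is an assumption about how the source file was saved.
    """
    raw = Path(src).read_bytes()
    try:
        # "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: decode with the assumed legacy encoding instead.
        text = raw.decode(fallback_encoding)
    # Write bytes directly so line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Example call (file names are placeholders):
# convert_to_utf8("Haushaltsgesetz_2022.txt", "Haushaltsgesetz_2022_utf8.txt")
```

Most text editors can do the same thing through a "Save with encoding" option; the script is mainly useful when several source texts need converting.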

BenAAndrew commented 2 years ago

@SirBitesalot Good idea. Please open a PR on the relevant page

Schokodrache commented 2 years ago

This definitely goes in the right direction, thanks! The text.txt file now includes the umlauts. Yet still only a few segments are identified; see the metadata.csv file. Very strange.

What could go wrong during setup? The alphabets folder is present in a folder called User/Appdata/Local/temp/_MEI198762. This folder seems to be created when the exe is executed, but I ran the exe from another folder, and still alphabets\German.txt was not found.
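
A note on the `_MEI198762` folder: PyInstaller one-file executables unpack their bundled data into a temporary `_MEIxxxxxx` directory at startup and expose its path as `sys._MEIPASS`. A relative path such as `alphabets\German.txt` is resolved against the current working directory, not against that temporary folder, which would be consistent with the error in the traceback. Whether this is what actually happens in the packaged build is not confirmed in this thread. The sketch below shows the common pattern for resolving bundled files; `bundle_base_dir` and `resource_path` are hypothetical helper names, not functions from the app.

```python
import os
import sys


def bundle_base_dir():
    """Return the directory that holds bundled data files.

    Inside a PyInstaller bundle, sys._MEIPASS points at the unpacked
    temporary folder. Outside a bundle the attribute does not exist, so
    this sketch falls back to the current working directory (an assumption
    that suits running from a source checkout).
    """
    return getattr(sys, "_MEIPASS", os.path.abspath("."))


def resource_path(*parts):
    """Build an absolute path to a bundled file (hypothetical helper)."""
    return os.path.join(bundle_base_dir(), *parts)


# Example: locate the alphabet file regardless of where the exe was started.
german_alphabet = resource_path("alphabets", "German.txt")
```

With a lookup like this, starting the exe from a different folder would not change where the alphabet file is searched for.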

Schokodrache commented 2 years ago

Meanwhile, I've built the app myself, and now everything works fine for the German language. So the original problem seems somehow to be related to the ready-to-use build. Thanks again for the support!

BenAAndrew commented 2 years ago

Did you modify the German alphabet file? The packaged app uses this version so it wouldn't have the changes you made to the source code

Schokodrache commented 2 years ago

No, I haven't changed anything in the alphabet file. Building the app from the source code solved all problems.

H4xl0r commented 2 years ago

I got the same problem when trying to use German. The files are there, and even exchanging them with the current ones won't help.

BenAAndrew / Voice-Cloning-App

Bugs with German language #119