BenAAndrew / Voice-Cloning-App

A Python/Pytorch app for easily synthesising human voices
BSD 3-Clause "New" or "Revised" License

Invalid characters in text #93

Closed: LeeroyJenkinsss closed this issue 2 years ago

LeeroyJenkinsss commented 2 years ago

I tried training with it and got this error: Invalid characters in text (for alphabet): ” (RIGHT DOUBLE QUOTATION MARK),­ (SOFT HYPHEN),’ (RIGHT SINGLE QUOTATION MARK),“ (LEFT DOUBLE QUOTATION MARK),— (EM DASH). Training then refused to start. Does this mean the text can no longer contain punctuation in this version? If I switch to this version, will I have to start all over?

SirBitesalot commented 2 years ago

The app now uses a defined set of allowed characters for each alphabet.

You can replace the characters as follows:

Old → New
” (RIGHT DOUBLE QUOTATION MARK) → " (QUOTATION MARK)
(SOFT HYPHEN, U+00AD, invisible) → - (HYPHEN-MINUS)
’ (RIGHT SINGLE QUOTATION MARK) → ' (APOSTROPHE)
“ (LEFT DOUBLE QUOTATION MARK) → " (QUOTATION MARK)
— (EM DASH) → - (HYPHEN-MINUS)

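If you prefer to script the replacements above, the following is a minimal Python sketch built on the standard `str.maketrans` / `str.translate` pair. The names `REPLACEMENTS` and `normalize_text` are hypothetical and are not part of the app. The sketch also assumes your files are UTF-8 text.

```python
# Hypothetical helper, not part of Voice-Cloning-App: maps each character the
# app rejected to the replacement suggested in the table above.
REPLACEMENTS = str.maketrans({
    "\u201D": '"',  # RIGHT DOUBLE QUOTATION MARK -> QUOTATION MARK
    "\u00AD": "-",  # SOFT HYPHEN (invisible) -> HYPHEN-MINUS
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    "\u201C": '"',  # LEFT DOUBLE QUOTATION MARK -> QUOTATION MARK
    "\u2014": "-",  # EM DASH -> HYPHEN-MINUS
})


def normalize_text(text: str) -> str:
    """Return a copy of text with the listed characters replaced."""
    return text.translate(REPLACEMENTS)


if __name__ == "__main__":
    sample = "\u201CIt\u2019s fine\u201D \u2014 mostly"
    print(normalize_text(sample))  # prints: "It's fine" - mostly
```

A soft hyphen only marks a possible line break, so mapping it to `""` (deleting it) may suit some texts better than `-`. The sketch follows the table as written.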
LeeroyJenkinsss commented 2 years ago

I assume you just replace the text in the dataset folder, right? I edited my text using find and replace and it's still giving me the error. I searched the text for the old characters three times. Granted, I converted the text from a PDF.

SirBitesalot commented 2 years ago
Some of these characters can be invisible or hard to notice. You can try Notepad++'s regular-expression search and replace for the Unicode characters, using the expression \x{NNNN}, where NNNN is replaced by the character's Unicode code point. For your cases these should work:

Character → Search
” (RIGHT DOUBLE QUOTATION MARK) → \x{201D}
(SOFT HYPHEN, invisible) → \x{00AD}
’ (RIGHT SINGLE QUOTATION MARK) → \x{2019}
“ (LEFT DOUBLE QUOTATION MARK) → \x{201C}
— (EM DASH) → \x{2014}
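The same search can be run outside Notepad++. In Python's `re` module, a code point is written `\uNNNN` instead of Notepad++'s `\x{NNNN}`. The minimal sketch below only counts matches and changes nothing. The function name `count_suspects` is hypothetical, and the sketch assumes UTF-8 input files.

```python
import re
from collections import Counter

# Raw string so that re (not the Python lexer) interprets the \uNNNN escapes.
# These are the same five code points as the Notepad++ searches above.
SUSPECT = re.compile(r"[\u201D\u00AD\u2019\u201C\u2014]")


def count_suspects(text: str) -> Counter:
    """Count occurrences of each suspect character, keyed as 'U+NNNN'."""
    return Counter(f"U+{ord(m.group()):04X}" for m in SUSPECT.finditer(text))


if __name__ == "__main__":
    import sys

    # Search only: read each file named on the command line and report counts.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            counts = count_suspects(f.read())
        print(path, dict(counts) or "no matches")
```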

Make sure to test with search only first: if you search and replace across an entire folder, it's nearly impossible to undo. If you encounter other characters, you can use this handy site: just paste the Unicode name from the error message into the search bar.
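To find characters you don't already know about, a search-only scan can list every character outside an allowed set, along with its line, column and Unicode name. You can then paste that name into a lookup site. The sketch below is minimal. `ALLOWED` is a hypothetical placeholder, not the app's real per-alphabet list, so extend it with anything your files legitimately contain, such as field separators. The sketch also assumes UTF-8 input, for example the dataset's metadata.csv.

```python
import string
import unicodedata

# Hypothetical allowed set for illustration only; the app's real per-alphabet
# character list is not reproduced here.
ALLOWED = set(string.ascii_letters + string.digits + " .,!?'\"-")


def find_invalid(lines):
    """Yield (line_no, column, char, unicode_name) for each disallowed character."""
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line.rstrip("\n"), start=1):
            if ch not in ALLOWED:
                # Fall back to the code point if the character has no name.
                yield line_no, col, ch, unicodedata.name(ch, f"U+{ord(ch):04X}")


if __name__ == "__main__":
    import sys

    # Search only: report problems in the file named on the command line.
    with open(sys.argv[1], encoding="utf-8") as f:
        for line_no, col, ch, name in find_invalid(f):
            print(f"line {line_no}, col {col}: {name}")
```

Because the scan never writes to disk, it follows the advice to test with search only before replacing anything.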

LeeroyJenkinsss commented 2 years ago

I opened the text in Notepad++ and ran the regex search, and it didn't find anything in the text. I tried the 1.0.4 app again and it still gives me the same error.

BenAAndrew commented 2 years ago

@LeeroyJenkinsss Please share your metadata.csv file

LeeroyJenkinsss commented 2 years ago

Looking at the metadata, Notepad found the invalid characters. I'll fix them and see if it works first.

LeeroyJenkinsss commented 2 years ago

Yes, the characters were in the metadata. I cleaned the file and the program started. Thanks!