LeeroyJenkinsss closed this issue 2 years ago.
The App now uses a defined set of allowed characters per alphabet.
You can replace the characters as follows:

| Old | New |
|---|---|
| ” (RIGHT DOUBLE QUOTATION MARK) | " (QUOTATION MARK) |
| (SOFT HYPHEN) | - (HYPHEN-MINUS) |
| ’ (RIGHT SINGLE QUOTATION MARK) | ' (APOSTROPHE) |
| “ (LEFT DOUBLE QUOTATION MARK) | " (QUOTATION MARK) |
| — (EM DASH) | - (HYPHEN-MINUS) |
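The replacements in the table can also be applied with a short script instead of by hand. This is a minimal Python sketch, not part of the App: it assumes the text is already loaded as a string (e.g. read from a UTF-8 file), and the names `REPLACEMENTS` and `normalize_punctuation` are hypothetical. It uses the standard `str.maketrans`/`str.translate` pair, which swaps single characters in one pass.

```python
# Hypothetical mapping from the disallowed characters (written as code points,
# since some are invisible) to the ASCII replacements from the Old -> New table.
REPLACEMENTS = {
    "\u201D": '"',  # RIGHT DOUBLE QUOTATION MARK -> QUOTATION MARK
    "\u00AD": "-",  # SOFT HYPHEN -> HYPHEN-MINUS
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    "\u201C": '"',  # LEFT DOUBLE QUOTATION MARK -> QUOTATION MARK
    "\u2014": "-",  # EM DASH -> HYPHEN-MINUS
}

# str.maketrans accepts a dict whose keys are single characters.
TABLE = str.maketrans(REPLACEMENTS)


def normalize_punctuation(text: str) -> str:
    """Return a copy of text with each disallowed character replaced."""
    return text.translate(TABLE)


if __name__ == "__main__":
    sample = "\u201CIt\u2019s here\u201D \u2014 co\u00ADoperate"
    print(normalize_punctuation(sample))  # prints: "It's here" - co-operate
```

To apply it to a file, read the file with `encoding="utf-8"`, pass the contents through `normalize_punctuation`, and write the result to a new file so the original stays available if something goes wrong.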
I assume you just replace the text in the dataset folder, right? I edited my text using find and replace and it still gives me the error. I searched the text for the old characters three times. Granted, I converted the text from a PDF.
Some of these characters can be invisible or hard to notice. You can try Notepad++'s regex search and replace for the Unicode characters, using this expression (where NNNN is replaced by the character's Unicode code point):

`\x{NNNN}`
So for your cases these should work:

| Character | Search |
|---|---|
| ” (RIGHT DOUBLE QUOTATION MARK) | `\x{201D}` |
| (SOFT HYPHEN) | `\x{00AD}` |
| ’ (RIGHT SINGLE QUOTATION MARK) | `\x{2019}` |
| “ (LEFT DOUBLE QUOTATION MARK) | `\x{201C}` |
| — (EM DASH) | `\x{2014}` |
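The same search can be run outside Notepad++. Below is a minimal, search-only Python sketch that reports where each character from the table occurs and changes nothing. Note that Python's `re` module writes a code point as `\uNNNN` rather than Notepad++'s `\x{NNNN}`. The function name `find_disallowed` and the sample text are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata

# Character class built from the code points in the table above.
# Python's re understands \uNNNN escapes, even inside a raw string.
PATTERN = re.compile(r"[\u201D\u00AD\u2019\u201C\u2014]")


def find_disallowed(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for every match, 1-based.

    Search only: the input text is not modified.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            char = match.group()
            hits.append((lineno, match.start() + 1, unicodedata.name(char)))
    return hits


if __name__ == "__main__":
    sample = "plain line\nsoft\u00ADhyphen and \u201Cquotes\u201D"
    for lineno, col, name in find_disallowed(sample):
        print(f"line {lineno}, column {col}: {name}")
```

Because the report includes the Unicode name, it can be compared directly against the names listed in the App's error message.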
Make sure to test with search only first: if you run search and replace over an entire folder, it's nearly impossible to undo. If you encounter other characters, you can use this handy site: just paste the Unicode name from the error message into the search bar.
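As an offline alternative to looking a name up on a website, Python's standard `unicodedata` module can turn the Unicode name printed in the error message into its code point, and from there into the `\x{NNNN}` expression for Notepad++. A minimal sketch; the helper name `notepadpp_pattern` is hypothetical.

```python
import unicodedata


def notepadpp_pattern(unicode_name: str) -> str:
    """Turn a Unicode character name (as shown in the error message)
    into a Notepad++ regex of the form \\x{NNNN}.

    unicodedata.lookup raises KeyError if the name is not a valid Unicode name.
    """
    char = unicodedata.lookup(unicode_name)
    # Format the code point as at least four uppercase hex digits.
    return "\\x{%04X}" % ord(char)


if __name__ == "__main__":
    for name in ["RIGHT DOUBLE QUOTATION MARK", "SOFT HYPHEN", "EM DASH"]:
        print(f"{name}: {notepadpp_pattern(name)}")
```

For the three names above this prints the same expressions as the search table: `\x{201D}`, `\x{00AD}` and `\x{2014}`.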
I opened the text in Notepad++ and ran the regex search, but it didn't find anything. I tried the 1.0.4 app again and it still gives me the same error.
@LeeroyJenkinsss Please share your `metadata.csv` file.
Looking at the metadata, Notepad++ found the invalid characters. I'll fix them and see if it works first.
Yes, the characters were in the metadata. I cleaned the file and the program started. Thanks!
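Since the offending characters turned out to be in `metadata.csv` rather than in the text that was searched, it can help to scan every text file in the dataset folder at once. This is a minimal, search-only Python sketch that assumes the dataset consists of UTF-8 `.txt`/`.csv` files. `files_with_disallowed` is a hypothetical helper, not part of the App, and the character set is taken from the tables in this thread.

```python
from __future__ import annotations

from pathlib import Path

# Code points from the replacement table in this thread.
DISALLOWED = {"\u201D", "\u00AD", "\u2019", "\u201C", "\u2014"}


def files_with_disallowed(folder: Path, suffixes: tuple[str, ...] = (".txt", ".csv")) -> dict[str, int]:
    """Map each matching file under folder (searched recursively) to the number
    of disallowed characters it contains. Search only: no file is modified.

    Assumes UTF-8; undecodable bytes become U+FFFD instead of raising.
    """
    counts: dict[str, int] = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            n = sum(1 for ch in text if ch in DISALLOWED)
            if n:
                counts[str(path)] = n
    return counts
```

For example, `files_with_disallowed(Path("path/to/dataset"))` (a hypothetical path) returns only the files that still need cleaning, so a clean dataset yields an empty dict.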
I tried training with it and got this error, and it refused to start:

> Invalid characters in text (for alphabet): ” (RIGHT DOUBLE QUOTATION MARK), (SOFT HYPHEN),’ (RIGHT SINGLE QUOTATION MARK),“ (LEFT DOUBLE QUOTATION MARK),— (EM DASH)

Does this mean the text shouldn't contain punctuation anymore with this version? If I switch to this version, will I have to start all over?