UAlbertaALTLab / recording-validation-interface

Maskwacîs recordings validation interface
https://speech-db.altlab.app/

Fix: characters with special characters are secretly two characters #440

Open nienna73 opened 1 year ago

nienna73 commented 1 year ago

When importing pretty much all of Jean's recordings, any character with a circumflex or a macron gets stored as two characters: the base letter followed by a combining mark (e followed by a combining circumflex) instead of the single precomposed character ê. There's a script in progress that's supposed to find the combining circumflex (U+0302, which is \xcc\x82 when encoded as UTF-8) and replace the pair with the correct single character.

This script isn't working.

So far, the only way I've fixed this successfully is by correcting the affected entries manually.

The script is here: https://github.com/UAlbertaALTLab/recording-validation-interface/blob/production/validation/management/commands/lookforcombinedcharacters.py
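
A possible direction, sketched under assumptions: if the imported strings are simply in decomposed form (base letter plus combining mark), Unicode NFC normalization from Python's standard `unicodedata` module composes them into single characters without needing a hand-written replacement table. The helper names `compose` and `has_combining_marks` are hypothetical and are not taken from the linked script. One possible cause of the failure, not confirmed against that script: in a Python 3 `str`, the literal `"\xcc\x82"` is two unrelated characters (U+00CC and U+0082), so searching for it never matches the combining circumflex; those bytes only mean U+0302 inside a `bytes` object.

```python
import unicodedata

# Combining marks as code points; the comments show their UTF-8 byte encodings.
COMBINING_CIRCUMFLEX = "\u0302"  # b"\xcc\x82" in UTF-8
COMBINING_MACRON = "\u0304"      # b"\xcc\x84" in UTF-8


def has_combining_marks(text: str) -> bool:
    """Return True if any character in text is a combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


def compose(text: str) -> str:
    """Compose base letter + combining mark pairs into single characters (NFC)."""
    return unicodedata.normalize("NFC", text)


# "e" followed by a combining circumflex is two code points...
decomposed = "e" + COMBINING_CIRCUMFLEX
# ...and NFC turns it into the single precomposed character U+00EA.
composed = compose(decomposed)
```

In a management command, `compose` could be applied to each affected text field, saving only the rows where `has_combining_marks` was true beforehand. Which models and fields to touch is left open here because it depends on the project's schema.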