Closed: leondz closed this issue 4 years ago
Hey, thanks for the notice. There were frequent misspellings in the transcripts, so we used a normalization script. It seems we also misspelled the name, but that misspelling should be consistent. I'll update the full data with the correct spelling.
If you are interested, all the Marisha misspellings I caught while going through the text are listed below. As you can see, I copy-pasted the incorrect spelling when I made this :(. The interesting part (linguistically) is all the variations of a single name!
'MAIRSHA': ['MARIHSA'], 'MAISHA': ['MARIHSA'], 'MARIASHA': ['MARIHSA'], 'MARIHSA': ['MARIHSA'], 'MARIHSHA': ['MARIHSA'], 'MARIRSHA': ['MARIHSA'], 'MARISA': ['MARIHSA'], 'MARISAH': ['MARIHSA'], 'MARISAHA': ['MARIHSA'], 'MARISH': ['MARIHSA'], 'MARISHA': ['MARIHSA'], 'MARISHA (through gritted teeth)': ['MARIHSA'], 'MARISHIA': ['MARIHSA'], 'MARISSA': ['MARIHSA'], 'MARSHA': ['MARIHSA'], 'MARSHIA': ['MARIHSA'], 'MARSIAH': ['MARIHSA'], 'MARSIHA': ['MARIHSA'], 'MARiSHA': ['MARIHSA'],
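For anyone wanting to apply a table like this themselves, here is a minimal sketch of how such a lookup could be used. It is not the actual normalization script: the `VARIANTS` subset, the `normalize_speaker` helper, and the corrected canonical spelling `MARISHA` are all assumptions made for illustration.

```python
# Hypothetical subset of the variant table, with the canonical spelling
# corrected to MARISHA (an assumption; the real script may differ).
VARIANTS = {
    "MAIRSHA": ["MARISHA"],
    "MARISA": ["MARISHA"],
    "MARISHA (through gritted teeth)": ["MARISHA"],
    "MARiSHA": ["MARISHA"],
}


def normalize_speaker(name, variants=VARIANTS):
    """Return the canonical speaker name(s) for a raw transcript label."""
    key = name.strip()
    # Labels not in the table pass through unchanged as a one-element list.
    return variants.get(key, [key])


if __name__ == "__main__":
    for raw in ["MARiSHA", "MARISA", "MATT"]:
        print(raw, "->", normalize_speaker(raw))
```

The values are kept as lists to mirror the shape of the table above. The lookup is an exact match on the stripped label, so any new misspelling has to be added to the table by hand.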
Name is changed! https://github.com/RevanthRameshkumar/CRD3/pull/4
The JSON utterer fields often contain errors for this speaker’s name.