cristan closed this 6 months ago
Yes, please also note this for when we start reviewing the new system requirements. This is probably a bug in the output script. We can fix it in this dataset by passing that column through some kind of filter, or do you want to keep it as is to allow for data quality checks?
There are arguments for both. Adding an extra filter would make the column actually free of diacritics. On the other hand, keeping it as is would make it easier to test whether the new system actually solves this issue.
Since you can argue either way, I would do nothing and close this story until you get a comment / issue from somebody who actually uses the entries in NameWoDiacritics.
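For reference, the kind of filter discussed above could look roughly like the sketch below. It is not the project's output script. It assumes Python, and it relies on Unicode decomposition rather than a hand-written substitution list, so letters such as ă, ň or Ġ are handled without being listed one by one. The function name `strip_diacritics` and the `EXTRA_MAP` table are hypothetical, and the entries in `EXTRA_MAP` are only illustrative guesses for letters that have no Unicode decomposition.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit replacement.
# ASSUMPTION: these entries are illustrative, not the official substitution table.
EXTRA_MAP = str.maketrans({
    "ł": "l", "Ł": "L",
    "æ": "a", "Æ": "A",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
})


def strip_diacritics(text: str) -> str:
    """Return text with diacritical marks removed (hypothetical helper)."""
    # First replace the letters that cannot be decomposed.
    mapped = text.translate(EXTRA_MAP)
    # NFD splits e.g. "ă" into "a" plus a combining breve.
    decomposed = unicodedata.normalize("NFD", mapped)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Applied to the NameWoDiacritics column, something like `strip_diacritics(name)` would be called once per entry. Whether æ should become a or ae is a choice for whoever maintains the substitution rules.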
Needs update in the official release process.
For some diacritics, such as the letter ü, NameWoDiacritics works just fine.
For other, more exotic diacritics like ă, it doesn't work.
This kind of makes sense: in the last Secretariat notes, you'll see that these characters are replaced with a: à, á, â, ã, ä, å, æ. ă is notably absent from this list.
I ran a script to test them all, and these are the special characters still present in NameWoDiacritics: ň, č, ť, ř, ě, ň, č, ť, Č, ć, ů, ő, ē, ā, ī, Ġ, Ġ, ł, ę, ţ, ľ, ď.
Disclaimer: my script didn't actually extract the specific special characters, so I might have missed a few. Let me know if you are actually interested in this. To be honest, I don't really mind, considering I can read the file just fine with this project, but maybe you'd want to know.
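If a complete list is wanted, a check along these lines would extract the exact characters instead of relying on manual inspection. This is a minimal sketch, not the script I used: it assumes Python 3.7+ and that the NameWoDiacritics values have already been loaded into a list of strings. The function name `remaining_special_chars` is hypothetical.

```python
def remaining_special_chars(names):
    """Collect every non-ASCII character found in the given names."""
    found = set()
    for name in names:
        # str.isascii() is False for any character outside the ASCII range.
        found.update(ch for ch in name if not ch.isascii())
    # Sort by code point so repeated runs give a stable result.
    return sorted(found)


# Example with made-up sample values, not real entries from the dataset.
sample = ["Brăila", "Plzeň", "Zurich"]
print(remaining_special_chars(sample))
```

An empty result would mean the column contains only ASCII characters.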