andrewrk / node-diacritics

remove diacritics from strings ("ascii folding") - Node.js module
MIT License

List of diacritics #18

Open matteocontrini opened 8 years ago

matteocontrini commented 8 years ago

Do you have a list of the diacritics that get converted?

In Italy, the fiscal code ("codice fiscale") has recently changed so that all diacritics are converted to ASCII characters. This table has been provided for the conversion.

How can I know whether those characters are actually supported by your module, given that the source code contains only a list of Unicode code points?

Thanks

andrewrk commented 8 years ago

Do you need programmatic access to the list of diacritics or just want to evaluate the module?

matteocontrini commented 8 years ago

I'll try creating a test. I wanted to know whether the module correctly handles all those cases, which is not easy to tell since there's no list of supported diacritics. But that's fine, I'll try parsing that PDF. I'll let you know.
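A test of this kind could be sketched roughly as follows. This is only an illustration: `standInFold` is a stand-in built on Unicode NFD normalization, not the module's implementation, and `findMismatches` and `sampleTable` are hypothetical names. With the module installed, its folding function could presumably be passed in place of `standInFold`.

```javascript
// Stand-in folding function (NOT the module's implementation): decompose with
// NFD, drop combining marks, then uppercase to match the table's format.
function standInFold(str) {
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toUpperCase();
}

// Hypothetical helper: returns every table row where fold(input) differs
// from the expected ASCII value.
function findMismatches(expectedTable, fold) {
  const mismatches = [];
  for (const [input, expected] of Object.entries(expectedTable)) {
    const actual = fold(input);
    if (actual !== expected) {
      mismatches.push({ input, expected, actual });
    }
  }
  return mismatches;
}

// Sample rows only; a real test would load every row parsed from the PDF.
const sampleTable = {
  '\u00C4': 'AE', // Ä, expected value as reported in this thread
  '\u00D6': 'OE', // Ö, expected value as reported in this thread
  '\u00C9': 'E',  // É, illustrative row, not taken from the table
};

const sampleMismatches = findMismatches(sampleTable, standInFold);
console.log(sampleMismatches);
```

Passing the folding function as a parameter keeps the comparison independent of any particular library, so the same table can be checked against several implementations.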

matteocontrini commented 8 years ago

OK, first of all, congratulations, because the module was able to convert almost every character in that table. But some results differ:

Ä gets converted to  A, document says AE
ä gets converted to  A, document says AE
Å gets converted to  A, document says AA
å gets converted to  A, document says AA
Ð gets converted to  DH, document says D
Ĳ gets converted to  Ĳ, document says IJ <-- 
ĳ gets converted to  Ĳ, document says IJ <-- these 2 are not converted
Ö gets converted to  O, document says OE
ö gets converted to  O, document says OE
Ø gets converted to  O, document says OE
ø gets converted to  O, document says OE
Ü gets converted to  U, document says UE
ü gets converted to  U, document says UE

Note that I uppercased the results because that's the format the table uses. The code.

I don't know which variant in the test results is the right one. I can tell you that the PDF table linked above is almost the same as the one from here, which refers to some ISO standards.
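If the document's variants are the ones needed, one possible workaround is to apply an override table before the generic folding step. The sketch below is an assumption-laden illustration, not part of the module: `DOCUMENT_OVERRIDES`, `fallbackFold`, and `foldForDocument` are hypothetical names, the fallback uses plain NFD normalization as a stand-in, and the IJ ligature is assumed to be U+0132.

```javascript
// Overrides taken from the "document says" column above; keys are uppercase
// because the input is uppercased first, as the table expects.
const DOCUMENT_OVERRIDES = {
  '\u00C4': 'AE', // Ä
  '\u00C5': 'AA', // Å
  '\u00D0': 'D',  // Ð
  '\u0132': 'IJ', // Ĳ (assumed code point for the IJ ligature)
  '\u00D6': 'OE', // Ö
  '\u00D8': 'OE', // Ø
  '\u00DC': 'UE', // Ü
};

// Stand-in fallback (NOT the module's implementation): decompose with NFD and
// drop the combining marks, leaving the visual base letter.
function fallbackFold(str) {
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: uppercase the input, apply the overrides character by
// character, and send everything else through the supplied folding function.
function foldForDocument(str, fold = fallbackFold) {
  let out = '';
  for (const ch of str.normalize('NFC').toUpperCase()) {
    out += Object.prototype.hasOwnProperty.call(DOCUMENT_OVERRIDES, ch)
      ? DOCUMENT_OVERRIDES[ch]
      : fold(ch);
  }
  return out;
}

console.log(foldForDocument('Müller'));   // MUELLER
console.log(foldForDocument('Ångström')); // AANGSTROEM
```

Normalizing to NFC first ensures that a letter typed as a base character plus a combining mark is matched against the precomposed keys in the override table.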

homersimpsons commented 8 years ago

I think your diacritics 'translations' are based on how the characters sound, while this implementation deals with visual similarity.