gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

Add option to preserve diacritics #210

tobymarsden closed 2 years ago

tobymarsden commented 2 years ago

This is also for discussion, to flesh out what an option to preserve diacritics could look like. (The API interface etc. is not implemented yet, though if this idea has legs I'd be very happy to take that on.)

tobymarsden commented 2 years ago

@dimus here we go: this is updated to preserve diaereses (but not other diacritics) with the -D option. It applies to details, normalized and canonical, with the exception of stemmed, where the diaeresis is removed and the letter is transliterated directly (i.e. ö -> o, which produces correctly spelled names in my test corpus). Other diacritics (including e.g. an ö that is not preceded by a vowel) are transliterated as they are currently (oe in this example).
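For discussion, the rule described above could be sketched in Go roughly as follows. This is a minimal illustration and not the code in this PR: the function `render`, its parameters and the lookup tables are hypothetical names, and only the ö -> oe and ö -> o mappings come from the description above (the other table entries are assumptions). It only handles lower-case, precomposed letters.

```go
package main

import "strings"

// translit is a hypothetical table of marked vowels and their default
// transliteration. Only 'ö' -> "oe" is taken from the discussion above;
// the remaining entries are assumptions for illustration.
var translit = map[rune]string{
	'ä': "ae", 'ö': "oe", 'ü': "ue",
	'ë': "e", 'ï': "i",
}

// base maps each marked vowel to its bare letter (used for the stemmed form).
var base = map[rune]rune{
	'ä': 'a', 'ö': 'o', 'ü': 'u',
	'ë': 'e', 'ï': 'i',
}

// isVowel reports whether r is a plain lower-case or upper-case vowel.
func isVowel(r rune) bool {
	return strings.ContainsRune("aeiouyAEIOUY", r)
}

// render rewrites the marked vowels in name.
//   - preserve: treat a marked vowel that follows a vowel as a diaeresis
//     and keep it (the behaviour described for the -D option).
//   - stem: for such a diaeresis, emit the bare letter instead (ö -> o).
//
// Every other marked vowel gets the default transliteration (ö -> oe).
func render(name string, preserve, stem bool) string {
	var b strings.Builder
	var prev rune // zero value is not a vowel, so a leading letter is never a diaeresis
	for _, r := range name {
		repl, marked := translit[r]
		switch {
		case !marked:
			b.WriteRune(r)
		case preserve && isVowel(prev):
			if stem {
				b.WriteRune(base[r])
			} else {
				b.WriteRune(r)
			}
		default:
			b.WriteString(repl)
		}
		prev = r
	}
	return b.String()
}
```

With this sketch, `render("Leptochloöpsis virgata", true, false)` keeps the name unchanged, `render("Leptochloöpsis virgata", true, true)` gives `Leptochloopsis virgata`, and `render("Möhringia", true, false)` gives `Moehringia` because the ö does not follow a vowel. Note that the `stem` branch only shows the diaeresis removal; it does not perform any actual stemming.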

I've applied this to the web interface too. I'm afraid the ronn command still needs re-running: ronn's dependencies seem to be broken at the moment and I couldn't easily install it.

As always let me know what modifications you need and I'll hop on it. Thanks!

dimus commented 2 years ago

Looks good to me, @tobymarsden! Merging and rebuilding man pages.