cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

unicode confusables and normalization #23

Open LinguList opened 4 years ago

LinguList commented 4 years ago

We have more or less clarified this in code already:

But we also started to collect things in cldf/multicode. Many of the examples there are what we would use to normalize a dataset, but not all of them.

I think we can drop multicode: it was never really followed up, and we would have to think about how to integrate it into any of our tools. (One option would be to use it for normalization in linse, where we also have a small normalization procedure for bipa only, so that linse can be used without depending on pyclts.) But we should check thoroughly that we have harvested all major characters from the Unicode confusables list.
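To make the kind of normalization discussed here concrete, the following is a minimal sketch that uses only Python's standard library. The mapping `CONFUSABLES` and the function `normalize_segment` are hypothetical names with a hand-picked sample of three characters; they are not the actual tables or API of pyclts, linse, or multicode. The choice of NFD as the normalization form is also an assumption made for this illustration.

```python
import unicodedata

# Hypothetical sample of confusable characters and their IPA counterparts.
# This is NOT the table used by pyclts, linse, or multicode.
CONFUSABLES = {
    "\u0067": "\u0261",  # LATIN SMALL LETTER G -> LATIN SMALL LETTER SCRIPT G
    "\u003a": "\u02d0",  # COLON -> MODIFIER LETTER TRIANGULAR COLON
    "\u01dd": "\u0259",  # LATIN SMALL LETTER TURNED E -> LATIN SMALL LETTER SCHWA
}


def normalize_segment(text):
    """Decompose to NFD, then replace known confusables one by one."""
    # NFD splits precomposed letters into base letter + combining marks
    # (assumed here as the target form for the illustration).
    decomposed = unicodedata.normalize("NFD", text)
    # Characters without an entry in the table pass through unchanged.
    return "".join(CONFUSABLES.get(char, char) for char in decomposed)


def unmapped_non_ascii(text):
    """List non-ASCII characters that the sample table does not produce."""
    targets = set(CONFUSABLES.values())
    decomposed = unicodedata.normalize("NFD", text)
    return sorted({c for c in decomposed if ord(c) > 127 and c not in targets})
```

A helper like `unmapped_non_ascii` could serve as a rough first pass when checking which characters of a dataset are not yet covered by a given table, though a real check against the Unicode confusables list would need the full data.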