We have more or less clarified this in code already:
normalize is a one-to-many conversion procedure: only single characters are allowed, and it is specific to a transcription system, since different systems may normalize in different ways
confusables that go beyond this are excluded and placed in the alias section
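
To make the single-character rule concrete, here is a minimal sketch of what such a per-system table could look like. It is not the pyclts implementation: the names `build_table`, `normalize` and `NORMALIZE_BIPA`, as well as the two example entries, are assumptions made up for illustration.

```python
# Hypothetical normalization table for ONE transcription system.
# The entries are illustrative assumptions, not the actual CLTS data.
NORMALIZE_BIPA = {
    "g": "\u0261",  # LATIN SMALL LETTER G -> LATIN SMALL LETTER SCRIPT G
    ":": "\u02d0",  # COLON -> MODIFIER LETTER TRIANGULAR COLON
}


def build_table(mapping):
    """Validate a normalization mapping and turn it into a translate table.

    Only single characters are accepted on both sides; anything longer
    would belong in the alias section instead.
    """
    for source, target in mapping.items():
        if len(source) != 1 or len(target) != 1:
            raise ValueError(
                "normalize only allows single characters: "
                "{!r} -> {!r}".format(source, target)
            )
    return str.maketrans(mapping)


def normalize(text, table):
    """Apply a character-by-character normalization table to a string."""
    return text.translate(table)


BIPA_TABLE = build_table(NORMALIZE_BIPA)
```

With this sketch, `normalize("ga:", BIPA_TABLE)` replaces the plain `g` and the colon with their counterparts from the table, while a mapping such as `{"ʦ": "ts"}` is rejected by `build_table` because its target is longer than one character.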
But we also started collecting things in cldf/multicode. Many of the examples there belong to what we would use to normalize a dataset, but not all of them do.
I think we can drop multicode: it was never really followed up, and we would have to work out how to integrate it into any of our tools. (One option would be to use it for normalization in linse, which also has a small normalization procedure for bipa only, so that linse can be used without depending on pyclts.) But we should check thoroughly that we have harvested all major characters from the Unicode confusables list.
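
One way such a check could be scripted is sketched below. It assumes the `source ; target ; type # comment` line layout of the Unicode confusables file, with code points given in hexadecimal. The sample lines are only illustrative input in that layout, and `parse_confusables` and `missing_sources` are hypothetical helper names, not functions from any of our tools.

```python
# Illustrative lines in the assumed "source ; target ; type # comment" layout.
SAMPLE = """\
# comment lines and blank lines are skipped

0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
0261 ;\t0067 ;\tMA\t# LATIN SMALL LETTER SCRIPT G -> LATIN SMALL LETTER G
"""


def parse_confusables(lines):
    """Return (source, target) string pairs from confusables-style lines."""
    pairs = []
    for line in lines:
        # Drop the trailing comment, then skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is a space-separated sequence of hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        pairs.append((source, target))
    return pairs


def missing_sources(pairs, known, relevant_targets):
    """List single-character confusables of relevant targets not yet handled.

    `known` is the set of characters already covered (for example the keys
    of the normalize and alias tables); `relevant_targets` is the set of
    characters the transcription system cares about.
    """
    return sorted(
        source
        for source, target in pairs
        if len(source) == 1 and target in relevant_targets and source not in known
    )
```

For example, `missing_sources(parse_confusables(SAMPLE.splitlines()), known={"\u0441"}, relevant_targets={"a", "c"})` reports only the Cyrillic `а`, since the Cyrillic `с` is already covered. Note that the confusables list maps each character to a prototype, so the direction may need to be inverted where the transcription system prefers the less common character.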