gbv / cocoda-db

Colibri Concordance Database: Database Backend (DEPRECATED)
https://coli-conc.gbv.de/cocoda/api
GNU Affero General Public License v3.0

Ensure Unicode normalization #11

Open nichtich opened 8 years ago

nichtich commented 8 years ago

JSKOS requires NFC, but this is not checked at the moment. MongoDB does not normalize strings itself. It may be helpful to use NFD internally (e.g. for accent folding) and to normalize back to NFC on export.
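A minimal sketch of what such a check and round trip could look like in plain JavaScript, using the built-in `String.prototype.normalize`. The helper names `isNFC`, `foldAccents` and `toNFC` are made up for illustration and are not part of cocoda-db or JSKOS.

```javascript
// Hypothetical helpers for illustration only, not part of cocoda-db.

// True if the string is already in NFC, the form JSKOS requires.
function isNFC(s) {
  return s === s.normalize("NFC");
}

// Accent folding: decompose to NFD, then drop the combining marks
// (Unicode general category Mn) that NFD splits off from base letters.
function foldAccents(s) {
  return s.normalize("NFD").replace(/\p{Mn}/gu, "");
}

// Normalize back to NFC, e.g. right before export.
function toNFC(s) {
  return s.normalize("NFC");
}

const decomposed = "o\u0308"; // o followed by U+0308 COMBINING DIAERESIS (NFD ö)
console.log(isNFC(decomposed));                  // false
console.log(toNFC(decomposed) === "\u00f6");     // true
console.log(foldAccents("\u00f6"));              // o
```

Whether to reject non-NFC input or to normalize it silently on import is a design choice this sketch leaves open.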

db.concepts.find({notation:/o/}) // finds o and NFD ö
db.concepts.find({notation:/o/i}) // finds o, O, and NFD ö and Ö
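The same matching behaviour can be reproduced outside MongoDB with plain JavaScript regular expressions. This is only an illustration of why the stored normalization form matters; the variable names are made up, and MongoDB's own regex engine is not involved here.

```javascript
// Illustration only: plain JavaScript regexes, not MongoDB queries.
const nfcLower = "\u00f6";                 // ö as a single precomposed code point
const nfdLower = nfcLower.normalize("NFD"); // o + U+0308
const nfcUpper = "\u00d6";                 // Ö precomposed
const nfdUpper = nfcUpper.normalize("NFD"); // O + U+0308

const results = {
  plainO: /o/.test("o"),                   // the base letter itself
  nfdLower: /o/.test(nfdLower),            // matches the base letter inside NFD ö
  nfcLower: /o/.test(nfcLower),            // precomposed ö contains no "o"
  nfdUpperCaseSensitive: /o/.test(nfdUpper),
  nfdUpperIgnoreCase: /o/i.test(nfdUpper), // /i also matches the O inside NFD Ö
  nfcUpperIgnoreCase: /o/i.test(nfcUpper),
};

console.log(results);
```

With NFD data the pattern hits the base letter of every decomposed character; with NFC data the accented characters are not matched at all.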

Unfortunately, when using NFD you cannot search for plain o only, because the pattern also matches the base letter of a decomposed ö. Solutions