cjkvi / cjkvi-ids

IDS data for CJK Unified Ideographs
http://kanji-database.sourceforge.net/

Cross-check with Chaizi #79

Open DonaldTsang opened 5 years ago

DonaldTsang commented 5 years ago

Is it possible to do a comparison with https://github.com/kfcd/chaizi ? Or add a note in the README?
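For illustration only, a comparison of this kind could be sketched roughly as below. The line formats are assumptions, not confirmed in this thread: an IDS line is taken to be `U+XXXX<TAB>char<TAB>IDS`, and a chaizi line to be `char<TAB>space-separated components`, with further tab-separated alternatives allowed. The function names are hypothetical, and the sketch compares only the set of components, ignoring layout, order, and repeated components.

```python
# Ideographic Description Characters (U+2FF0..U+2FFB) describe layout only,
# so they are stripped before comparing components.
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}


def parse_ids_line(line):
    """Parse one IDS line (ASSUMED format: 'U+XXXX<TAB>char<TAB>IDS...')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or not fields[0].startswith("U+"):
        return None  # comment, header, or malformed line
    char = fields[1]
    # Only the first IDS is used; a trailing "[...]" annotation is dropped.
    # Entity references such as "&...;" are NOT handled in this sketch.
    ids = fields[2].split("[")[0]
    return char, {c for c in ids if c not in IDC}


def parse_chaizi_line(line):
    """Parse one chaizi line (ASSUMED format: 'char<TAB>c1 c2[<TAB>alt...]')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return None
    return fields[0], [set(alt.split()) for alt in fields[1:]]


def cross_check(ids_lines, chaizi_lines):
    """Map each character present in both sources to True if any chaizi
    alternative has the same component set as the IDS, else False."""
    ids = dict(filter(None, map(parse_ids_line, ids_lines)))
    report = {}
    for parsed in filter(None, map(parse_chaizi_line, chaizi_lines)):
        char, alternatives = parsed
        if char in ids:
            report[char] = any(alt == ids[char] for alt in alternatives)
    return report


if __name__ == "__main__":
    ids_sample = ["U+597D\t好\t⿰女子", "U+660E\t明\t⿰日月"]
    chaizi_sample = ["好\t女 子", "明\t囧 月"]
    print(cross_check(ids_sample, chaizi_sample))
```

Because the two projects may decompose to different depths, a `False` here would only flag a character for manual review; it would not by itself indicate an error in either dataset.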

hfhchan commented 5 years ago

This repository is used by the IRG (Ideographic Rapporteur Group) to reduce the possibility of encoding variants of characters that are already encoded. The dataset is mainly intended for fuzzy matching. It covers all encoded CJK ideographs, from the URO through Extension F (80,000+ characters).

The aim and coverage are different from those of chaizi, and so are the principles and targets for decomposing characters. A cross-check will probably not yield substantial benefit to the IRG's processes.

DonaldTsang commented 5 years ago

@hfhchan in that case, maybe add a footnote about other "Chinese decomposition libraries" and how they differ from CJKVI?