Open DonaldTsang opened 5 years ago
The data in this dataset is primarily shape-based, because IRG (Ideographic Rapporteur Group) work relies on shape-based fuzzy matching. Unfortunately, phono-semantic information is not needed for the mechanical process of identifying possible duplicates.
@hfhchan It might be useful for cjkvi research if phono-semantic and other relations were drawn out more clearly. Sometimes I want to look up which phonetic derivatives a character has without also matching characters that merely share some random structure.
@DonaldTsang Could you explain the structure of that file a bit? Which field provides the phonetic element?
The 10th element, the "聲符", is the phonetic element. The other elements are mostly phonological comparisons of Cantonese and Mandarin. The semantic elements are not in the table; however, they can be inferred by comparing the phonetic element with the other components of the character in question.
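For anyone who wants to query this directly, here is a minimal sketch of turning such a table into a "phonetic element → derived characters" lookup. It assumes tab-separated rows with the character in the first field and the 聲符 in the 10th. The thread does not confirm that layout, so the column numbers are parameters. `make_row` is a hypothetical helper that only pads the sample rows, and the sample rows are illustrative, not copied from the file.

```python
import csv
from collections import defaultdict

# Assumptions (not confirmed by the thread): tab-separated rows, the
# character in the first field, the 聲符 in the 10th field (index 9).
CHAR_COL = 0
PHONETIC_COL = 9


def build_phonetic_index(lines, char_col=CHAR_COL, phonetic_col=PHONETIC_COL):
    """Map each phonetic element to the characters that list it."""
    index = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(char_col, phonetic_col):
            continue  # short or malformed row
        char, phonetic = row[char_col].strip(), row[phonetic_col].strip()
        if char and phonetic:  # skip rows whose phonetic field is blank
            index[phonetic].append(char)
    return dict(index)


def make_row(char, phonetic):
    """Hypothetical helper: pad a row so the phonetic lands in field 10."""
    return "\t".join([char] + ["-"] * 8 + [phonetic])


# Illustrative rows only; the real file has actual data in every field.
sample = [make_row("曖", "愛"), make_row("僾", "愛"), make_row("扒", "八")]
index = build_phonetic_index(sample)
print(index.get("愛"))
```

With a real file, the same function should work on an open file handle, e.g. `build_phonetic_index(open(path, encoding="utf-8"))`, provided the column assumptions hold.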
Also, regarding alternate character forms: https://github.com/BYVoid/ytenx/blob/master/ytenx/sync/jihthex/JihThex.csv and https://github.com/BYVoid/ytenx/blob/master/ytenx/sync/jihthex/ThaJihThex.csv (different forms of the same character appear on the same row)
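If each row really does list the forms of one character, a variant lookup could be sketched roughly like this. It assumes plain comma-separated rows with no header, which I have not verified against those files. The sample rows are well-known variant pairs chosen for illustration, not lines taken from the CSVs.

```python
import csv


def build_variant_map(lines):
    """Map every form to the set of forms sharing a row with it."""
    variants = {}
    for row in csv.reader(lines):
        forms = [field.strip() for field in row if field.strip()]
        group = frozenset(forms)
        for form in forms:
            # A form that appears in several rows gets the union of those
            # rows; groups are not merged transitively across rows.
            variants[form] = variants.get(form, frozenset()) | group
    return variants


# Illustrative rows only (assumed layout: one character's forms per row).
sample = ["爲,為", "說,説"]
variant_map = build_variant_map(sample)
print(sorted(variant_map["為"]))
```

Normalizing text to one form before a phonetic lookup would then be a matter of picking a canonical member from each set.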
The first line of the file is for the character 愛, and the phonetic character for it is... 隊? 🤔 I don't see this character inside of 愛. Is it in reduced form somehow? What I was hoping for was a DB that would always tell me the phonetic component of a character, in a way that is recognizable for learning the pronunciation.
Some of the items from the 6th column are blank, so 曖 and 僾 both map to 愛. 扒 maps to 八.
Thanks a bunch! This was super useful.
@garfieldnate It's all Lasagna.
I haven't heard the term "lasagna" before. What does this mean?
Well, Garfield, you like it don't you? Consider that you sleep in a Lasagna tin.
Regarding reconstruction of characters not on the list, there are ways to go about it.
Hunan or Xiang can be cross-checked through Nushu:
Cantonese can be extracted through https://github.com/jyutnet/cantonese-books-data and https://github.com/wordshk/yue_references. Other Chinese dialects are pooled through https://github.com/laubonghaudoi/Chinese_Rime and https://github.com/edenau/phonological-mapping.
Conversions of text forms are necessary.
@garfieldnate Thanks for doing this. I hope it gets accepted into the repo.
It would be good to use this as a way of addressing the phonetic elements of a character. https://github.com/BYVoid/ytenx/blob/master/ytenx/sync/dciangx/DrienghTriang.txt