Open jckt opened 6 years ago
Which dictionary are you using for Cantonese? Is it CC-Canto from http://cantonese.org/?
edit: Thanks for the contribution by the way :)
The raw data comes from CC-Canto and the CC-CEDICT Cantonese readings (both from cantonese.org); these were processed into a single file.
You're welcome!
This is great! I tested it and it resolves an issue that has been constantly annoying me, where words like 捨棄 fail to be looked up. Is there anything blocking this from being merged?
I noticed that with this branch jyutping seems to be unavailable for 律 and all words containing it, e.g. 法律, 律师, 旋律, 音律, 因果律, 定律, 菲律宾. I'm not sure why.
Found the reason for the above error: the script that generates cedict_combined.u8 seems to have some bugs, as it doesn't include jyutping everywhere. See the entry below (the jyutping should be between the { }):
法律 法律 [fa3 lu:4] { } /law/CL:條|条[tiao2], 套[tao4], 個|个[ge4]/
Oh, this seems to affect every word containing a character whose pinyin contains ü (written v or u:), like 女, 绿, 吕, 驴. Presumably it's an issue with the script that generates cedict_combined.u8 (which unfortunately doesn't seem to be included in the repository).
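To see how widespread this is, something like the following could list every entry whose { } field is blank. This is only a rough sketch: the line layout (`TRAD SIMP [pinyin] {jyutping} /defs/`) is inferred from the sample entry above, and `find_missing_jyutping` is a made-up helper name, not something from the repository.

```python
import re

# Assumed line shape, inferred from the sample entry in this thread:
#   TRAD SIMP [pinyin] {jyutping} /def1/def2/
# Matches only entries whose { } field is empty or whitespace.
EMPTY_JYUTPING = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] \{\s*\}")


def find_missing_jyutping(lines):
    """Return (traditional, simplified, pinyin) for entries with a blank { } field."""
    missing = []
    for line in lines:
        if line.startswith("#"):  # skip comment lines, if the file has any
            continue
        match = EMPTY_JYUTPING.match(line)
        if match:
            missing.append(match.groups())
    return missing


sample = [
    "法律 法律 [fa3 lu:4] { } /law/CL:條|条[tiao2], 套[tao4], 個|个[ge4]/",
    "女人 女人 [nu:3 ren2] {neoi5 jan4} /woman/",  # made-up well-formed entry
]
print(find_missing_jyutping(sample))
```

Running this over the real cedict_combined.u8 (read line by line as UTF-8) should show whether only the ü words are affected.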
I wrote a big message just now about how, in general, I've tried to avoid autocompleting jyutping on a per-character basis (it leads to many errors; even the Pleco dictionary on iPhone has them, and it uses a better version of the CC-Canto sources AFAIK). But you're right: in this case it's my fault, and there is a bug in the generator scripts. In fact, the entry is entered twice; somewhere else in the file:
法律 法律 [fa3 lv4] {faat3 leot6}
So there are now two ways of expressing ü in the dictionary (I forget if this is a problem; I'll check again soon when I have the time). In this case I guess one could either condense the two entries into one, or just leave the two entries but auto-clean the pinyin and the / / definition field. Condensing is easy here since the entry above is malformed: it has no / / field for a (blank) definition, so the regex just misses it completely (that's why it doesn't even show up as a definition-free entry). I'll try to fix it as soon as I have the time.
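For what it's worth, here is a minimal sketch of the "condense" option. It is not the attached generator code: the regex, the helper names (`parse_entry`, `merge_entries`, `format_entry`) and the choice to normalise `v` to `u:` (the CC-CEDICT spelling) are all my own assumptions. The / / field is optional in the regex, so a definition-free entry is no longer skipped.

```python
import re

# Assumed entry shape: TRAD SIMP [pinyin] {jyutping} /defs/
# Both the {jyutping} and the /defs/ fields are optional here, so an entry
# with no / / field still parses instead of being silently dropped.
ENTRY = re.compile(
    r"^(?P<trad>\S+) (?P<simp>\S+) "
    r"\[(?P<pinyin>[^\]]*)\]"
    r"(?: \{(?P<jyutping>[^}]*)\})?"
    r"(?: /(?P<defs>.*)/)?\s*$"
)


def parse_entry(line):
    """Parse one line into a dict, or return None if it does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    return {
        "trad": m["trad"],
        "simp": m["simp"],
        # Assumption: use "u:" for ü everywhere, so "lv4" and "lu:4" compare equal.
        "pinyin": m["pinyin"].replace("v", "u:"),
        "jyutping": (m["jyutping"] or "").strip(),
        "defs": m["defs"] or "",
    }


def merge_entries(lines):
    """Condense entries sharing headword and pinyin, filling in blank fields."""
    merged = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        key = (entry["trad"], entry["simp"], entry["pinyin"])
        if key not in merged:
            merged[key] = entry
            continue
        for field in ("jyutping", "defs"):
            if not merged[key][field]:
                merged[key][field] = entry[field]
    return list(merged.values())


def format_entry(e):
    """Write an entry back out in the assumed combined format."""
    return f"{e['trad']} {e['simp']} [{e['pinyin']}] {{{e['jyutping']}}} /{e['defs']}/"


lines = [
    "法律 法律 [fa3 lu:4] { } /law/CL:條|条[tiao2], 套[tao4], 個|个[ge4]/",
    "法律 法律 [fa3 lv4] {faat3 leot6}",
]
for entry in merge_entries(lines):
    print(format_entry(entry))
```

If the extension actually expects `v` rather than `u:`, the replacement in `parse_entry` would need to go the other way; I haven't checked which one the lookup code relies on.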
For now, I've attached the dictionary generator scripts. I didn't include them in the branch since I thought I would quickly clean them up and include some autocomplete system that also gave correct results (but that's actually a much harder problem than I thought it was).
Thanks again for pointing this out.
Attachment: generators.zip
@Paperfeed Any chance this can get merged and deployed to the Chrome Web Store? I'm interested in being able to use Cantonese and can help push this along if more changes are needed.
Added Cantonese pronunciation support
Main features added:
Fixed bugs, notably: