chinese-words-separator / chinese-words-separator.github.io


Pronunciations and tones remapping for annotations and dictionary #12

Open · chinese-words-separator opened this issue 1 year ago

chinese-words-separator commented 1 year ago

This feature will be used by learners of Taiwan pronunciation and tones.

That is, when Read-aloud is set to Taiwan:

[screenshot]

an entry such as this one:

下頦 下颏 [xia4 ke1] /chin/Taiwan pr. [xia4 hai2]/

will be swapped internally and changed to:

下頦 下颏 [xia4 hai2] /chin/Mainland China pr. [xia4 ke1]/
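Here is a minimal sketch of that swap. It assumes the swap runs on raw CC-CEDICT lines (`Traditional Simplified [pinyin] /def1/def2/`) before they are loaded. `swapToTaiwan` is a hypothetical name and is not a function in CWS:

```javascript
// Hypothetical helper (not part of CWS): swaps the main pinyin of a raw
// CC-CEDICT line with its "Taiwan pr. [...]" definition, if it has one.
function swapToTaiwan(line) {
  // CC-CEDICT layout: Traditional Simplified [pinyin] /def1/def2/.../
  const m = line.match(/^(\S+) (\S+) \[([^\]]+)\] \/(.*)\/$/);
  if (!m) return line; // not an entry line (e.g. a "#" comment)
  const [, trad, simp, mainPinyin, defsText] = m;
  const defs = defsText.split("/");
  // Find the definition that is exactly "Taiwan pr. [...]".
  const idx = defs.findIndex((d) => /^Taiwan pr\. \[[^\]]+\]$/.test(d));
  if (idx === -1) return line; // no Taiwan reading: nothing to swap
  const taiwanPinyin = defs[idx].slice("Taiwan pr. [".length, -1);
  defs[idx] = `Mainland China pr. [${mainPinyin}]`;
  return `${trad} ${simp} [${taiwanPinyin}] /${defs.join("/")}/`;
}

console.log(swapToTaiwan("下頦 下颏 [xia4 ke1] /chin/Taiwan pr. [xia4 hai2]/"));
// 下頦 下颏 [xia4 hai2] /chin/Mainland China pr. [xia4 ke1]/
```

Entries without a `Taiwan pr.` definition are returned unchanged, so the function can be applied to every line.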

chinese-words-separator commented 1 year ago

The main pinyin and the alternative pinyin (Taiwan pr.) in the current data structure are not yet swap-ready, because some entries are multi-word expressions, e.g.,

安營紮寨    安营扎寨    an1 ying2_zha1 zhai4    to set up camp。 Taiwan pr. [an1 ying2 zha2 zhai4]   0   2,2

If we pursue this functionality, the data structure has to be changed accordingly. The Taiwan pr. needs the same word-boundary underscore as the main pinyin (ying2_zha1), like this:

[an1 ying2_zha2 zhai4]
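When both readings have the same number of syllables, the underscore could be copied over mechanically. Below is a sketch under that assumption. `alignSeparators` is hypothetical, and it returns `null` when the syllable counts differ so that those entries can be fixed by hand:

```javascript
// Hypothetical helper: copies the separators of the main pinyin
// (" " between syllables, "_" between words) onto a Taiwan pr. that
// only uses spaces, so that the two readings become swappable.
function alignSeparators(mainRaw, taiwanRaw) {
  const separators = mainRaw.trim().match(/[ _]/g) ?? [];
  const mainSyllables = mainRaw.trim().split(/[ _]/);
  const taiwanSyllables = taiwanRaw.trim().split(/\s+/);
  // Only safe when both readings have the same number of syllables.
  if (taiwanSyllables.length !== mainSyllables.length) return null;
  return taiwanSyllables
    .map((syl, i) => syl + (separators[i] ?? ""))
    .join("");
}

console.log(alignSeparators("an1 ying2_zha1 zhai4", "an1 ying2 zha2 zhai4"));
// an1 ying2_zha2 zhai4
```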

Also, the dictionary is currently saved to the database with pre-computed pinyin and zhuyin, so that displaying them after retrieval requires less work. If the functionality to swap pronunciations/tones is introduced to CWS, we can no longer rely on these pre-computed values (the p and z properties). We must instead save only the raw source (property r) and compute the user-facing values (p and z) on the fly. In other words, don't use the pre-computed p and z values in a record like this one:

{
    d: [2, 2],
    e: "to set up camp。 Taiwan pr. [an1 ying2 zha2 zhai4]",
    l: 0,
    p: "ānyíng zhāzhài",
    r: "an1 ying2_zha1 zhai4",
    s: "安营扎寨",
    t: "安營紮寨",
    z: "ㄢㄧㄥˊ ㄓㄚㄓㄞˋ"
}
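The following sketch computes `p` from `r` on the fly. It assumes, as the record above suggests, that `_` separates words and spaces separate syllables within a word. It applies the standard tone-mark placement rule: mark a or e if present, otherwise the o of "ou", otherwise the last vowel. `markSyllable` and `rawToPinyin` are hypothetical names, and zhuyin (`z`) is not covered:

```javascript
// Tone-marked vowels, indexed by tone 1-4.
const TONE_MARKS = {
  a: "āáǎà", e: "ēéěè", i: "īíǐì", o: "ōóǒò", u: "ūúǔù", "ü": "ǖǘǚǜ",
};

// Converts one numbered syllable ("zha1", "lu:4") to tone-mark pinyin.
function markSyllable(syllable) {
  const m = syllable.match(/^([a-zü:]+)([1-5])$/i);
  if (!m) return syllable; // leave punctuation, letters, etc. untouched
  const base = m[1].replace(/u:/g, "ü").replace(/U:/g, "Ü");
  const tone = Number(m[2]);
  if (tone === 5) return base; // neutral tone carries no mark
  // Placement rule: a or e if present, the o of "ou", else the last vowel.
  let idx = base.search(/[ae]/i);
  if (idx === -1) idx = base.toLowerCase().indexOf("ou");
  for (let i = base.length - 1; idx === -1 && i >= 0; i--) {
    if ("iouü".includes(base[i].toLowerCase())) idx = i;
  }
  if (idx === -1) return base; // e.g. "m2": no vowel to mark
  const vowel = base[idx];
  const marked = TONE_MARKS[vowel.toLowerCase()][tone - 1];
  const out = vowel === vowel.toLowerCase() ? marked : marked.toUpperCase();
  return base.slice(0, idx) + out + base.slice(idx + 1);
}

// Assumed convention: "_" separates words, spaces separate syllables.
function rawToPinyin(raw) {
  return raw
    .split("_")
    .map((word) => word.trim().split(/\s+/).map(markSyllable).join(""))
    .join(" ");
}

console.log(rawToPinyin("an1 ying2_zha1 zhai4")); // ānyíng zhāzhài
```

Because only `r` is stored, a swapped `r` (e.g. `an1 ying2_zha2 zhai4`) yields the Taiwan display form through the same function.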

Another challenge is showing the selected pronunciation (Mainland China or Taiwan) of a full word (e.g., 垃圾) even when that word is not the character/word currently being hovered:

[screenshot]

In that case, when Taiwan pronunciation is the selected mode, the following should be shown:

垃 lè Mainland China pr. [la1]。 used in 垃圾[le4 se4]

The Taiwan pronunciation of 垃圾 (le4 se4) needs to be looked up separately, because the dictionary pop-up list has no entry for 垃圾 whose pronunciation it could be swapped with. This is not impossible to implement, but it is fraught with challenges.
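One way to handle the cross-reference is sketched below. It uses a hypothetical in-memory index (`dictionary`) keyed by simplified headword and assumes the CC-CEDICT cross-reference syntax `垃圾[la1 ji1]` or `Trad|Simp[pinyin]`. `crossRefsToTaiwan` is also hypothetical:

```javascript
// Hypothetical in-memory index: headword -> { mainland, taiwan } readings.
const dictionary = new Map([
  ["垃圾", { mainland: "la1 ji1", taiwan: "le4 se4" }],
]);

// Rewrites CC-CEDICT cross-references ("垃圾[la1 ji1]" or
// "Trad|Simp[pinyin]") inside a definition to the referenced word's
// Taiwan reading, looked up separately in `dict`.
function crossRefsToTaiwan(definition, dict) {
  const crossRef = /([\u3400-\u9fff]+)(?:\|([\u3400-\u9fff]+))?\[([^\]]+)\]/g;
  return definition.replace(crossRef, (whole, first, second) => {
    const entry = dict.get(second ?? first); // prefer the simplified form
    if (!entry || !entry.taiwan) return whole; // keep the Mainland reading
    const head = second ? `${first}|${second}` : first;
    return `${head}[${entry.taiwan}]`;
  });
}

console.log(crossRefsToTaiwan("used in 垃圾[la1 ji1]", dictionary));
// used in 垃圾[le4 se4]
```

The `Taiwan pr. [le4]` part of the entry is not touched by this function, because a space separates it from the bracket.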

The following case is a little easier to implement than the one above, since the counterpart pronunciation is already in the list shown by the dictionary:

[screenshot]

The following will be shown when the word is swapped to its Taiwan pronunciation:

垃圾 lèsè trash。 refuse。 garbage。 (coll.) of poor quality。 Mainland China pr. [la1 ji1]
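Because `e` already carries `Taiwan pr. [le4 se4]`, the swap can stay within a single record. The sketch below assumes the record shape shown earlier, with definitions joined by `。 `. `swapRecordToTaiwan` is hypothetical. Under this approach `p` and `z` would be recomputed from the new `r` rather than swapped. Multi-word entries still need the underscore fix described earlier:

```javascript
// Hypothetical record-level swap for the CWS entry shape (r = raw pinyin,
// e = definitions joined with "。 "). p and z are not swapped here; they
// would be recomputed from the new r.
function swapRecordToTaiwan(record) {
  const m = record.e.match(/Taiwan pr\. \[([^\]]+)\]/);
  if (!m) return record; // no Taiwan reading: show the entry unchanged
  return {
    ...record,
    r: m[1],
    // A replacer function avoids "$" patterns being interpreted.
    e: record.e.replace(m[0], () => `Mainland China pr. [${record.r}]`),
  };
}

const trash = {
  t: "垃圾", s: "垃圾", r: "la1 ji1",
  e: "trash。 refuse。 garbage。 (coll.) of poor quality。 Taiwan pr. [le4 se4]",
};
console.log(swapRecordToTaiwan(trash).e);
// trash。 refuse。 garbage。 (coll.) of poor quality。 Mainland China pr. [la1 ji1]
```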
chinese-words-separator commented 9 months ago

Another challenge: the Taiwan pronunciation of a multi-character word is not always given directly in its definition. Each character of the word then has to be looked up one by one to check whether it has a corresponding Taiwan pronunciation.

https://cc-cedict.org/editor/editor.php?log_id=80306&return=ListChanges&handler=ViewLogEntry

https://cc-cedict.org/editor/editor.php?log_id=81344&return=ListChanges&handler=ViewLogEntry
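The per-character lookup described above could look roughly like the sketch below. The index is illustrative data only: 圾 → se4 is inferred from 垃圾 [le4 se4] in this thread. `charReadings` and `taiwanReadingByCharacter` are hypothetical names. A Taiwan reading is borrowed only when it belongs to the same Mainland syllable, so a polyphonic character does not pick up an unrelated reading:

```javascript
// Illustrative per-character index: character -> list of readings, each a
// Mainland syllable plus an optional Taiwan pr. (example data only).
const charReadings = new Map([
  ["垃", [{ mainland: "la1", taiwan: "le4" }]],
  ["圾", [{ mainland: "ji1", taiwan: "se4" }]],
]);

// Hypothetical helper: derives a Taiwan reading for a multi-character word
// whose definition has none, by checking each character one by one.
// Word separators ("_") are dropped in the result. Returns null when
// characters and syllables can't be paired up.
function taiwanReadingByCharacter(word, mainlandRaw, index) {
  const chars = Array.from(word); // handles characters outside the BMP
  const syllables = mainlandRaw.trim().split(/[\s_]+/);
  if (chars.length !== syllables.length) return null;
  return chars
    .map((ch, i) => {
      // Only borrow a Taiwan reading attached to the same Mainland syllable.
      const match = (index.get(ch) ?? []).find(
        (reading) => reading.mainland === syllables[i] && reading.taiwan
      );
      return match ? match.taiwan : syllables[i];
    })
    .join(" ");
}

console.log(taiwanReadingByCharacter("垃圾", "la1 ji1", charReadings)); // le4 se4
```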