BYVoid / OpenCC

Conversion between Traditional and Simplified Chinese
https://opencc.byvoid.com/
Apache License 2.0

`s2t` converts “背包” to “揹包,” but `t2s` or `tw2s` doesn’t do the opposite #752

Open NaitLee opened 1 year ago

NaitLee commented 1 year ago

The s2t definitions of phrases containing “揹” are around here.

But TSCharacters.txt has no entry that maps 揹 to 背.

To reproduce:

$ echo "背包" | opencc -c s2t.json | opencc -c t2s.json
揹包

AFAIK “揹” is not used in Simplified Chinese at all, so please add “背” as its simplification :)

Note: “揹” seems to be just a variant of “背” in both Traditional and Simplified Chinese. Both “揹” and “背” show up in web search results from sites that use Traditional Chinese, so both appear to be acceptable there.
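The round trip in the reproduction steps can be modeled with a toy converter. This is only a sketch: the tables below are illustrative stand-ins rather than OpenCC's real dictionaries, `convert` is a hypothetical helper, and plain string replacement is used in place of OpenCC's actual segmentation and matching.

```python
# Toy tables, NOT OpenCC's real data. Only the 背包 -> 揹包 entry comes from the issue.
S2T_PHRASES = {"背包": "揹包"}   # phrase-level s2t entry reported above
T2S_CHARS = {"書": "书"}          # hypothetical excerpt of a t2s character table; no 揹 entry


def convert(text, phrases, chars):
    """Replace known phrases first, then map the remaining characters one by one."""
    for src, dst in phrases.items():
        text = text.replace(src, dst)
    # Characters without an entry pass through unchanged.
    return "".join(chars.get(ch, ch) for ch in text)


# s2t step: the phrase table rewrites 背包.
traditional = convert("背包", S2T_PHRASES, {})

# t2s step: 揹 has no entry, so it passes through untouched.
round_trip = convert(traditional, {}, T2S_CHARS)

# The same t2s step after adding the requested 揹 -> 背 entry.
T2S_FIXED = {**T2S_CHARS, "揹": "背"}
fixed = convert(traditional, {}, T2S_FIXED)

print(traditional, round_trip, fixed)
```

In this model the missing character entry alone is enough to make the round trip lossy, which is the behavior reported above.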

ayaka14732 commented 1 year ago

“揹” is simply a variant character; I suggest removing it.

NaitLee commented 1 year ago

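Under OpenCC's principle of “split where possible rather than merge” (能分則不合), keeping a character like “揹”, which marks a more specific usage, is actually logical. According to this page, “揹” counts as a traditional character. But some dictionaries (such as this one) list it as a variant. Reportedly, neither the Kangxi Dictionary (《康熙字典》) nor the Shuowen Jiezi (《說文解字》) includes it. The lower component of “背” is “肉” (flesh); the character can refer to the shoulders and the back, and as a verb it already carries the meaning “to bear a load”. By that logic, “揹” probably has to count as a variant. Judging from related web searches (mostly product listings from Hong Kong and Taiwan online shops), both “背包” and “揹包” are in use. Which way to decide is still up to the experts 😄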

Either way, this pair needs to be added to t2s: Simplified Chinese does not use “揹”, so it must be replaced wherever it appears.

danny0838 commented 1 year ago

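The Table of General Standard Chinese Characters (《通用規範漢字表》) lists “背” as the standard character and “揹” as a variant. What OpenCC calls Simplified Chinese is China's standard character set, so in principle the variants listed in that table should be converted to their standard characters.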

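Besides this character, there are many other variant characters that could be converted to standard characters on the same principle. I opened a PR for this in #492 a long time ago, but at the time the project lead said it needed further discussion, and I don't know where that stands now...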
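One way to look for other characters with the same problem is to scan a forward table for output characters that the reverse table cannot map back. The following is a minimal sketch under stated assumptions: `missing_reverse` is a hypothetical helper, both tables are small illustrative dicts rather than OpenCC's dictionary files, each key and value are assumed to have the same length, and each reverse entry is assumed to have a single value.

```python
def missing_reverse(forward, reverse):
    """Return the target characters that `reverse` cannot map back to their source."""
    missing = set()
    for src, dst in forward.items():
        # Assumes src and dst have equal length, so characters align by position.
        for s_ch, d_ch in zip(src, dst):
            if d_ch != s_ch and reverse.get(d_ch) != s_ch:
                missing.add(d_ch)
    return sorted(missing)


# Illustrative data only; just the 背包 -> 揹包 entry comes from this issue.
forward = {"背包": "揹包", "书": "書"}
reverse = {"書": "书"}

gaps = missing_reverse(forward, reverse)
print(gaps)
```

A check like this only flags candidates. Whether each one should be normalized is still a policy question of the kind discussed in this thread.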