chinese-words-separator / chinese-words-separator.github.io


Make Chinese characters 字, not words 詞 the unit #13

Closed supercontingency closed 1 year ago

supercontingency commented 1 year ago

Thank you again for your quick response to post #10. I think the root of the issue might be that the basic unit of CWS is the Chinese word, so the zhuyin for 咖啡壺 is centered over the entire word instead of over each character. This also causes another issue when trying to hide zhuyin for learned words: we need to add the words 他, 他們, 他的, 他們的, 其他, etc. all separately to the learned word list to hide the zhuyin for 他. I think it would be very helpful to make Chinese characters, instead of words, the unit of CWS.
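The difference between the two units can be sketched as follows. This is only an illustration, not CWS's actual code: the function names and the learned lists are hypothetical, and the text is assumed to be already segmented into words.

```python
def hidden_by_word_list(segments, learned_words):
    """Word as the unit: zhuyin is hidden only if the whole segment is in the list."""
    return [seg in learned_words for seg in segments]


def hidden_by_char_list(segments, learned_chars):
    """Character as the unit: one hide flag per character of each segment."""
    return [[ch in learned_chars for ch in seg] for seg in segments]


# Hypothetical segmented text; only 他 has been marked as learned.
segments = ["他", "他們", "其他"]

# Word unit: 他們 and 其他 still show zhuyin for 他 unless added separately.
print(hidden_by_word_list(segments, {"他"}))

# Character unit: the single entry 他 hides that character everywhere.
print(hidden_by_char_list(segments, {"他"}))
```

With the word as the unit, every word containing 他 needs its own entry; with the character as the unit, one entry covers them all.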

chinese-words-separator commented 1 year ago

TL;DR: Multi-character words are the name of the game now when learning Chinese. It's only in classical Chinese and old scripts that learning exclusively by characters is survivable. See the last of the video links below.

This comment from this video:

image

Wait for the big "BUT.." here.. https://youtu.be/avAp1A1FtZ4?t=490

By the way, HSK does not include the words 他的 and 他們的 in its list, and neither does CC-CEDICT (the dictionary that CWS uses); see the screenshot of CWS's dictionary below. Reading 他 and 的 together is intuitive enough for the possessive meaning his/her/their to be inferred. It would be superfluous to add them to the HSK list just to give them a one-to-one mapping with English his/her/their. I don't know if that is the case with TOCFL too; I'm still exploring resources for Taiwan-centric language learning. I do hope that TOCFL does not have entries for 他的 and 他們的, since they are unnecessary to say the least. From my cursory Googling, they are not in the TOCFL list.

In theory we want to learn individual characters; in practice, we learn things in clusters. Hence the lists that rank what needs to be learned by complexity (e.g., HSK, TOCFL) are based on words, not on single characters alone. Words are made of multiple characters. HSK has separate entries for 他, 他們, and 其他 (see the HSK numbers to the left of the words in the screenshot below). 其他 is in HSK 2, yet 其 is at a higher level, HSK 5. I'm still a beginner/lower-intermediate learner of Chinese; I know the meaning of 其他, but I can't readily come up with usages of 其. 其他 is an easier concept to grasp than 其. The higher the HSK/TOCFL level, the fewer single-character words you'll see, and that says a lot about what learners have to prioritize when learning Chinese. The last video in the links below articulates this well.

Sometimes we have to learn longer words first (e.g., 其他, 怎么办, 一模一样) before we learn shorter or one-character words, because longer words have more exact meanings. Sometimes we have to learn a multi-character word before we even learn its individual characters; and alas, even a single character by itself is a cluster of multiple component characters. We don't always want to get bogged down in such sub-atomic knowledge. Knowing 他 does not automatically tell us the meaning of 其他; if we try to comprehend it character by character, we might wrongly interpret it as "such - person" or "that - person". A top-down, drill-down approach beats any other form of language learning.

image

How Chinese words are formed is usually very intuitive, but sometimes it is not. In the example above, it is not readily apparent how the component character 其 produces the meaning other when combined with 他. Learners have to learn 其他 on its own merit. It goes the other way around too: knowing 其他 does not necessarily mean one knows how to use 其 in context. So aside from learning 其他, people have to learn 其 on its own merit too. If people had to learn things in isolation, that is, by individual characters and not by clusters of characters, I would hazard a guess that a learner might come up with other ways to say other. The result might be a multi-character (or single-character) expression that is nonsensical; or, even if it makes sense, it might not suit the given context.

Additionally, learning Chinese by characters instead of by words risks using a one-character word where a more apt word exists (often a multi-character one, since more characters make a word more concrete). For example, there is a single-character word that also means others, and in a given context a learner whose focus is on characters might wrongly use it instead of 其他. So we really have to bubble up the word 其他 in language learning tools; we can't leave learners in the dark by bubbling the word up only as its sub-atomic parts qí and tā. It is distracting to always see the sub-atomic parts qí and tā instead of the one-concept word qítā, which maps well to the one-concept English word other. I know someone who has learned seven languages (Chinese is one of them). He doesn't know how to read Chinese characters; he learned Chinese via pinyin, and he sounds native to native Chinese speakers. I don't think he could have acquired that many languages so fast if each syllable of a multi-character word had always bubbled up to him separately while he learned Chinese via pinyin. For some, pinyin/zhuyin alone is a workable learning tool and they don't use characters: https://troubadourworks.com/pinyintypist/why_chinese_is_so_hard-py.html#:~:text=qítā

Characters are the building blocks of Chinese, but we will not be able to quickly learn everyday words if we don't learn multi-character words early on. The word so-so comes to mind. The word so maps to many Chinese words, so a learner runs the risk of saying so-so as 然然, 偌偌, or 所以所以 if he/she doesn't learn the multi-character word 马马虎虎 early on.

If we are to make Chinese characters the basic unit of CWS (and not multi-character words), does marking 马 or 虎 automatically mean we know 马马虎虎?
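To make the question concrete, here is a minimal sketch of what a character-based learned list would conclude. The function name and both lists are hypothetical and are not taken from CWS.

```python
def word_looks_known(word, learned_chars):
    """Character as the unit: a word counts as known once all its characters are marked."""
    return all(ch in learned_chars for ch in word)


# The learner has marked the characters 马 and 虎, but never the word 马马虎虎.
learned_chars = {"马", "虎"}
learned_words = set()

# Character unit: the word is treated as known, so its annotation would be hidden.
print(word_looks_known("马马虎虎", learned_chars))

# Word unit: the word is still unknown, so its annotation stays visible.
print("马马虎虎" in learned_words)
```

Under the character unit, the annotation for 马马虎虎 disappears even though the learner never studied the word itself.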

You'll notice in Chinese dictionaries (any dictionaries, for that matter) that individual characters have many definitions. An individual character has several usages; it is overloaded with definitions because a single character is inexact by nature. As a corollary, the longer a Chinese word is, the more concrete and exact its meaning.

Language learning tools should keep multi-character words as their basic unit.

Some learners are not even fans of flashcards, because even multi-character word flashcards deny learners the context in which a word can or cannot be used. If even learning multi-character words denies learners that knowledge, how much more so if the learner's basic unit of language acquisition is the character and not the word. Hence many are proponents of sentence mining, that is, learning a language by sentences instead of learning words in isolation.

The author of the video I posted earlier has another video where he articulates well why one-syllable words are not as common as one might want to believe: "Why Chinese HATES 1 syllable words". Its continuation: "Why is Chinese OBSESSED w/ 2 Syllable Words".

To sum it up, we will not make individual characters the basic unit of our learning tools just so that annotations are exactly centered. It is unfortunate that we can't center the annotations visually (mathematically they are centered), but we can't compromise the ease of cluster-based language acquisition. We will not let language learners be distracted by seeing the sub-atomic parts as the default medium of learning. Seeing the sub-atomic parts should be opt-in; hence CWS's dictionary separates a multi-character word into its parts only when the learner right-clicks that word.
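The opt-in behavior can be sketched roughly like this. It is an illustration under assumptions, not CWS's implementation: the entry layout, the `annotation_units` function, and the `split` flag (standing in for the right-click) are all hypothetical.

```python
# Hypothetical dictionary entry: one word with one zhuyin reading per character.
ENTRY = {"word": "咖啡壺", "zhuyin": ["ㄎㄚ", "ㄈㄟ", "ㄏㄨˊ"]}


def annotation_units(entry, split=False):
    """Return (text, zhuyin) pairs to render.

    By default the whole word is one unit; split=True is the opt-in
    that breaks it into one unit per character.
    """
    if split:
        return list(zip(entry["word"], entry["zhuyin"]))
    return [(entry["word"], " ".join(entry["zhuyin"]))]


print(annotation_units(ENTRY))              # default: the word as a single unit
print(annotation_units(ENTRY, split=True))  # opt-in: one unit per character
```

Keeping the word as the default unit and splitting only on request keeps the per-character view available without making it the default.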