Born and raised in China, I took knowing those symbols for granted, as if it came "naturally" (maybe not that naturally; I just forgot the pain of learning them as a kid). My foreign friends who want to learn Chinese tell me that the characters are hard to remember and understand. I mean, yeah, look at this thing: 繁 [1]. Is it supposed to mean something? Oh, wait... there are 30000 [2] of them! Is that what it takes to read Chinese? I quit.
Well, the number of symbols in a writing system is a rough indicator of how hard it is to learn. Just look at how hard it was to decode ancient Egyptian scripts, which have around 1000 separate symbols. In Chinese, the number is about 200 [3], and for languages using the Latin alphabet, like English, it is fewer than 50. That means that to read Chinese, my foreign friends need to recognize at least 4× more symbol shapes than they are used to. That's pretty hard.
- [1] Well, I picked this one on purpose: the character itself means "complicated". There are simple ones too, like 火. Care to take a guess at what it means based on its shape?
- [2] According to research in 2007, knowing 3500 Chinese characters offers 99.48% coverage of common usage.
- [3] Most Chinese characters are composed of smaller symbols, of which there are around 200.
But after becoming "empty" [Tao Te Ching], as my ancestors would, it starts to make sense. What if I didn't know anything about Chinese? How would my mind process "漢字"?
Let's put that aside and look at something we know. Consider this: `1 + 2`. As you look at it, `3` will pop into your mind. How this works is that we've been programmed with a function, or operator, called `add`, and after parsing `1 + 2` with our eyes, we put `1` and `2` into that function to get the answer: `add(1, 2) = 3`.
Try thinking about it the other way around. Suppose we don't know the meaning of `3` as a symbol, but we know `1`, `2`, `+`, and `=`. Does `3 = 1 + 2` help us understand `3` better? I think so.
It's pretty much the same with other unfamiliar symbols, like a Chinese character. All we have to do is find the right equation [4] and turn everything on the right side of it into something we've already learned.
- [4] `繁 = f(x, y, z...)`
So, what are the `f()` and the `x, y, z` for a Chinese character? Like hieroglyphs, most of the basic elements in Chinese are pictographic. They came from ancient symbols that tried to imitate the shapes of things in daily life, and over thousands of years they evolved into characters. They are not hard to understand: as in the table below, one can basically guess the meaning based on the shape. Those are the `x, y, z` we are looking for. What about the `f()`?
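| ancient symbol | modern character | English |
| --- | --- | --- |
| (image) | 一 | one |
| (image) | 二 | two |
| (image) | 三 | three |
| (image) | 木 | wood |
| (image) | 水 | water |
| (image) | 火 | fire |
| (image) | 土 | earth |
| (image) | 雨 | rain |
| (image) | 田 | farm |
| (image) | 人 | person |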
Layout and position seem to be our `f()`. Suppose we have a function named `aboveToBelow`. Then `三` can be described as `三 = aboveToBelow(一, 二)`. Layout operators like `aboveToBelow` already exist in Unicode as Ideographic Description Characters, and a description built from them is called an Ideographic Description Sequence (IDS); together they describe the layout of CJK characters. Let's try to use them as `f()`; they definitely tell us more about the unknowns. Check this out:
林 = ⿰(木, 木)
⿰ = leftToRight
木 = wood
林 == forest

泉 = ⿱(白, 水)
⿱ = aboveToBelow
白 = white
水 = water
泉 == spring (water)

燙 = ⿱(湯, 火)
⿱ = aboveToBelow
湯 = soup
火 = fire
燙 == boiling hot

雷 = ⿱(雨, 田)
⿱ = aboveToBelow
雨 = rain
田 = farm
雷 == thunder

囚 = ⿴(口, 人)
⿴ = surround
口 = walls (as shape)
人 = person
囚 == imprison
| layout | symbol |
| --- | --- |
| left to right | ⿰ |
| above to below | ⿱ |
| left to middle and right | ⿲ |
| above to middle and below | ⿳ |
| full surround | ⿴ |
| surround from above | ⿵ |
| surround from below | ⿶ |
| surround from left | ⿷ |
| surround from upper left | ⿸ |
| surround from upper right | ⿹ |
| surround from lower left | ⿺ |
| overlaid | ⿻ |
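The layout table can also be written down as data. The sketch below is only an illustration: `LAYOUTS` and `describeLayout` are hypothetical names, the English labels are the ones used in this post, and the arities (three components for ⿲ and ⿳, two for the rest) follow how Unicode defines these operators.

```javascript
// Layout operators from the table above, keyed by symbol.
// `arity` is how many components the operator combines.
const LAYOUTS = new Map([
  ['⿰', { name: 'left to right', arity: 2 }],
  ['⿱', { name: 'above to below', arity: 2 }],
  ['⿲', { name: 'left to middle and right', arity: 3 }],
  ['⿳', { name: 'above to middle and below', arity: 3 }],
  ['⿴', { name: 'full surround', arity: 2 }],
  ['⿵', { name: 'surround from above', arity: 2 }],
  ['⿶', { name: 'surround from below', arity: 2 }],
  ['⿷', { name: 'surround from left', arity: 2 }],
  ['⿸', { name: 'surround from upper left', arity: 2 }],
  ['⿹', { name: 'surround from upper right', arity: 2 }],
  ['⿺', { name: 'surround from lower left', arity: 2 }],
  ['⿻', { name: 'overlaid', arity: 2 }],
]);

// Hypothetical helper: spell out one layout step in words.
function describeLayout(op, ...parts) {
  const layout = LAYOUTS.get(op);
  if (!layout) throw new Error(`unknown layout operator: ${op}`);
  if (parts.length !== layout.arity) {
    throw new Error(`${op} expects ${layout.arity} parts, got ${parts.length}`);
  }
  return `${layout.name}: ${parts.join(', ')}`;
}

console.log(describeLayout('⿰', '木', '木')); // left to right: 木, 木
console.log(describeLayout('⿴', '口', '人')); // full surround: 口, 人
```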
Let's try a few more and see if you can guess the answers, starting with a simple one:
森 = ⿱(木, 林)
木 = wood
林 = forest
森 == full of trees
坐 = ⿻(从, 土)
从 = group of
土 = earth,land
坐 == sit
You may notice that some of the components look familiar, so they can be deconstructed again. We'll get this:
森 = ⿱(木, ⿰(木, 木))
坐 = ⿻(⿰(人, 人), 土)
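That repeated deconstruction is a plain recursion. Here is a minimal sketch, assuming a small hand-written lookup table (`DECOMPOSITIONS` and `expand` are made-up names, and the entries are just the examples from this post, not a real dictionary):

```javascript
// Hypothetical lookup table: character -> [layout operator, ...components].
// Anything missing from the table is treated as an atom.
const DECOMPOSITIONS = {
  '林': ['⿰', '木', '木'],
  '从': ['⿰', '人', '人'],
  '森': ['⿱', '木', '林'],
  '坐': ['⿻', '从', '土'],
};

// Expand a character until only atoms are left.
function expand(char) {
  const entry = DECOMPOSITIONS[char];
  if (!entry) return char; // atom: cannot be deconstructed further
  const [op, ...parts] = entry;
  return `${op}(${parts.map(expand).join(', ')})`;
}

console.log(expand('森')); // ⿱(木, ⿰(木, 木))
console.log(expand('坐')); // ⿻(⿰(人, 人), 土)
```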
Well, look what we have here: if we format it like this, 坐 (sit) becomes a tree structure.
      ⿻
     /  \
   ⿰    土
  /  \
人    人
And from the perspective of a front-end engineer, that's also a React component telling us how to render the character:
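// Pseudo-JSX: ⿻ and ⿰ are not valid JavaScript identifiers,
// so read this as a sketch of the nesting, not runnable code.
const 坐 = () => (
  <⿻>
    <⿰>人 人</⿰>
    土
  </⿻>
)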
And if we read the tree node first and then its children from left to right, we get a Polish notation for how to calculate the character:
const 坐 = '⿻⿰人人土'
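Going the other way, from the Polish notation string back to the tree, takes a small recursive parser. This is a sketch under my own assumptions: `ARITY` and `parseIDS` are hypothetical names, and every symbol that is not a layout operator is treated as a leaf.

```javascript
// How many components each layout operator takes.
const ARITY = new Map([
  ['⿰', 2], ['⿱', 2], ['⿲', 3], ['⿳', 3], ['⿴', 2], ['⿵', 2],
  ['⿶', 2], ['⿷', 2], ['⿸', 2], ['⿹', 2], ['⿺', 2], ['⿻', 2],
]);

// Parse a Polish notation string into a nested { op, children } tree.
function parseIDS(ids) {
  const symbols = Array.from(ids); // iterate by code point, not UTF-16 unit
  let pos = 0;

  function parseNode() {
    if (pos >= symbols.length) throw new Error('unexpected end of sequence');
    const symbol = symbols[pos++];
    const arity = ARITY.get(symbol);
    if (arity === undefined) return symbol; // leaf: a component character
    const children = [];
    for (let i = 0; i < arity; i++) children.push(parseNode());
    return { op: symbol, children };
  }

  const tree = parseNode();
  if (pos !== symbols.length) throw new Error('trailing symbols after the tree');
  return tree;
}

console.log(JSON.stringify(parseIDS('⿻⿰人人土')));
// {"op":"⿻","children":[{"op":"⿰","children":["人","人"]},"土"]}
```

Because every operator has a fixed number of components, the string needs no parentheses, which is exactly the property that makes Polish notation convenient here.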
And if we translate the ideographic description to a human-readable message:
A group of⿰ people人 with their bottom on⿻ the ground土, that is 'Sit'坐.
That's what we're looking for! If every character can be turned into Polish notation, we can work out its meaning from that notation at the same time. So how do we get the Polish notation automatically?
As a native Chinese reader, I can do it by recognizing the atoms and composing them with the ideographic description characters, diving in recursively until nothing can be deconstructed any further. But that would take a huge effort. Wait a second, that might be something an LLM can do for us, although I doubted there was enough context for this stuff on the Web. Well, after giving it a try, GPT-4 nailed the simple ones but failed on some complicated ones. I believe that with more training input and a better prompt it is possible.
If all of the atoms are replaced with the placeholder `_`, what's left can be thought of as the backbone of the character, like `⿻⿰___` or `⿱_⿰⿵__⿵__`, which can serve as the name of the render function for that structure. My guess is that the total number of those backbones is small, maybe somewhere under 30.
function ⿰__(a, b) {}
function ⿱__(a, b) {}
function ⿱_⿰__(a, b, c) {}
function ⿻⿰___(a, b, c) {}
function ⿱_⿰⿵__⿵__(a, b, c, d, e) {}
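Getting the backbone out of a Polish notation string is a one-liner: keep the layout operators and blank out everything else. The sketch below also groups a few sample strings by backbone, which is how one could start checking the "under 30" guess against real data. `backbone` and `groupByBackbone` are hypothetical names, and the sample list is only the handful of characters from this post.

```javascript
// The twelve layout operators; any other symbol counts as an atom.
const OPERATORS = new Set('⿰⿱⿲⿳⿴⿵⿶⿷⿸⿹⿺⿻');

// Replace every atom with "_" and keep the layout operators.
function backbone(ids) {
  return Array.from(ids, (s) => (OPERATORS.has(s) ? s : '_')).join('');
}

// Group Polish notation strings by their backbone.
function groupByBackbone(idsList) {
  const groups = new Map();
  for (const ids of idsList) {
    const key = backbone(ids);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(ids);
  }
  return groups;
}

console.log(backbone('⿻⿰人人土')); // ⿻⿰___
console.log(backbone('⿱雨田')); // ⿱__

const groups = groupByBackbone(['⿰木木', '⿱白水', '⿱雨田', '⿱木⿰木木', '⿻⿰人人土']);
console.log(groups.size); // 4
```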
Based on those functions, we can do something more interesting. Take 雷 (thunder) as an example: its Polish notation is `⿱雨田` (雨 rain, 田 farm), and its structure function is `function ⿱__(a, b) {}`. After currying, the function can be used like this: `⿱__(a)(b)`, so `雷 = ⿱__(雨)(田)`. Now, if we fill the first slot with a curry placeholder, we get another function: `⿱_田 = ⿱__(*)(田)`, so 雷 can also be written as `雷 = ⿱_田(雨)`. `⿱_田` describes a structure whose bottom part is already filled with 田 and which takes something above it to compose a character.
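Here is one way the placeholder currying could look in runnable JavaScript. It is a sketch, not the only way to do it: since ⿱ cannot be used in an identifier, the structure function is named `aboveToBelow`, and `HOLE` and `curryWithHoles` are names I made up to play the role of `*` above.

```javascript
// Stand-in for the curry placeholder written as * in the text.
const HOLE = Symbol('hole');

// Structure function for the backbone ⿱__: returns the Polish notation.
function aboveToBelow(top, bottom) {
  return `⿱${top}${bottom}`;
}

// Curry `fn` one argument at a time. Arguments are appended until every
// slot has a value or a HOLE; later arguments then fill the holes in order.
function curryWithHoles(fn, arity = fn.length) {
  const collect = (args) => (next) => {
    const filled = [...args];
    if (filled.length < arity) {
      filled.push(next);
    } else {
      filled[filled.indexOf(HOLE)] = next;
    }
    const complete = filled.length === arity && !filled.includes(HOLE);
    return complete ? fn(...filled) : collect(filled);
  };
  return collect([]);
}

const curried = curryWithHoles(aboveToBelow);
const somethingAboveField = curried(HOLE)('田'); // plays the role of ⿱_田

console.log(curried('雨')('田')); // ⿱雨田
console.log(somethingAboveField('雨')); // ⿱雨田
```

`somethingAboveField` can be reused with any other top component, which matches the idea of a half-filled structure waiting for its missing part.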
Well, I believe this is how we actually read, because we can still read this: `The qucik borwn fox jmups oevr the lzay dog`. Maybe in our heads, instead of remembering the word `quick`, we learn its curried function: `qu__k()()`. So when we read fast, we basically read with those curried functions, without even filling in the parameters.
English is a 1-dimensional thing: there is only one ideographic description, ⿰. A word is a train that uses letters as carriages. Chinese, however, takes 2 dimensions to describe. That also explains the complexity.
I wonder if there is a 3D character system on our planet. It would take a huge effort to learn, and it would definitely have a massive effect on the brains that understand it. Those brains would be trained to do 3D currying all the time; I really want to know how they would think differently.
Actually, we have already run some experiments in leveling up the dimension. The crossword puzzle is an attempt to make a 2D system by expanding a line into a plane. So a crossword puzzle for Chinese could only be in 3D. And if there were a 3D character system, the game would have to be in 4D.
It reminds me of the sci-fi story 'Story of Your Life' by Ted Chiang. The aliens, who use a 4D character system, see 'time' in a totally different way. I wonder if the author got inspiration from Chinese characters.
Well, hope you enjoy it. I'll see you next time.
@9am 🕘