If Chinese Characters (漢字) Are Curried Functions...

Why are Chinese characters hard to learn?

Having been born and raised in China, I took it for granted that I know these symbols "naturally" (maybe not so naturally; I've just forgotten the pain of learning them as a kid). My foreign friends who want to learn Chinese tell me the characters are hard to remember and understand. I mean, yeah, look at this one: 繁 ^[1]. Is it supposed to mean something? Oh, wait... there are 30,000 ^[2] of them! Is that what it takes to read Chinese? I quit.

Well, the number of symbols in a writing system reflects how hard it is to understand. Just look at how hard it was to decode ancient Egyptian hieroglyphs, which have around 1,000 distinct symbols. In Chinese, the number is around 200 ^[3], and for languages that use the Latin alphabet, like English, it is fewer than 50. That means that to read Chinese, my foreign friends need to recognize 4× as many symbol shapes as they're used to. That's pretty hard.

[1] Well, I picked this one on purpose: the character itself means "complicated". There are simple ones too, like 火. Care to guess what it means from its shape?

[2] According to a 2007 study, knowing 3,500 Chinese characters covers 99.48% of common usage.

[3] Most Chinese characters are composed of smaller components, of which there are around 200.

But after becoming "empty" ^[Tao Te Ching], as my ancestors would, it starts to make sense. If I knew nothing about Chinese, how would my mind process "漢字"?

[Tao Te Ching] The Tao Te Ching talks about why it's important to stay empty.

「道德经」第十一章
三十辐共一毂，当其无，有车之用
埏埴以为器，当其无，有器之用
凿户牖以为室，当其无，有室之用
故有之以为利，无之以为用
「Tao Te Ching」Chapter 11
The thirty spokes unite in the one nave; but it is on the empty space (for the axle), that the use of the wheel depends.
Clay is fashioned into vessels; but it is on their empty hollowness, that their use depends.
The door and windows are cut out (from the walls) to form an apartment; but it is on the empty space (within), that its use depends.
Therefore, what has a (positive) existence serves for profitable adaptation, and what has not that for (actual) usefulness.

Try to figure it out with a geek hat on.

Let's put that aside and look at something we know. Consider this: 1 + 2. As you look at it, 3 pops into your mind. This works because we've been programmed with a function, or operator, called add: after parsing 1 + 2 with our eyes, we put 1 and 2 into that function to get the answer: add(1, 2) = 3.

Now think about it the other way around. Suppose we don't know what the symbol 3 means, but we do know 1, 2, +, and =. Does 3 = 1 + 2 help us understand 3 better? I think so.

It's pretty much the same with other unfamiliar symbols, like a Chinese character. All we have to do is find the right equation ^[4] and turn everything on its right side into something we've already learned.

[4] 繁 = f(x, y, z...)
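The idea can be sketched in a few lines of JavaScript. This is only an illustration built on assumptions of my own: the `known`, `ops`, and `equations` tables and the `understand` helper are hypothetical names, and the only equation in it is 3 = add(1, 2). An unfamiliar symbol is understood by looking up its equation and recursively replacing the right-hand side with things we already know.

```javascript
// Symbols whose meaning we already know (hypothetical table).
const known = { "1": 1, "2": 2 };

// Operators we have already "been programmed" with.
const ops = { add: (a, b) => a + b };

// Hypothetical equations defining unfamiliar symbols in terms of familiar ones:
// "3" = add("1", "2").
const equations = { "3": ["add", "1", "2"] };

// Understand a symbol by recursively rewriting its equation's right-hand
// side until only known symbols are left, then evaluating it.
function understand(symbol) {
  if (symbol in known) return known[symbol];
  const equation = equations[symbol];
  if (!equation) throw new Error(`No equation for ${symbol}`);
  const [op, ...args] = equation;
  return ops[op](...args.map(understand));
}

console.log(understand("3")); // 3
```

For a character, the same shape would hold with 繁 on the left and some f(x, y, z...) on the right; the rest of this post is about finding that f and those atoms.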

If it's a function, what's the abstraction?

1. The atoms that compose the character.

So, what are f() and x, y, z for a Chinese character? Like hieroglyphs, most of the basic elements in Chinese are pictographic: they came from ancient symbols that imitated the shapes of things in daily life, and over thousands of years they evolved into characters. They aren't hard to understand; as the table below shows, you can basically guess the meaning from the shape. Those are the x, y, z we are looking for. What about f()?

modern character	English
一	one
二	two
三	three
木	wood
水	water
火	fire
土	earth
雨	rain
田	farm
人	person

2. The ancient wisdom in the ideographic description.

Layout and position seem to be our f(). Suppose we have a function named aboveToBelow; then 三 can be described as 三 = aboveToBelow(一, 二). Unicode already defines layout operators like aboveToBelow to describe the structure of CJK characters: they are called Ideographic Description Characters, and a character spelled out with them is an Ideographic Description Sequence (IDS). Let's use them as f() and see how much they tell us about the unknowns. Check this out:

林 = ⿰(木, 木)

⿰ = leftToRight
木 = wood
林 == forest

泉 = ⿱(白, 水)

⿱ = aboveToBelow
白 = white
水 = water
泉 == spring (water)

燙 = ⿱(湯, 火)

⿱ = aboveToBelow
湯 = soup
火 = fire
燙 == boiling hot

雷 = ⿱(雨, 田)

⿱ = aboveToBelow
雨 = rain
田 = farm
雷 == thunder

囚 = ⿴(口, 人)

⿴ = surround
口 = walls (as shape)
人 = person
囚 == imprison
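Reading each equation as a function call can be made literal. The sketch below is a toy under stated assumptions: `CHARACTER_BY_IDS` is a hand-written table holding only the five examples above (spelled the way this post spells them, including 口 for the walls), and `evaluate` is a hypothetical helper that applies a layout operator to its components and looks up the resulting character. A real system would need a full IDS database.

```javascript
// Hand-written table covering only the examples above: IDS -> [character, meaning].
const CHARACTER_BY_IDS = new Map([
  ["⿰木木", ["林", "forest"]],
  ["⿱白水", ["泉", "spring (water)"]],
  ["⿱湯火", ["燙", "boiling hot"]],
  ["⿱雨田", ["雷", "thunder"]],
  ["⿴口人", ["囚", "imprison"]],
]);

// Apply a layout operator (f) to its components (x, y) and look up the result.
// Returns null when the combination is not in the table.
function evaluate(idc, ...components) {
  const entry = CHARACTER_BY_IDS.get(idc + components.join(""));
  return entry ? { character: entry[0], meaning: entry[1] } : null;
}

console.log(evaluate("⿱", "雨", "田").character); // 雷
```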

The ideographic description characters in Unicode: ⿰ ⿱ ⿲ ⿳ ⿴ ⿵ ⿶ ⿷ ⿸ ⿹ ⿺ ⿻

left to right	⿰
above to below	⿱
left to middle and right	⿲
above to middle and below	⿳
full surround	⿴
surround from above	⿵
surround from below	⿶
surround from left	⿷
surround from upper left	⿸
surround from upper right	⿹
surround from lower left	⿺
overlaid	⿻
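The twelve operators occupy the consecutive code points U+2FF0 to U+2FFB, so the table above can be generated rather than typed. Two of them, ⿲ and ⿳, place three components instead of two, and a sequence is only well formed when every operator gets exactly that many. A minimal sketch; the `IDC` map and the `compose` helper are hypothetical names:

```javascript
// English names of U+2FF0..U+2FFB, in code point order.
const NAMES = [
  "left to right", "above to below", "left to middle and right",
  "above to middle and below", "full surround", "surround from above",
  "surround from below", "surround from left", "surround from upper left",
  "surround from upper right", "surround from lower left", "overlaid",
];

// Map each ideographic description character to its name and operand count.
const IDC = new Map(
  NAMES.map((name, i) => {
    const codePoint = 0x2ff0 + i;
    // ⿲ (U+2FF2) and ⿳ (U+2FF3) take three components; the rest take two.
    const arity = codePoint === 0x2ff2 || codePoint === 0x2ff3 ? 3 : 2;
    return [String.fromCodePoint(codePoint), { name, arity }];
  })
);

// Build an IDS string, rejecting a wrong number of components.
function compose(idc, ...components) {
  const info = IDC.get(idc);
  if (!info) throw new Error(`${idc} is not an ideographic description character`);
  if (components.length !== info.arity) {
    throw new Error(`"${info.name}" takes ${info.arity} components`);
  }
  return idc + components.join("");
}

console.log(compose("⿰", "木", "木")); // ⿰木木
```

Checking the arity up front is what later makes the bracket-free (Polish) form of a character unambiguous.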

3. Take advantage of the hidden meaning of the operators.

Let's try a few more and see if you can guess the answer. Start with a simple one:

森 = ⿱(木, 林)

木 = wood
林 = forest
森 == full of trees

坐 = ⿻(从, 土)

从 = group of
土 = earth, land
坐 == sit

And you may notice that some of the components are familiar:

林_forest is a bunch of 木_wood.
从_{group of} is two 人_person standing next to each other.

So they can be decomposed again, which gives us:

森 = ⿱(木, ⿰(木, 木))
坐 = ⿻(⿰(人, 人), 土)
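That repeated decomposition is just recursion over a lookup table. In the sketch below, `DECOMPOSITION` is a hypothetical, hand-written table containing only the entries used above, and anything missing from it is treated as an atom. The result is a nested array `[operator, ...children]`, and flattening it operator-first gives back a single string.

```javascript
// Hypothetical one-level decompositions, taken from the examples above.
const DECOMPOSITION = new Map([
  ["森", ["⿱", "木", "林"]],
  ["林", ["⿰", "木", "木"]],
  ["坐", ["⿻", "从", "土"]],
  ["从", ["⿰", "人", "人"]],
]);

// Expand a character until only atoms (characters with no entry) remain.
// Returns either an atom string or a nested array [operator, ...children].
function decompose(char) {
  const entry = DECOMPOSITION.get(char);
  if (!entry) return char;
  const [operator, ...parts] = entry;
  return [operator, ...parts.map(decompose)];
}

// Flatten a tree back into one string, operator first (prefix order).
function toIDS(tree) {
  return Array.isArray(tree) ? tree[0] + tree.slice(1).map(toIDS).join("") : tree;
}

console.log(JSON.stringify(decompose("坐"))); // ["⿻",["⿰","人","人"],"土"]
console.log(toIDS(decompose("森"))); // ⿱木⿰木木
```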

Well, look at what we have here: if we format it like this, 坐_sit turns out to be a tree structure.

       ⿻
      / \
    ⿰   土
   / \
 人   人

And from the perspective of a front-end engineer, it's also a React component that tells us how to render the character:

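// ⿻ and ⿰ are not valid JSX tag names, so the hypothetical layout
// components Overlaid (⿻) and LeftToRight (⿰) stand in for them.
const 坐 = () => (
    <Overlaid>
        <LeftToRight>{"人"}{"人"}</LeftToRight>
        土
    </Overlaid>
)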

And if we read it from left to right, it's the Polish notation for how to compute the character:

const 坐 = '⿻⿰人人土'
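Because every operator takes a fixed number of components, a Polish-notation string can be parsed back into the tree without any brackets. A minimal recursive-descent sketch, assuming only the twelve operators U+2FF0 to U+2FFB (⿲ and ⿳ take three components, the rest two); `parseIDS` is a hypothetical name:

```javascript
// ⿲ and ⿳ take three components; the other IDCs (U+2FF0..U+2FFB) take two.
const TERNARY = new Set(["⿲", "⿳"]);
const isIDC = (ch) => {
  const cp = ch.codePointAt(0);
  return cp >= 0x2ff0 && cp <= 0x2ffb;
};

// Parse an IDS string in prefix (Polish) notation into [operator, ...children].
function parseIDS(ids) {
  const chars = Array.from(ids); // iterate by code point, not UTF-16 unit
  let pos = 0;
  function node() {
    if (pos >= chars.length) throw new Error("IDS ended too early");
    const ch = chars[pos++];
    if (!isIDC(ch)) return ch; // an atom
    const arity = TERNARY.has(ch) ? 3 : 2;
    const children = [];
    for (let i = 0; i < arity; i++) children.push(node());
    return [ch, ...children];
  }
  const tree = node();
  if (pos !== chars.length) throw new Error("Extra characters after IDS");
  return tree;
}

console.log(JSON.stringify(parseIDS("⿻⿰人人土"))); // ["⿻",["⿰","人","人"],"土"]
```

The fixed arity is the whole trick: just like `+` always takes two numbers, each operator tells the parser exactly how many pieces to read next.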

And if we translate the ideographic description into a human-readable message:

A group of_⿰ people_人 with their bottoms on_⿻ the ground_土: that is 'sit'_坐.

That's what we're looking for! If every character can be turned into Polish notation, we can understand its meaning at the same time. So how do we get the Polish notation automatically?

4. Convert a character to Polish Notation with GPT.

As a native Chinese speaker, I can do it by recognizing the atoms and composing them with ideographic description characters, diving in recursively until nothing can be decomposed further. But that takes a huge effort. Wait a second, this might be something an LLM can do for us, though I doubted there was enough context for this stuff on the Web. After giving it a try, GPT-4 nailed the simple ones but failed on some of the complicated ones. I believe that with more training data and a better prompt, it is definitely possible.

If we replace all of the atoms with the placeholder _, what's left can be thought of as the backbone of the character, like ⿻⿰___ or ⿱_⿰⿵__⿵__, which can serve as the name of the render function for that structure. I would guess that the total number of these backbones is small, maybe somewhere under 30.

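// ⿰, ⿱, ⿻ and ⿵ are not valid JavaScript identifier characters, so the
// structure names become string keys; each function fills in its slots.
const structures = {
    "⿰__": (a, b) => `⿰${a}${b}`,
    "⿱__": (a, b) => `⿱${a}${b}`,
    "⿱_⿰__": (a, b, c) => `⿱${a}⿰${b}${c}`,
    "⿻⿰___": (a, b, c) => `⿻⿰${a}${b}${c}`,
    "⿱_⿰⿵__⿵__": (a, b, c, d, e) => `⿱${a}⿰⿵${b}${c}⿵${d}${e}`,
}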
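Getting those backbone names automatically is a one-liner once a character is written in Polish notation. The sketch below uses hypothetical helper names and a small sample of IDS strings from this post rather than a real dataset: it replaces every non-operator with _ and groups characters that share a backbone. It does not test the guess about how many backbones exist; that would need a full IDS database.

```javascript
// True for the ideographic description characters U+2FF0..U+2FFB.
const isIDC = (ch) => {
  const cp = ch.codePointAt(0);
  return cp >= 0x2ff0 && cp <= 0x2ffb;
};

// Replace every atom with "_", keeping only the layout operators.
const skeleton = (ids) => Array.from(ids, (ch) => (isIDC(ch) ? ch : "_")).join("");

// Group characters by their backbone.
function groupBySkeleton(idsByChar) {
  const groups = new Map();
  for (const [char, ids] of Object.entries(idsByChar)) {
    const key = skeleton(ids);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(char);
  }
  return groups;
}

const sample = { "雷": "⿱雨田", "泉": "⿱白水", "林": "⿰木木", "森": "⿱木⿰木木", "坐": "⿻⿰人人土" };
console.log(skeleton("⿻⿰人人土")); // ⿻⿰___
console.log(groupBySkeleton(sample));
```

In this sample, 雷 and 泉 share ⿱__, and 森 lands on ⿱_⿰__, one of the structure names listed above.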

Currying the function

Based on those functions, we can do something more interesting. Take '雷_thunder' as an example: its Polish notation is '⿱雨_rain田_farm', so the structure function is ⿱__(a, b). After currying, it can be called like this: ⿱__(a)(b), so 雷 = ⿱__(雨)(田). Now, if we fill the function with a curry placeholder, we get another function: ⿱_田 = ⿱__(*)(田), so '雷' can also be written as 雷 = ⿱_田(雨). ⿱_田 describes a structure whose bottom part is already filled with 田 and which takes something above it to compose a character.
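Here is a minimal sketch of currying with a placeholder, assuming lodash-style placeholder semantics (the `curry` helper and the `__` placeholder are my own hypothetical names, not lodash itself). Since ⿱ can't appear in a JavaScript identifier, `aboveToBelow` stands in for ⿱__. One detail differs from the notation above: a placeholder means "leave this slot open", and later arguments fill open slots first, so to pre-fill only the bottom slot, the placeholder and 田 go in the same call.

```javascript
// Marks an argument slot that should stay open for a later call.
const __ = Symbol("placeholder");

// Curry fn: keep collecting arguments (placeholders allowed) until the
// first `arity` slots all hold real values, then call fn.
function curry(fn, arity = fn.length) {
  const step = (held) => (...next) => {
    const args = [...held];
    let k = 0;
    // New arguments fill earlier open slots first, left to right...
    for (let i = 0; i < args.length && k < next.length; i++) {
      if (args[i] === __) args[i] = next[k++];
    }
    // ...and whatever is left is appended.
    args.push(...next.slice(k));
    const slots = args.slice(0, arity);
    const ready = slots.length === arity && !slots.includes(__);
    return ready ? fn(...slots) : step(args);
  };
  return step([]);
}

// The ⿱__ structure: something above, something below.
const aboveToBelow = curry((top, bottom) => `⿱${top}${bottom}`);

const withFarmBelow = aboveToBelow(__, "田"); // ⿱_田: bottom already filled
console.log(aboveToBelow("雨")("田")); // ⿱雨田
console.log(withFarmBelow("雨")); // ⿱雨田
```

Both calls produce the IDS for 雷; turning that string into the character itself would still need a lookup table.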

Does my brain curry while reading?

Well, I believe so, because we can still read this: The qucik borwn fox jmups oevr the lzay dog. Maybe instead of memorizing the word quick, our heads learn its curried function: qu__k()(). So when we read fast, we are basically reading with those curried functions, even without filling in any parameters.
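The reading experiment can be mimicked with a toy "shell" for each word. This is only an analogy, not a model of how the brain works: the sketch (hypothetical `shell` and `readScrambled` helpers) simplifies qu__k by fixing just the first and last letters and treating the middle as an unordered bag of letters, then recovers each scrambled word from a small vocabulary.

```javascript
// A word's "shell": first and last letters fixed, middle letters sorted,
// so every shuffle of the middle produces the same key.
function shell(word) {
  if (word.length <= 3) return word; // nothing to shuffle
  const middle = [...word.slice(1, -1)].sort().join("");
  return word[0] + middle + word[word.length - 1];
}

// Replace each scrambled word with the vocabulary word sharing its shell.
function readScrambled(sentence, vocabulary) {
  const byShell = new Map(vocabulary.map((w) => [shell(w), w]));
  return sentence
    .split(" ")
    .map((w) => byShell.get(shell(w)) ?? w)
    .join(" ");
}

const vocabulary = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"];
console.log(readScrambled("The qucik borwn fox jmups oevr the lzay dog", vocabulary));
// The quick brown fox jumps over the lazy dog
```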

The Dimension of Character System

English is a 1-dimensional system: it needs only one ideographic description character, ⿰. A word is a train that uses letters as carriages. Chinese, however, takes 2 dimensions to describe, which also explains its complexity.
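To make the 1D claim concrete: under this post's framing, any English word can be spelled with ⿰ alone by nesting to the right. A minimal sketch; `wordToIDS` is a hypothetical helper:

```javascript
// Write a word as a chain of ⿰ (left to right): "fox" -> "⿰f⿰ox".
function wordToIDS(word) {
  const letters = Array.from(word);
  if (letters.length <= 1) return word;
  return "⿰" + letters[0] + wordToIDS(letters.slice(1).join(""));
}

console.log(wordToIDS("fox")); // ⿰f⿰ox
console.log(wordToIDS("quick")); // ⿰q⿰u⿰i⿰ck
```

In this representation every n-letter word has the same backbone (for 5 letters, ⿰_⿰_⿰_⿰__), while Chinese characters draw on the whole set of operators.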

I wonder if there is a 3D character system on our planet. It would take a huge effort to learn, and it would surely have a massive effect on the brains that understand it. Those brains would be trained to do 3D currying all the time; I'd really like to know how they think differently.

Actually, we've already experimented with leveling up the dimension. The crossword puzzle expands a line into a plane, turning a 1D writing system into a 2D game. So a crossword for Chinese would have to be 3D, and for a 3D character system the game would have to be 4D.

It reminds me of the sci-fi story 'Story of Your Life' by Ted Chiang. The aliens in it use a nonlinear writing system and see 'time' in a totally different way. I wonder if the author drew inspiration from Chinese characters.

Well, I hope you enjoyed it. See you next time.

@9am 🕘

Read more articles at 9am.github.io
