Chinese character is counted as 1 letter

benjypng / logseq-tweet-plugin

MIT License


Chinese character is counted as 1 letter #9

Closed xxchan closed 2 years ago

xxchan commented 2 years ago

It should be 2

benjypng commented 2 years ago

Chinese characters count correctly for me though. In your screenshot, it indicates 2 as well?

xxchan commented 2 years ago

Sorry if that was unclear. I mean 中文 should be 4 letters. One Chinese character takes 2 letters.

xxchan commented 2 years ago

I did some experiments on Twitter, and my guess is that characters taking 1 or 2 bytes in UTF-8 (e.g., ASCII, ߷) are counted as 1 letter, while characters taking 3 or 4 bytes (e.g., 啊, 𒀐) are counted as 2 letters...

If so, `const blockLen = [...blockContent].map(c => Math.floor((Buffer.byteLength(c, 'utf8') + 1) / 2)).reduce((a, b) => a + b, 0)` may work.
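For illustration, here is a minimal sketch of the same idea as a standalone function. It assumes the byte-length guess above holds, which comes from experiments rather than from Twitter's documented rule. The name `weightedLength` is hypothetical, and `TextEncoder` is swapped in for `Buffer` so the sketch does not depend on Node's `Buffer` being available.

```javascript
// Hypothetical helper: approximates the letter count using the
// UTF-8 byte-length guess (an assumption, not Twitter's documented rule).
const encoder = new TextEncoder();

function weightedLength(text) {
  let total = 0;
  // for...of iterates by code point, so surrogate pairs such as 𒀐 stay whole.
  for (const ch of text) {
    const bytes = encoder.encode(ch).length;
    // 1 or 2 bytes -> 1 letter; 3 or 4 bytes -> 2 letters.
    total += Math.floor((bytes + 1) / 2);
  }
  return total;
}

console.log(weightedLength("中文")); // 4
console.log(weightedLength("abc")); // 3
```

Iterating by code point rather than by UTF-16 unit matters here: `"𒀐".length` is 2 in JavaScript, but it should be treated as a single 4-byte character.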

benjypng commented 2 years ago

I see! It looks like their API documentation does cover this.

Will find some time to fix this in the coming week.

benjypng commented 2 years ago

Fixed in v1.9.0!