euslisp / EusLisp

EusLisp is an integrated programming system for research on intelligent robots, based on Common Lisp and object-oriented programming. [Manual](http://euslisp.github.io/EusLisp/manual.html) [Manual (Japanese)](http://euslisp.github.io/EusLisp/jmanual.html)

Japanese text libraries are not working #348

Open weiqiyang opened 5 years ago

weiqiyang commented 5 years ago

I was trying to convert Japanese kana to romaji. setq and print work well with kana, but the romanji function in both kana_sjis.l and kana_euc.l does not read the input kana properly.

The following is a sample output using kana_sjis.l.

1.irteusgl$ (load "lib/llib/kana_sjis.l")
t
2.irteusgl$ (romanji "わたしは123まついです。abcひゅうるいちぇんぐふぁつぉでゅ")
"123abc"

I suppose the code itself is right, so this might be a mismatch with my terminal's character encoding. Are there any extra settings I need to apply before using these libraries?

YoheiKakiuchi commented 5 years ago

If you use Emacs, use M-x set-buffer-process-coding-system to change the encoding used for the process input and output. In the Ubuntu terminal, you can use the menu Terminal (端末(T)) -> Set Character Encoding (文字コードの設定(C)).

However, the actual problem in this issue is that the file lib/llib/kana_sjis.l is saved in UTF-8. You should convert the file to the encoding its name indicates (Shift_JIS).

FYI, the encoding of kana_euc.l needs to be changed in the same way.
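
To confirm which encoding a source file such as kana_sjis.l is actually saved in, one option is to try strict decodes in turn. The sketch below is a heuristic written in Python for illustration only and is not part of EusLisp. The names `guess_encoding` and `CANDIDATES` and the candidate order are assumptions, and short or ASCII-only input can be ambiguous.

```python
# Heuristic sketch: guess the encoding of raw bytes by trying strict decodes.
# guess_encoding and CANDIDATES are hypothetical names, not part of EusLisp.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def guess_encoding(data: bytes):
    """Return the first candidate codec that decodes data without error, else None."""
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict mode raises UnicodeDecodeError on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    sample = "ほげ"
    for codec in CANDIDATES:
        print(codec, "->", guess_encoding(sample.encode(codec)))
```

For a file, the bytes can be read with `open(path, "rb").read()` and passed to `guess_encoding`. EUC-JP is tried before Shift_JIS because EUC-JP kana bytes also happen to decode as half-width katakana under Shift_JIS.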

YoheiKakiuchi commented 5 years ago

You can check the actual byte values of a string as shown below.

SJIS

(setq a "ほげ")
(map cons #'(lambda (c) (format nil "0x~X" c)) a)
 => ("0x82" "0xd9" "0x82" "0xb0")

UTF-8

(setq a "ほげ")
(map cons #'(lambda (c) (format nil "0x~X" c)) a)
 => ("0xe3" "0x81" "0xbb" "0xe3" "0x81" "0x92")
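
For comparison, the same byte values can be reproduced outside EusLisp. The following is a minimal Python sketch, used here only for illustration, that encodes the string with each codec and lists the resulting bytes. The helper name `hex_bytes` is an assumption.

```python
# Sketch: list the bytes of a string under different Japanese encodings.
# hex_bytes is a hypothetical helper name used only for this illustration.
def hex_bytes(text: str, encoding: str):
    """Encode text and return each byte formatted like the EusLisp output above."""
    return [f"0x{b:x}" for b in text.encode(encoding)]


if __name__ == "__main__":
    for codec in ("shift_jis", "euc_jp", "utf-8"):
        print(codec, hex_bytes("ほげ", codec))
```

Each kana takes two bytes in Shift_JIS and EUC-JP but three bytes in UTF-8, so a byte-oriented library written for one encoding cannot be expected to match input in another.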

weiqiyang commented 5 years ago

Thanks! Changing source file encoding solved my problem.

I changed the encoding after making a copy of the original file:

iconv -f utf-8 -t euc-jp kana_euc.l.bak -o kana_euc.l
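
Where iconv is not available, the same re-encoding step can be sketched in Python. This is only an illustration under the assumption that the whole file fits in memory; `convert_encoding` is a hypothetical helper name, and the file names in the usage comment are the ones from the command above.

```python
# Sketch: re-encode a text file from UTF-8 to EUC-JP, similar in effect to the iconv call.
# convert_encoding is a hypothetical helper name used only for this illustration.
from pathlib import Path


def convert_encoding(src, dst, from_enc="utf-8", to_enc="euc_jp"):
    """Read src as from_enc and write it to dst as to_enc, keeping line endings as they are."""
    text = Path(src).read_bytes().decode(from_enc)
    # encode() raises UnicodeEncodeError if a character has no mapping in to_enc
    Path(dst).write_bytes(text.encode(to_enc))


# Usage with the file names from this thread:
# convert_encoding("kana_euc.l.bak", "kana_euc.l")
```

Passing `to_enc="shift_jis"` would cover the kana_sjis.l case in the same way.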

The terminal character encoding also needs to be changed to match. The result is shown below.

1.irteusgl$ load "lib/llib/kana_euc.l"
t
2.irteusgl$ (romanji "わたしは123まついです。abcひゅうるいちぇんぐふぁつぉでゅ")
"watashiha123matsuidesu.abchyuuruichenngufatsodyu"

But since we use UTF-8 most of the time, and so does the source code on GitHub, maybe it is time to have something like a kana_utf.l?
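
As a rough illustration of what an encoding-independent version could look like, the sketch below works on decoded characters rather than raw bytes. It is written in Python purely for illustration and is not the algorithm used in kana_sjis.l or kana_euc.l. The table `KANA_TO_ROMAJI` is a hypothetical, deliberately tiny subset that covers only the sample sentence, and `to_romaji` is an assumed name.

```python
# Toy sketch: kana-to-romaji by longest-match lookup on Unicode characters.
# KANA_TO_ROMAJI is a hypothetical, incomplete table; to_romaji is an assumed name.
KANA_TO_ROMAJI = {
    "ひゅ": "hyu", "ちぇ": "che", "ふぁ": "fa", "つぉ": "tso", "でゅ": "dyu",
    "わ": "wa", "た": "ta", "し": "shi", "は": "ha", "ま": "ma", "つ": "tsu",
    "い": "i", "で": "de", "す": "su", "う": "u", "る": "ru", "ん": "nn",
    "ぐ": "gu", "。": ".",
}


def to_romaji(text: str) -> str:
    """Convert kana found in the table to romaji and pass everything else through."""
    out = []
    i = 0
    while i < len(text):
        # Prefer two-character entries (e.g. "ひゅ") over single kana.
        for width in (2, 1):
            chunk = text[i:i + width]
            if chunk in KANA_TO_ROMAJI:
                out.append(KANA_TO_ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # ASCII and anything not in the table is kept as-is
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_romaji("わたしは123まついです。abcひゅうるいちぇんぐふぁつぉでゅ"))
```

Because the input is decoded to Unicode first, the same table applies whether the bytes arrived as UTF-8, EUC-JP, or Shift_JIS.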

k-okada commented 5 years ago

> But since we use UTF-8 most of the time, and so does the source code on GitHub, maybe it is time to have something like a kana_utf.l?

That's a good idea. Can you create a PR for this?

weiqiyang commented 5 years ago

> That's a good idea. Can you create a PR for this?

Yes, I will.