crazydonkey200 / tensorflow-char-rnn

Char-RNN implemented using TensorFlow.
MIT License

How to modify it to process Chinese text? #7

Closed: coomt closed this issue 7 years ago

coomt commented 7 years ago

It seems that this program is designed for processing English text, but I have some Chinese text to train. How can I modify it?

crazydonkey200 commented 7 years ago

Hi, this program is actually designed to process any text (which is one advantage of Char RNN). I have used it on some Chinese text before and the result is pretty fun :)

You just need to specify the encoding of the text using the --encoding argument. This is also noted in the Readme.

Note: train.py assumes the data file uses utf-8 encoding by default. Use --encoding=your-encoding to specify the encoding if your data file cannot be decoded as utf-8.

Chinese text is usually encoded in utf-8 or gb2312.
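
If you are not sure which of the two your file uses, a quick check like the one below may help before training. This is only a rough sketch and not part of the repository: the function name `first_working_encoding` and the candidate list are made up for illustration, and a successful decode only shows that the bytes are valid in that encoding, not that it is the one the file was actually written in.

```python
def first_working_encoding(raw_bytes, candidates=("utf-8", "gb2312")):
    """Return the first candidate encoding that decodes raw_bytes without error.

    Hypothetical helper for illustration; returns None if every candidate fails.
    """
    for name in candidates:
        try:
            raw_bytes.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding, so try the next one.
            continue
        return name
    return None


# Stand-in for the bytes of a real data file (read yours with open(path, "rb")).
sample = "你好，世界".encode("gb2312")
print(first_working_encoding(sample))
```

Whatever name this reports can then be passed to train.py through --encoding, for example --encoding=gb2312.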