Chinese text? - Githubissues

dongjun-Lee / text-classification-models-tf

Tensorflow implementations of Text Classification Models.

505 stars 164 forks

Closed z744364418p closed 6 years ago

dongjun-Lee commented 6 years ago

For a different language, you should change clean_str(), word_tokenize(), and the alphabet.

z744364418p commented 6 years ago

I changed tokenize() to split(), but it doesn't work. Can you help me? Thank you.

z744364418p commented 6 years ago

How do I change the alphabet for Chinese?

dongjun-Lee commented 6 years ago

You should remove or change this line in clean_str() because it will remove all Chinese characters. You need to define an alphabet to use the character-level models (char_cnn, vd_cnn). I don't know much about Chinese, so I'm not sure how to define an alphabet for it. koalaGreener/Character-level-Convolutional-Network-for-Text-Classification-Applied-to-Chinese-Corpus might help you.
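One possible way to approach both points, as a sketch only: assuming the line in clean_str() is a regex that replaces everything outside an ASCII whitelist, the whitelist can be widened to include the CJK Unified Ideographs range, and an alphabet for the character-level models can be built from the most frequent characters in the training corpus. The names `clean_str_zh` and `build_alphabet`, and the `max_size` parameter, are hypothetical and do not come from this repository.

```python
import re
from collections import Counter


def clean_str_zh(text):
    """Keep Chinese characters, ASCII letters and digits;
    replace everything else with a single space."""
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def build_alphabet(texts, max_size=5000):
    """Build an alphabet string from the most frequent characters
    in the cleaned corpus (assumption: frequency-based cutoff)."""
    counts = Counter(
        ch for t in texts for ch in clean_str_zh(t) if ch != " "
    )
    return "".join(ch for ch, _ in counts.most_common(max_size))


if __name__ == "__main__":
    corpus = ["你好你", "好你"]
    print(clean_str_zh("你好,World!!  123"))
    print(build_alphabet(corpus))
```

The cutoff matters because Chinese has thousands of distinct characters, so the alphabet will be far larger than the few dozen symbols used for English; characters outside the alphabet would need to map to whatever unknown or padding index the model already uses.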