train fasttext for language Identification, use word segmentation or subwords for Chinese?

facebookresearch / fastText

Library for fast text representation and classification.

https://fasttext.cc/

MIT License


train fasttext for language Identification, use word segmentation or subwords for Chinese? #908

Open jiahuigeng opened 5 years ago

jiahuigeng commented 5 years ago

Language identification with fastText works well (https://fasttext.cc/blog/2017/10/02/blog-post.html), but the training process is not clear. Are subwords used for language identification? Or do we need to apply word segmentation to languages such as Chinese and Japanese?

Vickzhang commented 5 years ago

In my experience, segmentation is necessary for some languages such as Chinese.
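
To make the two options in this thread concrete, here is a minimal sketch of what each one does to a Chinese string before training. It is not how the published language identification model was trained, which this thread does not establish. The helper names (`split_cjk_chars`, `char_ngrams`, `to_training_line`) and the Unicode ranges are assumptions made for illustration. Character splitting stands in for a real word segmenter, and `char_ngrams` is a simplified version of the boundary-marked character n-grams that fastText calls subwords. Only the `__label__` prefix is fastText's documented supervised input format.

```python
def is_cjk(ch: str) -> bool:
    """Rough check for CJK ideographs and Japanese kana (assumed ranges)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= code <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
    )


def split_cjk_chars(text: str) -> str:
    """Surround every CJK character with spaces so each becomes one token.

    This is a crude stand-in for word segmentation, not a real segmenter.
    """
    pieces = [f" {ch} " if is_cjk(ch) else ch for ch in text]
    # Collapse the runs of whitespace introduced above.
    return " ".join("".join(pieces).split())


def char_ngrams(word: str, minn: int, maxn: int) -> list[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers.

    Simplified illustration of subword extraction; it does not reproduce
    every detail of the library's own implementation.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def to_training_line(label: str, text: str, segment: bool) -> str:
    """Format one example as a fastText supervised line: '__label__xx text'."""
    body = split_cjk_chars(text) if segment else " ".join(text.split())
    return f"__label__{label} {body}"


if __name__ == "__main__":
    print(to_training_line("zh", "我爱北京", segment=True))
    print(to_training_line("zh", "我爱北京", segment=False))
    print(char_ngrams("中文", 2, 3))
```

Lines produced this way could be written to a file and passed to `fasttext.train_supervised(input=..., minn=..., maxn=...)`, where non-zero `minn` and `maxn` turn on character n-grams. The file name and the n-gram lengths are choices to tune on held-out data, since without any segmentation an unspaced Chinese sentence is treated as a single token.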