Language identification with fastText works great:
https://fasttext.cc/blog/2017/10/02/blog-post.html
However, the training process is not clear to me. Are subword (character n-gram) features used for language identification? Or do we need to run word segmentation first for languages such as Chinese and Japanese?
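To make the question concrete, here is a minimal sketch of what I mean by subword features. It is plain Python, not fastText's own code; the helper name `char_ngrams` and the default lengths 2 to 4 are my assumptions for illustration only. It wraps a whitespace-delimited token in `<` and `>` boundary markers and lists its character n-grams, which is my understanding of how fastText builds subwords when `-minn` and `-maxn` are set above zero.

```python
def char_ngrams(token, min_n=2, max_n=4):
    """List character n-grams of one token (illustrative helper, not a fastText API)."""
    # Boundary markers let prefixes and suffixes be told apart from inner n-grams.
    wrapped = f"<{token}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n characters over the wrapped token.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# An unsegmented Japanese string is treated as a single token here.
print(char_ngrams("日本語", min_n=2, max_n=2))
```

If features like these are what the language identification model is trained on, then an unsegmented Chinese or Japanese sentence would still produce usable n-grams, and I would guess no separate word segmentation is needed. I would like to confirm whether that is actually the case.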