Haitons opened this issue 3 years ago
It uses the mBERT (multilingual BERT) code from Hugging Face. I am not sure whether word segmentation is required for mBERT; I never used it in my experiments on Chinese data, and it worked well.
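One rough way to see why segmentation may not matter much: BERT-style WordPiece tokenizers (the Hugging Face `BertTokenizer` exposes this as `tokenize_chinese_chars`, on by default) put spaces around every CJK ideograph before WordPiece runs, so Chinese text is effectively split into single characters. The snippet below is a minimal illustrative re-implementation of that idea, not the library's code, and it assumes the multilingual model's tokenizer behaves like standard BERT here. The function names `is_cjk_char` and `split_cjk` are hypothetical.

```python
def is_cjk_char(cp: int) -> bool:
    """Return True if the code point lies in one of the CJK ideograph blocks
    that BERT-style tokenizers treat as standalone characters."""
    return (
        0x4E00 <= cp <= 0x9FFF
        or 0x3400 <= cp <= 0x4DBF
        or 0x20000 <= cp <= 0x2A6DF
        or 0x2A700 <= cp <= 0x2B73F
        or 0x2B740 <= cp <= 0x2B81F
        or 0x2B820 <= cp <= 0x2CEAF
        or 0xF900 <= cp <= 0xFAFF
        or 0x2F800 <= cp <= 0x2FA1F
    )


def split_cjk(text: str) -> list[str]:
    """Hypothetical helper: pad each CJK character with spaces, then split on
    whitespace (this mimics only the pre-WordPiece step, not punctuation handling)."""
    pieces = []
    for ch in text:
        if is_cjk_char(ord(ch)):
            pieces.append(f" {ch} ")
        else:
            pieces.append(ch)
    return "".join(pieces).split()


print(split_cjk("我爱北京 BERT"))  # ['我', '爱', '北', '京', 'BERT']
# Pre-segmented and unsegmented input give the same character sequence:
print(split_cjk("我 爱 北京") == split_cjk("我爱北京"))  # True
```

Under this assumption, running a word segmenter first would mostly just insert spaces that the tokenizer discards anyway.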
OK. I got a 512-dimensional sentence vector. Is there any way to reduce the dimensionality?
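One common, model-agnostic option is PCA: encode a sample of sentences from your own domain, fit PCA on those vectors, and project every embedding onto the top principal directions. The sketch below uses plain NumPy SVD; the random matrix is only a stand-in for real model output (distiluse-base-multilingual-cased produces 512-dimensional vectors), and the target size of 128 as well as the helper names `fit_pca` and `reduce` are arbitrary assumptions for illustration.

```python
import numpy as np


def fit_pca(embeddings: np.ndarray, n_components: int):
    """Fit PCA on an (n_samples, dim) matrix; return (mean, components)."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def reduce(embeddings: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project embeddings onto the retained principal directions."""
    return (embeddings - mean) @ components.T


# Stand-in data: 1000 random 512-dimensional "sentence vectors".
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 512)).astype(np.float32)

mean, components = fit_pca(train, n_components=128)
reduced = reduce(train, mean, components)
print(reduced.shape)  # (1000, 128)
```

In practice `train` would be the model's encodings of in-domain sentences (for example from LCQMC), and new sentences would be projected with the same `mean` and `components`. How much retrieval or similarity quality survives a given target size has to be checked on your own data.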
Hello! I want to fine-tune your model distiluse-base-multilingual-cased on a Chinese corpus such as LCQMC. So, do Chinese sentences need word segmentation?