HariWu1995 opened this issue 3 years ago
My task is multi-label text classification. At first I used a pretrained monolingual BERT to provide word embeddings for my model, and it works fine: ~50% accuracy and ~80% top-3 accuracy. Now I need more generalized word embeddings for transfer learning, so the model can quickly adapt to another language. I have tested mBERT and your pretrained models, and I found that the distribution of similarity scores across languages is better with your models (my test was on sentence-level embeddings). However, I need word-level embeddings to feed into my model. When I set output_value='token_embeddings', my model performs much worse: the best fine-tuned result is ~20% accuracy and ~40% top-3 accuracy. So my question is: "Is your pretrained model suitable for providing word embeddings for a downstream task? If so, which version should I use?" I am using distiluse-v2. Thank you in advance.

The model was not really trained for word-level embeddings. Also, the distiluse model uses a dense 768x512 layer on top of the pooling layer, which might skew the word embeddings. Maybe the other multilingual models work better.
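For concreteness, below is a minimal sketch (not from the thread) of how the sentence-level and token-level outputs differ and how to inspect the module stack mentioned above. It assumes the distiluse-base-multilingual-cased-v2 checkpoint as the "distiluse-v2" model and uses the standard sentence-transformers encode() API; the alternative model names in the comments are only illustrative examples of the "other multilingual models", not a specific recommendation from the reply.

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint name for "distiluse-v2"
model = SentenceTransformer("distiluse-base-multilingual-cased-v2")

# Prints the module stack: Transformer -> Pooling -> Dense(768 -> 512).
# The Dense layer sits after pooling, so the sentence embedding is 512-dim
# while the token embeddings are the raw 768-dim transformer outputs.
print(model)

sentences = ["This is a multi-label classification example."]

# Sentence-level embeddings (what the model was trained for): shape (1, 512)
sent_emb = model.encode(sentences)

# Token-level embeddings: a list with one tensor of shape (num_tokens, 768)
tok_emb = model.encode(sentences, output_value="token_embeddings")

print(sent_emb.shape, tok_emb[0].shape)

# Possible alternatives to try for word-level use (examples only):
#   "paraphrase-multilingual-MiniLM-L12-v2"
#   "paraphrase-multilingual-mpnet-base-v2"
```

Comparing the two shapes makes the mismatch visible: the 512-dim sentence vectors go through the dense layer the training objective optimized, while the 768-dim token vectors do not, which may partly explain why the two output modes behave so differently in a downstream classifier.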