HariWu1995 opened this issue 3 years ago
My task is multi-label text classification. At first I used a pretrained monolingual BERT to provide word embeddings for my model, and it works fine: ~50% accuracy and ~80% top-3 accuracy. Now I need more generalized word embeddings for transfer learning, so the model can quickly adapt to another language. I have tested mBERT and your pretrained models, and I found that the distribution of similarity scores across languages is better with your models (my test was on sentence-level embeddings). However, I need word-level embeddings to feed into my model. When I set output_value='token_embeddings', my model performs much worse: the best fine-tuned result is ~20% accuracy and ~40% top-3 accuracy. So my question is: "Is your pretrained model suitable for providing word embeddings for a downstream task? If so, which version should I use?" I am using distiluse-v2. Thank you in advance.

The model was not really trained for word-level embeddings. Also, the distiluse model uses a dense 768x512 layer on top of the pooling layer, which might skew the word embeddings. Maybe the other multilingual models work better.
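For concreteness, below is a minimal sketch (not from the thread) of how the sentence-level and token-level outputs differ and how to inspect the module stack mentioned above. It assumes the distiluse-base-multilingual-cased-v2 checkpoint as the "distiluse-v2" model and uses the standard sentence-transformers encode() API; the alternative model names in the comments are only illustrative examples of the "other multilingual models", not a specific recommendation from the reply.

```python
from sentence_transformers import SentenceTransformer

# Assumed checkpoint name for "distiluse-v2"
model = SentenceTransformer("distiluse-base-multilingual-cased-v2")

# Prints the module stack: Transformer -> Pooling -> Dense(768 -> 512).
# The Dense layer sits after pooling, so the sentence embedding is 512-dim
# while the token embeddings are the raw 768-dim transformer outputs.
print(model)

sentences = ["This is a multi-label classification example."]

# Sentence-level embeddings (what the model was trained for): shape (1, 512)
sent_emb = model.encode(sentences)

# Token-level embeddings: a list with one tensor of shape (num_tokens, 768)
tok_emb = model.encode(sentences, output_value="token_embeddings")

print(sent_emb.shape, tok_emb[0].shape)

# Possible alternatives to try for word-level use (examples only):
#   "paraphrase-multilingual-MiniLM-L12-v2"
#   "paraphrase-multilingual-mpnet-base-v2"
```

Comparing the two shapes makes the mismatch visible: the 512-dim sentence vectors go through the dense layer the training objective optimized, while the 768-dim token vectors do not, which may partly explain why the two output modes behave so differently in a downstream classifier.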