Open · iambankaratharva opened 1 year ago
Hello @iambankaratharva, it could be that your learning rate is too high for XLM-RoBERTa-Large. This model is very large, so we typically use a much smaller learning rate, around 5e-6.
We also recommend using the fine_tune method, as illustrated in the script here.
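A rough sketch of that recommendation, assuming the flair library; the data folder, column layout, and output path below are placeholders, and the exact API surface (e.g. make_label_dictionary vs. the older make_tag_dictionary) varies across flair versions:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Placeholder folder/columns -- point these at your BIO-formatted files.
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
label_dict = corpus.make_label_dictionary(label_type="ner")

# FLERT-style setup: fine-tune the transformer itself, no extra LSTM/CRF.
embeddings = TransformerWordEmbeddings("xlm-roberta-large", fine_tune=True)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/xlmr-large",  # placeholder output path
    learning_rate=5e-6,              # small LR for a very large model
    mini_batch_size=4,
    max_epochs=10,
)
```

Running this requires the corpus files on disk and downloads the xlm-roberta-large checkpoint, so treat it as a template rather than a drop-in script.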
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Question
Hi, I have data in BIO format (not BIOES). I am training a sequence tagger model with transformer embeddings, but I consistently get an F1 score of 0 at every epoch with XLM-ROBERTA-LARGE, while other models (BERT-BASE-UNCASED) give a non-zero F1 score. Could you please help me understand why? I can confirm that the loss decreases consistently. Code for XLM-ROBERTA-LARGE below:
Training data snapshot:
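One failure mode consistent with these symptoms (an illustration, not a diagnosis from the thread): with too high a learning rate, a large model can collapse to predicting "O" for every token. The token-level loss can still decrease, but span-level F1, which is computed over whole BIO entity spans, stays exactly 0 because no entity span is ever predicted. A minimal sketch with hypothetical helper functions (this is not flair's own evaluation code):

```python
def bio_spans(tags):
    """Extract (start, end, label) entity spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # span keeps growing
        else:
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

def span_f1(gold_tags, pred_tags):
    """Span-level micro F1: a span counts only if boundaries and label match."""
    gold = set(bio_spans(gold_tags))
    pred = set(bio_spans(pred_tags))
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = ["B-PER", "I-PER", "O", "B-LOC"]
print(span_f1(gold, ["O"] * 4))  # all-"O" predictions: F1 is 0.0
print(span_f1(gold, gold))       # perfect predictions: F1 is 1.0
```

If flair's per-epoch evaluation log shows 0 predicted spans alongside a falling loss, that would point to this collapse, and a smaller learning rate is the usual remedy.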