google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

albert large_v2 #197

Open gogokre opened 4 years ago

gogokre commented 4 years ago

If I use the ALBERT base or large-v1 model, training works well. However, if I use the large-v2 model in exactly the same way, the model does not learn: there is no out-of-memory error, but the accuracy never rises. Why is that?

from transformers import AlbertForSequenceClassification, AdamW

model = AlbertForSequenceClassification.from_pretrained(
    "albert-large-v2",           # albert-large-v1
    num_labels=2,                # The number of output labels -- 2 for binary classification.
    output_attentions=False,     # Whether the model returns attention weights.
    output_hidden_states=False,  # Whether the model returns all hidden states.
)
model.cuda()
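For context, a minimal sketch of how the rest of such a fine-tuning setup typically looks with the AdamW optimizer that is imported above; the learning rate, warmup steps, and step count below are illustrative assumptions, not values reported in this issue:

import torch
from transformers import AlbertForSequenceClassification, AdamW, get_linear_schedule_with_warmup

model = AlbertForSequenceClassification.from_pretrained("albert-large-v2", num_labels=2)
model.cuda()

# Assumed hyperparameters, for illustration only.
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)
num_training_steps = 1000  # placeholder: epochs * number of batches
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,              # warmup steps (assumed)
    num_training_steps=num_training_steps,
)

model.train()
# Inside the training loop (batch tensors assumed to already be on the GPU):
# outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
# outputs.loss.backward()
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# optimizer.step()
# scheduler.step()
# optimizer.zero_grad()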

maziyarpanahi commented 4 years ago

I've noticed similar behavior between albert_base/1 and albert_base/3. Training NER with albert_base/1 works well (enough), but albert_base/3 is terrible! V1 starts at 86% and reaches 89%, while V3 starts at 56% and stays there.
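For reference, a minimal sketch of loading the two module versions being compared here; the tfhub.dev URLs are assumptions inferred from the version numbers mentioned, not code from the commenter:

import tensorflow_hub as hub

# Assumed TF Hub module URLs for the two releases being compared (not confirmed in this thread).
ALBERT_BASE_V1 = "https://tfhub.dev/google/albert_base/1"
ALBERT_BASE_V3 = "https://tfhub.dev/google/albert_base/3"

# TF1-style hub.Module loading; swap the URL to compare the two releases under identical settings.
albert_v1 = hub.Module(ALBERT_BASE_V1, trainable=True)
albert_v3 = hub.Module(ALBERT_BASE_V3, trainable=True)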

Akshayextreme commented 4 years ago

I am facing the same issue. I used ALBERT-base V2 for my task and got ~70% dev accuracy, but when I train the same model with ALBERT-large V2, the accuracy is always below 10%. Quite strange!