google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

[ALBERT] Has anyone reproduced the ALBERT scores on the GLUE dataset? #99

Closed lonePatient closed 4 years ago

lonePatient commented 4 years ago

I converted the TF weights to PyTorch weights, and on the QQP dataset I only get 87% accuracy.

model: albert-base, epochs: 3, learning_rate: 2e-5, batch size: 24, max sequence length: 128, warmup_proportion: 0.1
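As a sanity check on those hyperparameters, the warmup schedule implied by `warmup_proportion: 0.1` can be computed from the dataset size. This is a minimal sketch, assuming the standard BERT-style linear warmup and QQP's roughly 364k training examples (an approximate figure, not stated in this thread):

```python
import math

def warmup_steps(num_examples, batch_size, epochs, warmup_proportion):
    """Return (warmup steps, total steps) for BERT-style linear warmup."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return int(total_steps * warmup_proportion), total_steps

# ~364k QQP training examples is an assumption, not a number from this thread.
warmup, total = warmup_steps(363_846, batch_size=24, epochs=3, warmup_proportion=0.1)
print(warmup, total)
```

With these settings the run would take roughly 45k optimizer steps, the first ~4.5k of them under warmup.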

kamalkraj commented 4 years ago

https://github.com/kamalkraj/ALBERT-TF2.0 [WIP] got better accuracy on the CoLA dev set.

wxp16 commented 4 years ago

> I converted the TF weights to PyTorch weights, and on the QQP dataset I only get 87% accuracy.
>
> model: albert-base, epochs: 3, learning_rate: 2e-5, batch size: 24, max sequence length: 128, warmup_proportion: 0.1

On the MNLI dataset, using ALBERT base v1, I got the following results. Clearly, the accuracy is very low.

```
eval_accuracy = 0.77962303
eval_loss = 0.5517804
global_step = 24543
loss = 0.5517709
```

kamalkraj commented 4 years ago

> https://github.com/kamalkraj/ALBERT-TF2.0 [WIP] got better accuracy on the CoLA dev set.

Dataset: MNLI, Model: ALBERT large v1, Dev accuracy: 0.8089, epochs: 3, max_seq_length: 128, batch_size: 128, learning_rate: 3e-5

lonePatient commented 4 years ago

https://github.com/lonePatient/albert_pytorch

Dataset: MNLI, Model: ALBERT_BASE_V2, Dev accuracy: 0.8418

kamalkraj commented 4 years ago

@lonePatient Could you share the hyperparameters? Max seq length?

lonePatient commented 4 years ago

@kamalkraj
```
--max_seq_length=128 \
--per_gpu_train_batch_size=16 \
--per_gpu_eval_batch_size=16 \
--spm_model_file=${BERT_BASE_DIR}/30k-clean.model \
--learning_rate=1e-5 \
--num_train_epochs=3.0 \
--logging_steps=24544 \
--save_steps=24544 \
```
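The `--logging_steps`/`--save_steps` value of 24544 looks like one epoch's worth of optimizer steps. A quick check, assuming MNLI's roughly 393k training examples (a figure not stated in the thread) and the per-GPU batch size of 16 from the flags above:

```python
import math

# ~392,702 MNLI training examples is an assumption, not a number from this thread.
mnli_train_examples = 392_702
per_gpu_batch = 16

steps_per_epoch = math.ceil(mnli_train_examples / per_gpu_batch)
print(steps_per_epoch)  # → 24544, matching --logging_steps and --save_steps
```

So with these flags the model is logged and checkpointed once per epoch.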

kamalkraj commented 4 years ago

@lonePatient Dropouts? All 0?

lonePatient commented 4 years ago

@kamalkraj For fine-tuning, dropout rate = 0.1.
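As a config fragment, re-enabling dropout for fine-tuning would look something like the following. The field names follow the `albert_config.json` convention used in this repo; treat them as an illustration of the setting above, not an exact recipe from the thread:

```python
# Hypothetical fine-tuning overrides mirroring the dropout rate mentioned above.
# ALBERT v2 checkpoints ship with dropout disabled (0), so both probabilities
# are raised to 0.1 here for the fine-tuning run.
finetune_overrides = {
    "hidden_dropout_prob": 0.1,
    "attention_probs_dropout_prob": 0.1,
}
print(finetune_overrides)
```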