Closed: lonePatient closed this issue 4 years ago
https://github.com/kamalkraj/ALBERT-TF2.0 [WIP] got better accuracy on the CoLA dev set.
I converted the TF weights to PyTorch weights, and on the QQP dataset I only get 87% accuracy.
model: albert-base, epochs: 3, learning_rate: 2e-5, batch size: 24, max sequence length: 128, warmup_proportion: 0.1
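For reference, here is a minimal sketch of that setup, assuming the converted checkpoint is loaded through Hugging Face transformers; the checkpoint path and the exact QQP train-set size are assumptions, not from the thread:

```python
# Sketch of the QQP fine-tuning configuration described above.
import torch
from transformers import AlbertForSequenceClassification, get_linear_schedule_with_warmup

model = AlbertForSequenceClassification.from_pretrained(
    "path/to/converted-albert-base",  # converted TF -> PyTorch checkpoint (placeholder path)
    num_labels=2)                     # QQP is a binary (duplicate / not duplicate) task

epochs = 3
batch_size = 24
num_train_examples = 363_846          # approximate QQP train-set size
steps_per_epoch = num_train_examples // batch_size
total_steps = steps_per_epoch * epochs

# max sequence length 128 would be applied at tokenization time, not here
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # warmup_proportion = 0.1
    num_training_steps=total_steps)
```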
On the MNLI dataset, using ALBERT base v1, I got the following results. Clearly, the accuracy is very low.
eval_accuracy = 0.77962303
eval_loss = 0.5517804
global_step = 24543
loss = 0.5517709
> https://github.com/kamalkraj/ALBERT-TF2.0 [WIP] got better accuracy on the CoLA dev set.
Dataset: MNLI, Model: ALBERT large v1, Dev accuracy: 0.8089, epochs: 3, max_seq_length: 128, batch_size: 128, learning_rate: 3e-5
https://github.com/lonePatient/albert_pytorch
Dataset: MNLI, Model: ALBERT_BASE_V2, Dev accuracy: 0.8418
@lonePatient Could you share the hyperparameters? Max seq length?
@kamalkraj

```
--max_seq_length=128 \
--per_gpu_train_batch_size=16 \
--per_gpu_eval_batch_size=16 \
--spm_model_file=${BERT_BASE_DIR}/30k-clean.model \
--learning_rate=1e-5 \
--num_train_epochs=3.0 \
--logging_steps=24544 \
--save_steps=24544 \
```
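As an aside, 24544 appears to be exactly one MNLI epoch at per-GPU batch size 16 (392,702 training examples / 16 ≈ 24,544), so logging and checkpointing presumably happen once per epoch. This is an inference from the numbers, not something stated in the thread; a quick sanity check:

```python
# Sanity check (inference): logging_steps / save_steps = 24544
# matches one MNLI epoch at batch size 16.
import math

mnli_train_examples = 392_702  # MNLI training-set size
batch_size = 16
steps_per_epoch = math.ceil(mnli_train_examples / batch_size)
print(steps_per_epoch)  # -> 24544
```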
@lonePatient Dropouts? All 0?
@kamalkraj For fine-tuning, dropout rate = 0.1.
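For context, a hedged sketch of setting that dropout explicitly through transformers' AlbertConfig; the checkpoint name here is an assumption:

```python
# Sketch: override dropout to 0.1 for fine-tuning via AlbertConfig.
from transformers import AlbertConfig, AlbertForSequenceClassification

config = AlbertConfig.from_pretrained(
    "albert-base-v2",
    hidden_dropout_prob=0.1,           # dropout inside the transformer layers
    attention_probs_dropout_prob=0.1,  # dropout on the attention weights
    num_labels=3)                      # MNLI: entailment / neutral / contradiction
model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", config=config)
```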