NVIDIA / FasterTransformer

Transformer-related optimizations, including BERT and GPT
Apache License 2.0

Fail to reproduce Post Training Quantization F1-score of BERT Base #152

Closed: zhuango closed this issue 3 years ago

zhuango commented 3 years ago

I recently ran post-training quantization (PTQ) on the BERT Base model for the SQuAD v1.1 task, but failed to reach the F1 score released on this page.

My script for training:

export BERT_BASE_DIR=./uncased_L-12_H-768_A-12
export SQUAD_DIR=./SQuAD1.1
export OUTPUT=./weights/squad1.1_fast_trans_384_float

mpirun -np 4 \
    --allow-run-as-root -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO \
    -x LD_LIBRARY_PATH \
    -x PATH -mca pml ob1 -mca btl ^openib \
    python run_squad.py \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --bert_config_file=$BERT_BASE_DIR/bert_config.json \
    --train_file=$SQUAD_DIR/train-v1.1.json \
    --predict_file=$SQUAD_DIR/dev-v1.1.json \
    --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
    --do_train=True \
    --do_predict=True \
    --if_quant=False \
    --train_batch_size=8 \
    --learning_rate=1e-5 \
    --num_train_epochs=2.0 \
    --save_checkpoints_steps 1000 \
    --output_dir=$OUTPUT \
    --horovod

python ./SQuAD1.1/evaluate-v1.1.py \
./SQuAD1.1/dev-v1.1.json \
$OUTPUT/predictions.json
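
(For context, evaluate-v1.1.py scores each prediction by token-level F1 against the gold answers, taking the max over the provided answers. A simplified sketch of the official script's logic:

import collections
import re
import string

def normalize_answer(s):
    # Lowercase, drop punctuation and articles, collapse whitespace,
    # as in the official SQuAD evaluation script.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def f1_score(prediction, ground_truth):
    # Token-overlap F1 between one prediction and one gold answer.
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

)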

And my script for post-training quantization:

export BERT_BASE_DIR=./uncased_L-12_H-768_A-12
export SQUAD_DIR=./SQuAD1.1
export OUTPUT=./weights/squad1.1_fast_trans_384_int8_ptq

python run_squad.py \
    --vocab_file=$BERT_BASE_DIR/vocab.txt \
    --bert_config_file=$BERT_BASE_DIR/bert_config.json \
    --train_file=$SQUAD_DIR/train-v1.1.json \
    --predict_file=$SQUAD_DIR/dev-v1.1.json \
    --init_checkpoint=./weights/squad1.1_fast_trans_384_float/model.ckpt-5474 \
    --do_train=False \
    --do_predict=True \
    --do_calib=True \
    --if_quant=True \
    --train_batch_size=16 \
    --calib_batch=16 \
    --calib_method=percentile \
    --percentile=99.999 \
    --quant_mode=ft2 \
    --output_dir=$OUTPUT

python ./SQuAD1.1/evaluate-v1.1.py \
./SQuAD1.1/dev-v1.1.json \
$OUTPUT/predictions.json
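
(As I understand it, --calib_method=percentile with --percentile=99.999 picks the clipping threshold (amax) for each activation tensor as the 99.999th percentile of the absolute values observed on the calibration batches, then derives the symmetric int8 scale from it. A minimal sketch of that idea, not the actual run_squad.py code:

import numpy as np

def percentile_amax(activations, percentile=99.999):
    # Clipping threshold: the given percentile of |x| over calibration data.
    return float(np.percentile(np.abs(activations).ravel(), percentile))

def quantize_int8(x, amax):
    # Symmetric quantization: map [-amax, amax] onto [-127, 127].
    scale = amax / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

acts = np.random.randn(16, 384, 768).astype(np.float32)  # fake calibration batch
q = quantize_int8(acts, percentile_amax(acts))

)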

F1 scores I measured vs. the released numbers (seq_length=384):

           FP32      PTQ (int8)
Mine       88.65%    78.08%
Released   89.57%    88.30%

Could you guys help? Thanks~

byshiue commented 3 years ago

Which BERT pre-trained model do you use? Did you download the model following the guide?

wget https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip -O uncased_L-12_H-768_A-12.zip
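
A quick way to sanity-check the download (a minimal sketch; file names follow the standard Google BERT release layout):

import os

BERT_BASE_DIR = "./uncased_L-12_H-768_A-12"
for name in ["vocab.txt",
             "bert_config.json",
             "bert_model.ckpt.index",
             "bert_model.ckpt.data-00000-of-00001"]:
    path = os.path.join(BERT_BASE_DIR, name)
    # Print OK/MISSING for each file run_squad.py expects.
    print(("OK     " if os.path.exists(path) else "MISSING") + " " + path)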
zhuango commented 3 years ago

@byshiue Thanks, I will try this pre-trained model.

zhuango commented 3 years ago

I replaced the pre-trained model and achieved the following F1 scores, which look good:

89.38% (FP32, seq_length=384)
87.90% (PTQ, ft2, seq_length=384)
88.69% (PTQ, ft1, seq_length=384)

I wonder what the difference is between the model you provide and the model from the Google BERT repo.

byshiue commented 3 years ago

We also encountered the same issue, but we don't have any further insight into the cause.

zhuango commented 3 years ago

@byshiue Thanks very much~