Hello
I fine-tuned BERT-Base (cased) on SQuAD 2.0 with the following command:
python run_squad.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=config.json \
--init_checkpoint=bert_model.ckpt \
--do_train=True \
--train_file=$SQUAD_DIR/train-v2.0.json \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-v2.0.json \
--train_batch_size=24 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=~/squad_large/ \
--version_2_with_negative=True \
--null_score_diff_threshold=-2
and got the following output from the evaluation script:
{"exact": 62.01465509980628, "f1": 64.47961013334715, "total": 11873, "HasAns_exact": 47.891363022941974, "HasAns_f1": 52.828341955672826, "HasAns_total": 5928, "NoAns_exact": 76.09756097560975, "NoAns_f1": 76.09756097560975, "NoAns_total": 5945, "best_exact": 62.0651899267245, "best_exact_thresh": -2.0197997093200684, "best_f1": 64.51742974575069, "best_f1_thresh": -2.0197997093200684, "pr_exact_ap": 31.64157011302471, "pr_f1_ap": 37.53953936447737, "pr_oracle_ap": 73.56376007315332}
I assumed the exact match should be higher (something around 73). Is there something I can check this result against?
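
As a sanity check on the numbers themselves, the overall exact-match score should be the total-weighted average of the HasAns and NoAns scores. A minimal sketch, using only the values from the evaluation output above (the helper name `weighted_exact` is just for illustration and is not part of the evaluate script):

```python
import json

# Evaluation output copied from this post, trimmed to the fields used here.
metrics = json.loads("""
{"exact": 62.01465509980628, "total": 11873,
 "HasAns_exact": 47.891363022941974, "HasAns_total": 5928,
 "NoAns_exact": 76.09756097560975, "NoAns_total": 5945}
""")


def weighted_exact(m):
    """Recombine the per-subset exact-match scores into the overall score."""
    # Convert each percentage back into a count of exactly matched questions.
    has_ans_hits = m["HasAns_exact"] * m["HasAns_total"] / 100.0
    no_ans_hits = m["NoAns_exact"] * m["NoAns_total"] / 100.0
    return 100.0 * (has_ans_hits + no_ans_hits) / m["total"]


recombined = weighted_exact(metrics)
# Print the recombined score next to the reported one for comparison.
print(round(recombined, 4), round(metrics["exact"], 4))
```

The two values agree, so the report is internally consistent. The overall score is pulled down mainly by the HasAns subset (about 47.9 exact) rather than by the NoAns subset (about 76.1).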