What accuracy should BERT-Base Cased reach on SQuAD 2.0?

Hello,

I fine-tuned BERT-Base Cased on SQuAD 2.0 with the following command:

    python run_squad.py \
      --vocab_file=$BERT_BASE_DIR/vocab.txt \
      --bert_config_file=config.json \
      --init_checkpoint=bert_model.ckpt \
      --do_train=True \
      --train_file=$SQUAD_DIR/train-v2.0.json \
      --do_predict=True \
      --predict_file=$SQUAD_DIR/dev-v2.0.json \
      --train_batch_size=24 \
      --learning_rate=3e-5 \
      --num_train_epochs=2.0 \
      --max_seq_length=384 \
      --doc_stride=128 \
      --output_dir=~/squad_large/ \
      --version_2_with_negative=True \
      --null_score_diff_threshold=-2

and got the following output from the evaluation script:

    {"exact": 62.01465509980628,
     "f1": 64.47961013334715,
     "total": 11873,
     "HasAns_exact": 47.891363022941974,
     "HasAns_f1": 52.828341955672826,
     "HasAns_total": 5928,
     "NoAns_exact": 76.09756097560975,
     "NoAns_f1": 76.09756097560975,
     "NoAns_total": 5945,
     "best_exact": 62.0651899267245,
     "best_exact_thresh": -2.0197997093200684,
     "best_f1": 64.51742974575069,
     "best_f1_thresh": -2.0197997093200684,
     "pr_exact_ap": 31.64157011302471,
     "pr_f1_ap": 37.53953936447737,
     "pr_oracle_ap": 73.56376007315332}

I assume the exact match should be higher (something around 73). Is there a reference result I can check this against?
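As a sanity check on the numbers above, here is a small sketch (not part of run_squad.py or the evaluation script) that recomputes the overall scores from the HasAns/NoAns breakdown and looks at how much the tuned threshold would help. The helper name `weighted_overall` is made up for illustration, and the assumption is that the overall score is the example-weighted average of the two subsets.

```python
import json

# Metrics copied from the evaluation output in the post (only the fields used here).
metrics = json.loads("""
{"exact": 62.01465509980628, "f1": 64.47961013334715, "total": 11873,
 "HasAns_exact": 47.891363022941974, "HasAns_f1": 52.828341955672826,
 "HasAns_total": 5928,
 "NoAns_exact": 76.09756097560975, "NoAns_f1": 76.09756097560975,
 "NoAns_total": 5945,
 "best_exact": 62.0651899267245, "best_f1": 64.51742974575069}
""")


def weighted_overall(m, key):
    """Hypothetical helper: example-weighted average of the HasAns and NoAns scores."""
    has_ans = m[f"HasAns_{key}"] * m["HasAns_total"]
    no_ans = m[f"NoAns_{key}"] * m["NoAns_total"]
    return (has_ans + no_ans) / m["total"]


for key in ("exact", "f1"):
    recomputed = weighted_overall(metrics, key)
    print(f"{key}: reported={metrics[key]:.4f} recomputed={recomputed:.4f}")

# Gain available from moving the null-score threshold to its best value.
threshold_gain = metrics["best_exact"] - metrics["exact"]
print(f"exact-match gain from best threshold: {threshold_gain:.4f}")
```

If the recomputed values match the reported ones, the breakdown is consistent, and the gap to the expected score would seem to come mostly from the answerable questions (HasAns_exact is about 48) rather than from the choice of --null_score_diff_threshold, since best_exact is nearly identical to exact.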

google-research / bert, issue #1373