Closed MXueguang closed 4 years ago
The base model is fine-tuned from `bert-base-uncased`, and the bert-large model is based on `bert-large-uncased-whole-word-masking`. What do you mean by "there isn't a checkpoint that matches `rsvp-ai/bertserini-bert-base-squad` exactly"? For the large model, I evaluated on the "best" checkpoint; the difference in score might come from this. As for `bert-large-uncased-whole-word-masking-finetuned-squad`: I forgot to change the name back in the training script, will change it back.
Hi, I opened this PR to discuss issues and questions that I ran into during replication and to make the corresponding modifications.
Here are two questions first:
1. I fine-tuned from `bert-base-uncased` using the parameters stated in `train.sh`; however, it seems there isn't a checkpoint that matches `rsvp-ai/bertserini-bert-base-squad` exactly. I suppose `rsvp-ai/bertserini-bert-large-squad` is fine-tuned from `bert-large-uncased-whole-word-masking`? I tried to replicate this too (using the parameters stated in `train.sh`); my last checkpoint gives `(0.5, {'exact_match': 42.100283822138124, 'f1': 49.63275436249586, 'recall': 51.23401819043994, 'precision': 50.18438555675959, 'cover': 47.38883632923368, 'overlap': 57.994323557237465})`.
2. In the `train.sh` file, the `model_name_or_path` used is `bert-large-uncased-whole-word-masking-finetuned-squad`. Is either of the two `rsvp-ai/<bert_squad>` models fine-tuned from `bert-large-uncased-whole-word-masking-finetuned-squad`? Isn't that model already fine-tuned from `bert-large-uncased-whole-word-masking`? I evaluated the `bert-large-uncased-whole-word-masking-finetuned-squad` model and added the results in the evaluation part: `(0.5, {'exact_match': 43.65184484389783, 'f1': 50.942504639546485, 'recall': 52.32886737510793, 'precision': 51.54318623526059, 'cover': 48.57142857142857, 'overlap': 58.63765373699149})`.