Open FairyFali opened 12 months ago
Hi,
I have replied to your email in case you haven't seen it.
You can refer to issue #38; we use commonsense_evaluate.py for evaluation. Also, please use a single GPU for training: multi-GPU training may not reproduce the results, and we are still investigating the reason.
If you have further questions, please let us know!
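The single-GPU advice above can be enforced by restricting which GPUs the process can see via `CUDA_VISIBLE_DEVICES`. This is a minimal sketch; the commented-out training command is a placeholder assumption, not this repo's documented invocation:

```shell
# Make only GPU 0 visible to the launched process, so any framework
# (PyTorch, etc.) treats the machine as a single-GPU box.
export CUDA_VISIBLE_DEVICES=0
echo "Visible GPUs: $CUDA_VISIBLE_DEVICES"
# python finetune.py ...   # placeholder: substitute your actual training command
```

Setting the variable before launch avoids any accidental multi-GPU data-parallel path inside the training script.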
Hi,
I am encountering difficulties in reproducing the experimental results on the OpenbookQA dataset. The output format is unexpected: for instance, I'm getting responses like "1 is correct. 2 is incorrect. 3 is incorrect. 4 is incorrect.", whereas the expected format is "answer1". Could you please provide a detailed command or set of instructions for both fine-tuning and evaluating the model, so I can accurately reproduce the results on OpenbookQA?
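One likely consequence of the format mismatch described above is that the evaluation's answer extraction fails: a verbose generation never contains the expected "answerN" token, so the example is scored as wrong regardless of its content. This hypothetical helper (not part of commonsense_evaluate.py) sketches what such extraction might look like, assuming the expected "answerN" format:

```python
import re
from typing import Optional

def extract_choice(output: str) -> Optional[str]:
    """Extract the first 'answerN' token from a model's raw generation.

    Hypothetical sketch: assumes the evaluation expects outputs like
    'answer1', as described in the issue above.
    """
    match = re.search(r"answer\s*([1-9])", output)
    return f"answer{match.group(1)}" if match else None

# A generation in the expected format is recovered:
print(extract_choice("the correct choice is answer1."))
# The verbose generation reported above yields no match, so it
# would be scored as incorrect:
print(extract_choice("1 is correct. 2 is incorrect. 3 is incorrect."))
```

If the extraction indeed returns nothing on such outputs, the low scores would reflect a formatting mismatch rather than model quality, which is why matching the fine-tuning prompt template matters.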