I am very interested in your research, but I ran into a problem when trying to replicate the results in the paper. When replicating the direct fine-tuning experiment on downstream tasks, I used the default parameter settings in the code, but the accuracy on the ScienceQA dataset was only about 47%. What could be causing this? Which parameter settings did you use to obtain the results reported in the paper?