Closed bspmpa closed 8 months ago
In this framework, the prompt tokens are sensitive to the learning rate, so I suspect you did not adjust the learning rate for the prompt tokens when you changed the batch_size.
Also, batch size and accuracy are not positively correlated; in our experiments, a batch size of 16 gave the best results on most of the datasets.
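For anyone else trying to reproduce this: one way to read the suggestion above is the common linear-scaling heuristic, i.e. keep the prompt tokens in their own optimizer parameter group and scale their learning rate by batch_size / 16 when the batch size changes. The sketch below is not this repository's training code; the class and names (`PromptedModel`, `prompt_tokens`, `backbone`) and the backbone learning rate are placeholders, and only the 4e-3 prompt lr comes from this thread.

```python
import torch
import torch.nn as nn

class PromptedModel(nn.Module):
    """Toy stand-in for a CLIP-style model with learnable prompt tokens (placeholder, not the repo's model)."""
    def __init__(self, num_prompts=8, dim=512):
        super().__init__()
        self.prompt_tokens = nn.Parameter(torch.randn(num_prompts, dim))
        self.backbone = nn.Linear(dim, dim)  # placeholder for the actual backbone

BASE_BATCH_SIZE = 16        # batch size the default hyperparameters were tuned for
batch_size = 32             # the batch size actually used for training
scale = batch_size / BASE_BATCH_SIZE

base_prompt_lr = 4e-3       # default prompt-token lr at batch size 16 (from this thread)
base_backbone_lr = 1e-5     # assumed placeholder backbone lr

model = PromptedModel()
optimizer = torch.optim.AdamW(
    [
        # prompt tokens get their own parameter group so their lr can be scaled independently
        {"params": [model.prompt_tokens], "lr": base_prompt_lr * scale},
        {"params": model.backbone.parameters(), "lr": base_backbone_lr * scale},
    ],
    weight_decay=1e-4,
)
```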
Thanks for your reply. I have adjusted clip_lr from 4e-3 to 8e-3, but the accuracy is still only around 79% (vs. 83% in the paper). I agree with your experimental findings on batch size, although this is a little baffling. Is it because we are fine-tuning a large model, so a smaller batch size works better?
Our understanding is that the learning of the prompt tokens is related to the batch size, and a higher batch size may be harmful to it, but we have not run any experiments on this.
Still, it should be possible to reproduce the results in the paper with a batch size of 16.
Yes, the results are close, but there is still a 0.5% gap; that is why I want to do some parameter tuning.
Hi, when reproducing the PA100k results of your work, I tried 24, 32, and 64 as the batch size, but the performance is lower than with a batch size of 16. I am wondering whether you can shed some light on this issue. I would assume that for a large dataset such as PA100k, a larger batch size would be beneficial.