Open pranav-gupta-7 opened 4 days ago
Hi @pranav-gupta-7, thanks for the question.
Our reasoning was that a smaller batch size would let us capture more of the model's performance peaks. A larger batch size would probably give a more stable training process, but you might miss some of those peaks.
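For anyone who wants to see the stability side of this trade-off in numbers, here is a minimal sketch. It is not the model or training code from this repository. It assumes a toy 1-D linear regression, and the helper names (`make_data`, `minibatch_gradient`, `gradient_spread`) are made up for illustration. It only measures how much the mini-batch gradient estimate fluctuates at a fixed weight for different batch sizes, which is one common way to read "less stable" for small batches. It does not test the claim about performance peaks.

```python
import random
import statistics


def make_data(n, seed=0):
    """Toy dataset (assumption): y = 2x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def minibatch_gradient(w, xs, ys, indices):
    """Gradient of 0.5 * (w*x - y)^2 with respect to w, averaged over one batch."""
    return sum((w * xs[i] - ys[i]) * xs[i] for i in indices) / len(indices)


def gradient_spread(batch_size, n_batches=2000, w=0.0, seed=1):
    """Standard deviation of the mini-batch gradient across many random batches."""
    xs, ys = make_data(1000)
    rng = random.Random(seed)
    grads = []
    for _ in range(n_batches):
        # Draw a batch without replacement, as a shuffled data loader would.
        idx = rng.sample(range(len(xs)), batch_size)
        grads.append(minibatch_gradient(w, xs, ys, idx))
    return statistics.stdev(grads)


if __name__ == "__main__":
    for b in (1, 8, 64):
        print(f"batch_size={b:>2}  gradient std={gradient_spread(b):.4f}")
```

On this toy problem the spread shrinks as the batch size grows, so each update with batch size 1 is much noisier than one with batch size 64. Whether that noise helps or hurts on a real model is an empirical question that this sketch does not answer.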
Hey, I found your work very interesting. I have a question regarding the batch size. Why did you choose a batch size of 1? How did the results compare when the batch size was greater than 1?