Closed by Gumpest 5 months ago
I would also like to learn how to measure the CUDA time. Thanks a lot.
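For anyone reading along, here is a minimal sketch of one common way to time GPU inference. It is not necessarily how the authors measured it. CUDA kernels launch asynchronously, so the clock should only be read after the device has been synchronized. The helper name `time_inference` and its parameters are made up for illustration, and `sync` is whatever synchronization call your framework provides (in PyTorch, `torch.cuda.synchronize`).

```python
import time
from typing import Callable, Optional


def time_inference(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time per call of `fn`, in milliseconds.

    `fn` is a hypothetical zero-argument callable that runs one forward pass.
    `sync` blocks until all queued device work has finished; pass None on CPU.
    """
    def _sync() -> None:
        if sync is not None:
            sync()

    # Warm-up runs are excluded so one-time costs (allocation, kernel
    # selection, caching) do not skew the measurement.
    for _ in range(warmup):
        fn()
    _sync()  # make sure warm-up work is done before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    _sync()  # wait for the timed work to finish before stopping the clock

    return (time.perf_counter() - start) * 1000.0 / iters


# Hypothetical PyTorch usage (assumes `model` and `x` already live on the GPU):
#   import torch
#   with torch.no_grad():
#       ms = time_inference(lambda: model(x), sync=torch.cuda.synchronize)
```

If you want device-side time rather than wall-clock time, PyTorch also offers `torch.cuda.Event(enable_timing=True)`: call `record()` on a start and an end event, synchronize, and read `start.elapsed_time(end)`, which is in milliseconds.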
@Yxxxb What batch size do you use?
The overall batch size is 128, the same as in the LLaVA SFT stage. You can check the training hyperparameters in the "Additional Implement Details" section of our paper's appendix.
Hi, authors. I wonder how you measure inference time to demonstrate the efficiency.