gaoyixuan111 closed this issue 4 months ago
Hi @gaoyixuan111, the LLaVA model takes a long time to load. Once it is loaded, I can in principle caption a single image within 1.5 seconds on one 3090 GPU. Please check your code to make sure the model is not being reloaded on every iteration of a loop.
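To illustrate the load-once pattern, here is a minimal sketch. It is not the repository's actual captioning script: `load_caption_model` and `caption_images` are hypothetical names, and the loader is a cheap stand-in for the real LLaVA load step so that the example runs anywhere. The point is only that the expensive load happens once, before the loop over images.

```python
import time

LOAD_COUNT = 0  # counts how many times the stand-in model is loaded


def load_caption_model():
    """Hypothetical stand-in for the expensive LLaVA load step."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.01)  # placeholder for the real, much slower load
    # Return a callable mapping (image_path, prompt) to a caption string.
    return lambda image_path, prompt: f"caption for {image_path} ({prompt})"


def caption_images(image_paths, prompts):
    """Caption every image with every prompt, loading the model only once."""
    model = load_caption_model()  # load once, outside the loop
    results = {}
    for path in image_paths:
        # Reuse the same model object for every image and prompt.
        results[path] = [model(path, prompt) for prompt in prompts]
    return results


captions = caption_images(
    ["a.jpg", "b.jpg", "c.jpg"],
    ["describe the image", "describe the face in detail"],
)
```

If the call to `load_caption_model()` were moved inside the `for` loop, the load cost would be paid once per image, which may be what makes each image take about a minute.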
@JackAILab Could you share the training time for the model on 8 V100 GPUs and provide more training details?
"When I use LLaVA to generate the corresponding captions, it is very slow: it takes about one minute to produce the vqa_LLVA and vqa_LLVA_more_face_detail descriptions for a single image."