Question

james20141606 opened 2 months ago

Hi, when I look into the JSON file used for finetuning, I find entries like this, where a single conversation packs several bounding-box questions together:

I wonder whether it would be better to split such a question into multiple questions, each containing only one bounding box, so the model is less confused. It would increase the training load considerably, but could it somehow help the model understand the bounding boxes better?

Great question! This could be the reason! For REC tasks like this one, the data is borrowed directly from LLaVA-1.5. I think LLaVA does this concatenation to save training time, but breaking the conversations apart should definitely bring benefits for the REC task.

Thanks for your comment! I agree it might be beneficial, but it is indeed very time-consuming. By the way, I guess increasing epoch_num from 1 to a larger number might improve performance a bit? I am not sure whether it is worthwhile.

Yeah, you can increase the number of epochs, but there is a risk of overfitting the model; that is, generalization issues may arise.
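To make the splitting idea concrete, here is a minimal sketch of how one such multi-question sample could be broken into single-question samples. It assumes the LLaVA-1.5-style conversation format (a list of samples, each with an `id`, an `image`, and a `conversations` list of alternating human/gpt turns, where the first human turn carries the `<image>` token); the exact field names, the bounding-box answer format, and the file paths are assumptions for illustration, not taken from the actual finetuning file.

```python
import json

def split_multi_box_sample(sample):
    """Split one sample whose conversation packs several REC questions
    into separate samples, each with a single question/answer pair."""
    turns = sample["conversations"]
    new_samples = []
    # Turns are assumed to alternate human -> gpt; pair them up.
    for i in range(0, len(turns) - 1, 2):
        human, gpt = turns[i], turns[i + 1]
        if human["from"] != "human" or gpt["from"] != "gpt":
            continue  # skip malformed pairs
        # Each split-off sample must start with the <image> token again,
        # since only the first turn of the original conversation had it.
        question = human["value"]
        if "<image>" not in question:
            question = "<image>\n" + question
        new_samples.append({
            "id": f"{sample['id']}_{i // 2}",
            "image": sample["image"],
            "conversations": [
                {"from": "human", "value": question},
                {"from": "gpt", "value": gpt["value"]},
            ],
        })
    return new_samples

if __name__ == "__main__":
    # Hypothetical input/output paths.
    with open("finetune_data.json") as f:
        data = json.load(f)
    split_data = []
    for sample in data:
        split_data.extend(split_multi_box_sample(sample))
    with open("finetune_data_split.json", "w") as f:
        json.dump(split_data, f, indent=2)
```

Note that this multiplies the number of training samples (and the number of image loads per epoch), which is the training-load increase mentioned above.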