Open bexxnaz opened 1 month ago
Good question! Can you try a smaller batch size? In my experience, the results are similar.
As for gradient accumulation, I think accumulating over multiple modalities within one iterator
step is a good baseline, though some papers argue that a single modality per iterator step
works better. If you want to implement that, a simple strategy is to split the input data manually by modality.
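To make the suggestion above concrete, here is a minimal sketch (in PyTorch, with a hypothetical toy model and stand-in data, not the repo's actual training loop) of gradient accumulation where each optimizer step accumulates one micro-batch per modality:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for the real model; the actual architecture is irrelevant here.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 2  # one micro-batch per modality

def micro_batches():
    # Stand-ins for two modalities split manually from one iterator step,
    # e.g. a video micro-batch and an image micro-batch.
    yield torch.randn(8, 4), torch.randn(8, 2)  # "video" modality
    yield torch.randn(8, 4), torch.randn(8, 2)  # "image" modality

optimizer.zero_grad()
for x, y in micro_batches():
    loss = nn.functional.mse_loss(model(x), y)
    # Scale each loss so the summed gradients match the average over
    # both modalities, as a single combined batch would produce.
    (loss / accum_steps).backward()
optimizer.step()  # one update after both modalities have contributed
```

Because gradients simply add up between `backward()` calls, this is mathematically close to a single mixed-modality batch, which is why the results tend to be similar.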
Thank you for your response. I have another question about extra_num_query_tokens: have you tested setting this parameter to 0, and how does the resulting compression of visual tokens affect performance?
There is an ablation in our paper: setting the extra query tokens to 0 leads to poorer performance on MVBench.
Hello, thanks for your great work. I need to use gradient accumulation due to RAM constraints, and my training loop iterates over two modalities. I am concerned about the implications of gradient accumulation in this scenario: is it possible and recommended to use gradient accumulation with multiple modalities in an iterator?