CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Unable to reproduce benchmark results on LLaVA-v1.5-7b #11

Closed: ZQ1102118381 closed this issue 6 days ago

ZQ1102118381 commented 2 weeks ago

Hi,

Thank you for your nice work.

I am having trouble replicating the benchmark results, as shown here:

[image: benchmarks]

We trained with 8 A100 GPUs. However, the evaluation results on the other datasets seem reasonable.

I used the same settings as in the provided code.

typ1012 commented 2 weeks ago

We encountered the same problem.

LiWentomng commented 2 weeks ago

@ZQ1102118381 @typ1012 Thanks for pointing out this issue. The current code combines patch slicing with TokenPacker, which may introduce errors when only the original TokenPacker is used. We will carefully review the code.

LiWentomng commented 1 week ago

@ZQ1102118381 @typ1012 We carefully reviewed our code, and everything appears to be fine. We re-trained the models with the same code and noticed some fluctuations in performance. For reference, we have listed the performance of the re-trained models below.

[image: performance of the re-trained models]

Additionally, we have provided a branch that contains only TokenPacker as a projector based on LLaVA-1.5. This branch does not include the patch slicing scheme for TokenPacker-HD.