RifleZhang / LLaVA-Hound-DPO


WARNING: tokenization mismatch: 2 vs. 89. (ignored) #9

Closed Liuziyu77 closed 2 months ago

Liuziyu77 commented 3 months ago

When I try to fine-tune llava-v1.5-7B on multi-image data, it reports "WARNING: tokenization mismatch: 2 vs. 89. (ignored)". Why does this happen?

RifleZhang commented 3 months ago

I think the warning is caused by a tokenization problem. I would suggest checking:

  1. The version of transformers (we used transformers==4.36.2)
  2. The choice of conversation template
  3. In your training data, make sure to have special token like

You could start by debugging with our released data, and then move on to your custom data.

Let me know if you still encounter the same error.
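
For reference, two of the checks suggested above (the transformers version and the special tokens in the training data) can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not code from this repository: it assumes LLaVA-style JSON records with a `conversations` list of `{"from", "value"}` turns and an `image` field that is either a single path or a list of paths. The helper names are hypothetical, and the placeholder token is passed in as an argument because the thread does not show which one is expected.

```python
from importlib import metadata

# Version quoted in the reply above.
EXPECTED_TRANSFORMERS = "4.36.2"


def transformers_version_matches(expected=EXPECTED_TRANSFORMERS):
    """Return (matches, installed_version); installed_version is None if absent."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False, None
    return installed == expected, installed


def count_placeholders(sample, token):
    """Count occurrences of `token` across all turns of one sample."""
    turns = sample.get("conversations", [])
    return sum(turn.get("value", "").count(token) for turn in turns)


def find_placeholder_mismatches(samples, token):
    """Return (index, expected, found) for samples whose placeholder count
    differs from the number of media files listed in the sample.

    Assumption: the "image" field holds one path (str) or a list of paths.
    """
    mismatches = []
    for idx, sample in enumerate(samples):
        media = sample.get("image")
        if media is None:
            expected = 0
        elif isinstance(media, (list, tuple)):
            expected = len(media)
        else:
            expected = 1
        found = count_placeholders(sample, token)
        if found != expected:
            mismatches.append((idx, expected, found))
    return mismatches
```

To use it, load the training file with `json.load` and call `find_placeholder_mismatches(samples, token)` with whatever placeholder string your conversation template expects. Any sample it returns has more or fewer placeholders than media files, which is a likely candidate for the mismatch warning on multi-image data.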