epfl-dlab / transformers-CFG

🤗 A specialized library for integrating context-free grammars (CFG) in EBNF with the Hugging Face Transformers library
http://saibo-creator.xyz:7860/
MIT License

Support masking when embedding size is different from vocab size #83

Closed · x0wllaar closed 2 months ago

x0wllaar commented 2 months ago

In certain models, like Phi3 or LLaVA-NEXT, the model embedding size is larger than the tokenizer vocab size. This is likely done to pad the embedding matrix to a size that is efficient on certain GPUs; the mismatch can be seen directly from the checkpoint, as the sketch below illustrates.
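A minimal sketch of how to observe the mismatch (the checkpoint id and the exact sizes are illustrative, not taken from the PR):

```python
from transformers import AutoConfig, AutoTokenizer

# Phi-3 is one of the models mentioned above; any checkpoint with a padded
# embedding matrix would show the same mismatch.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

print(len(tokenizer))     # tokenizer vocab size
print(config.vocab_size)  # embedding rows; may be larger (padded for GPU efficiency)
```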

There's some discussion about this in #34, but the solution there is not automatic and requires resizing the model's embeddings; I'm not sure how compatible that approach is across models.

This patch detects the mismatch at inference time and fills the missing part of the mask with False, so the mask can be applied to the model logits without a shape error (see the sketch below). In my tests, it worked well with llava-hf/llava-v1.6-mistral-7b-hf.
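A minimal sketch of the masking idea, assuming the grammar mask is a 1-D boolean tensor over the tokenizer vocabulary; the helper name `pad_mask_to_logits` is hypothetical, not the library's API:

```python
import torch

def pad_mask_to_logits(mask: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Extend a boolean grammar mask to cover padded embedding positions."""
    missing = logits.shape[-1] - mask.shape[-1]
    if missing > 0:
        # The extra positions correspond to padding-only tokens the tokenizer
        # never produces; mark them False so they can never be sampled.
        filler = torch.zeros(missing, dtype=torch.bool, device=mask.device)
        mask = torch.cat([mask, filler])
    return mask

# Usage: disallow every token outside the grammar, including padded positions.
# logits: shape (embedding_size,), mask: shape (tokenizer_vocab_size,)
# masked = logits.masked_fill(~pad_mask_to_logits(mask, logits), float("-inf"))
```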

Saibo-creator commented 2 months ago

@x0wllaar Thank you for the PR!