In certain models, like Phi3 or LLaVA-NEXT, the model embedding size is larger than the tokenizer vocab size, likely because the embedding matrix is padded for better performance on certain GPUs.
There's some discussion about this in #34, but the solution there isn't automatic: it requires resizing the model embeddings, and I'm not sure how compatible that is across models.
This patch detects the mismatch at inference time and fills the missing part of the mask with False, so it can still be applied to the model logits. In my tests it worked well with llava-hf/llava-v1.6-mistral-7b-hf.
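
A minimal sketch of the idea (not the actual diff; `pad_mask_to_logits` is a hypothetical helper name): if the logits' last dimension is larger than the mask, extend the mask with False so the extra padding-token positions are always disallowed.

```python
import torch

def pad_mask_to_logits(mask: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Pad a boolean token mask to match the model's logits dimension.

    mask:   bool tensor whose last dim is the tokenizer vocab size
    logits: float tensor whose last dim is the model embedding size
    """
    vocab_size = mask.shape[-1]
    embed_size = logits.shape[-1]
    if embed_size > vocab_size:
        # The extra positions have no corresponding tokens, so they are
        # filled with False and can never be sampled.
        padding = torch.zeros(
            *mask.shape[:-1], embed_size - vocab_size,
            dtype=torch.bool, device=mask.device,
        )
        mask = torch.cat([mask, padding], dim=-1)
    return mask

# Example usage before sampling:
# mask = pad_mask_to_logits(mask, logits)
# logits = logits.masked_fill(~mask, float("-inf"))
```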