Currently the implementation computes the full next-allowed-token mask at each decoding step according to the grammar and the prefix.
However, in many cases the model's most likely token is already grammar-valid, so computing the entire mask is wasted work. It would probably be better to iteratively validate tokens in descending order of likelihood and stop at the first one the grammar accepts.
This would need some refactoring in the logit processor class.
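The lazy-validation idea could be sketched roughly as follows. This is a minimal illustration, not the actual logit processor: `grammar_accepts` is a hypothetical callback standing in for whatever incremental grammar check the real implementation exposes.

```python
import numpy as np

def select_token_lazy(logits, grammar_accepts, max_checks=None):
    """Try tokens in descending logit order and return the first
    grammar-valid one, instead of masking the full vocabulary.

    grammar_accepts: hypothetical predicate telling whether a token id
    is a valid continuation of the current prefix under the grammar.
    """
    order = np.argsort(logits)[::-1]  # most likely token first
    if max_checks is not None:
        order = order[:max_checks]
    for token_id in order:
        if grammar_accepts(int(token_id)):
            return int(token_id)
    return None  # give up and fall back to computing the full mask

# Toy example: pretend the grammar only accepts even token ids.
logits = np.array([0.1, 2.0, 0.5, 3.0])
print(select_token_lazy(logits, lambda t: t % 2 == 0))  # -> 2
```

Note that this shortcut fits greedy decoding; with sampling, the full mask (or at least a validated top-k) would still be needed to renormalize the distribution correctly.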