elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

WIP: Constrained sampling based on EBNF grammars #354

Open seanmor5 opened 4 months ago

seanmor5 commented 4 months ago

This matches the LlamaCPP behavior. I finished the EBNF parser, which encodes the grammar in the same way as the implementation in https://github.com/huggingface/transformers/pull/27557

Unfortunately, I think we may have to refactor or redesign much of the acceptance-processing logic so that it is compatible with XLA. I know we can mimic a stack-like data structure (which I started to do), and I believe we can mimic the trie and the containers as well. The issue I'm having is whether it is possible to implement something like https://github.com/huggingface/transformers/pull/27557/files#diff-b7135bf8eda80faf271e4c9588eae893ebad019d2508df2f0afbe5b7ad5bbf4eR389 in a non-recursive way. That is, unless my understanding is incorrect and the incremental grammar acceptance process is actually fixed for a given grammar, in which case we could "compile" the acceptance up front.
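To make the non-recursive question concrete, here is a minimal sketch in plain Python of stack advancement driven by an explicit worklist instead of recursion. It is not the code from the linked PR and not Nx code: the grammar encoding (`GRAMMAR`), the function names (`advance_stacks`, `accept_char`, `accepts`), and the `max_steps` bound are all hypothetical and chosen only for illustration. The assumption is that a loop with a fixed iteration bound is closer to what a compiled while loop could express, although a real XLA version would also need fixed-capacity stacks.

```python
from collections import deque

# Hypothetical toy grammar (NOT the encoding used in the linked PR):
# dict keys are nonterminals, every other symbol is a one-character terminal.
GRAMMAR = {
    "root": [["(", "root", ")"], ["x"]],
}


def advance_stacks(grammar, stacks, max_steps=1024):
    """Expand nonterminals until every stack is empty or has a terminal on top.

    Stacks are tuples with the top element last. An explicit worklist replaces
    recursion, and max_steps bounds the loop (e.g. for left-recursive rules).
    """
    pending = deque(stacks)
    done = []
    seen = set()
    steps = 0
    while pending:
        if steps >= max_steps:
            raise RuntimeError("step bound exceeded while expanding stacks")
        steps += 1
        stack = pending.popleft()
        if stack in seen:
            continue  # skip stacks that were already expanded
        seen.add(stack)
        if not stack or stack[-1] not in grammar:
            done.append(stack)  # nothing left to expand on this stack
            continue
        rest, top = stack[:-1], stack[-1]
        for alternative in grammar[top]:
            # Push the alternative reversed so its first symbol ends up on top.
            pending.append(rest + tuple(reversed(alternative)))
    return done


def accept_char(grammar, stacks, ch):
    """Consume one character: keep the stacks whose top terminal matches it."""
    matched = [s[:-1] for s in stacks if s and s[-1] == ch]
    return advance_stacks(grammar, matched)


def accepts(grammar, start, text):
    """Return True if text is a complete sentence of the grammar."""
    stacks = advance_stacks(grammar, [(start,)])
    for ch in text:
        stacks = accept_char(grammar, stacks, ch)
        if not stacks:
            return False
    return any(len(s) == 0 for s in stacks)
```

In this sketch the set of live stacks still depends on the input consumed so far, so it does not settle whether acceptance can be fully "compiled" up front. It only shows that the expansion step itself does not require recursion.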