Open Dan-wanna-M opened 4 months ago
Some users might want to mask an array of token ids (e.g. from `top_p`, `top_k`) rather than the whole logits. Considering how the FFI works, we probably need the caller to provide an output buffer.
This will essentially stop the cache from functioning, so it should probably be implemented after the eager regex cache.
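To make the proposed shape concrete, here is a minimal sketch of what a "mask a subset of token ids into a caller-provided buffer" call could look like. Everything in it is hypothetical: `mask_token_ids`, `MaskError`, and the `allowed` slice are names invented for illustration, not part of the existing API. It also assumes a full-vocabulary allow-list is already available; a real implementation might instead check each candidate against the grammar directly, which is where the cache concern comes from.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum MaskError {
    /// The caller's output buffer has fewer slots than there are candidates.
    BufferTooSmall { needed: usize, got: usize },
}

/// Hypothetical sketch, not the library's actual API.
///
/// * `allowed`: one entry per vocabulary token id, `true` if the grammar
///   currently permits that token (assumed to be computed elsewhere).
/// * `candidates`: the token ids the caller cares about, e.g. the survivors
///   of top_p / top_k sampling.
/// * `out`: caller-provided buffer; `out[i]` is set to whether
///   `candidates[i]` is permitted. Slots past `candidates.len()` are untouched.
pub fn mask_token_ids(
    allowed: &[bool],
    candidates: &[u32],
    out: &mut [bool],
) -> Result<(), MaskError> {
    if out.len() < candidates.len() {
        return Err(MaskError::BufferTooSmall {
            needed: candidates.len(),
            got: out.len(),
        });
    }
    for (slot, &id) in out.iter_mut().zip(candidates) {
        // Token ids outside the vocabulary are treated as disallowed.
        *slot = allowed.get(id as usize).copied().unwrap_or(false);
    }
    Ok(())
}
```

Writing into a borrowed `&mut [bool]` keeps allocation on the caller's side, which tends to map cleanly onto FFI bindings where the caller passes a pointer and a length.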