Open · iboB opened this issue 1 month ago
I guess the hardest part of this is dealing with the samplers (chain and grammar). They have their own state, which runs parallel to the KV cache state and is bound to it. Backtracking the KV cache is trivial: just move the position pointer back. Backtracking the samplers is not, because there is no equivalent pointer to move.
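The KV-vs-sampler asymmetry can be illustrated with a minimal sketch of one possible workaround. It assumes that a sampler's state is a deterministic function of the tokens it has accepted. Under that assumption, backtracking moves the KV position back, then resets the sampler and replays the surviving tokens. `Instance`, `Sampler`, `RecentTokens`, `push`, and `backtrack` are hypothetical names, not this project's API, and the KV cache is modeled as a plain position counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

using Token = int32_t;

// Hypothetical interface for a stateful sampler (e.g. repetition penalty or grammar).
struct Sampler {
    virtual ~Sampler() = default;
    virtual void accept(Token t) = 0; // update internal state with a chosen token
    virtual void reset() = 0;         // return to the initial state
};

// Toy stateful sampler: remembers every token it has accepted.
struct RecentTokens : Sampler {
    std::vector<Token> history;
    void accept(Token t) override { history.push_back(t); }
    void reset() override { history.clear(); }
};

// Hypothetical instance: the KV cache is modeled as a position counter only.
class Instance {
public:
    explicit Instance(std::unique_ptr<Sampler> s) : sampler_(std::move(s)) {}

    void push(Token t) {
        tokens_.push_back(t);
        kvPos_ = tokens_.size(); // a real engine would write a KV entry here
        sampler_->accept(t);
    }

    void backtrack(std::size_t n) {
        if (n > tokens_.size()) throw std::out_of_range("cannot backtrack past start");
        tokens_.resize(tokens_.size() - n);
        // KV cache: just move the pointer back.
        kvPos_ = tokens_.size();
        // Sampler: no pointer to move, so reset and replay the surviving tokens.
        sampler_->reset();
        for (Token t : tokens_) sampler_->accept(t);
    }

    std::size_t kvPos() const { return kvPos_; }
    const Sampler& sampler() const { return *sampler_; }

private:
    std::vector<Token> tokens_;
    std::size_t kvPos_ = 0;
    std::unique_ptr<Sampler> sampler_;
};
```

Replay costs time proportional to the retained sequence on every backtrack, and it must only replay tokens the sampler originally accepted. For example, prompt tokens may never have been fed to a grammar sampler. The alternative is to checkpoint sampler state after each token, which trades memory for constant-time restores.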
Allow backtracking in an instance by a given number of tokens