This PR introduces partial generation, which lets users continue generation from a partially written assistant response. This is mainly helpful for constrained generation. Currently, when prompting the model with
"Say Action Input: 'hi' Action Input: "
the output may not be 'hi', because each language model's prompt formatting turns the entire text above into a user prompt. To fix this, I added a new request body parameter, partial_generation, which places part of the prompt directly after the formatting.
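To illustrate the idea, here is a minimal sketch of how the partial text ends up after the chat formatting. The template tags, the `build_prompt` helper, and the `prompt` key are hypothetical and only for illustration; only `partial_generation` is the parameter name added by this PR, and the actual server code may differ.

```python
# Hypothetical chat template; each real model uses its own format.
USER_TAG = "### User:\n"
ASSISTANT_TAG = "### Assistant:\n"


def build_prompt(user_message: str, partial_generation: str = "") -> str:
    """Wrap the user message in the template, then append the partial output."""
    formatted = f"{USER_TAG}{user_message}\n{ASSISTANT_TAG}"
    # The partial text is placed after the formatting, so the model
    # continues the assistant turn instead of starting a fresh one.
    return formatted + partial_generation


# Without partial generation, "Action Input: " stays inside the user turn.
without_partial = build_prompt("Say Action Input: 'hi' Action Input: ")

# With partial generation, the assistant turn already starts with "Action Input: ".
# The "prompt" key is an assumed field name, not confirmed by this PR.
request_body = {
    "prompt": "Say Action Input: 'hi'",
    "partial_generation": "Action Input: ",
}
with_partial = build_prompt(
    request_body["prompt"], request_body["partial_generation"]
)
```

In this sketch, the second prompt ends with the assistant tag followed by "Action Input: ", so the model only has to complete the value.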
Additionally, I made some minor fixes so that the global variables are no longer ignored and the server can reset completely.
I tested this locally and it worked.