I'm trying to use cpp_generate instead of generate so I can run a callback when generation completes, but cpp_generate complains about the anti_prompt attribute. I can't seem to run generation at all with cpp_generate, can anyone show me a working use case?
Here's where I am with model.generate; replacing it with cpp_generate fails. I tried both antiprompt and anti_prompt, since the docs show both spellings.
@app.post("/chat")
async def chat(request: ChatRequest):
    prompt = request.prompt
    global conversation_history
    conversation_history += request.conversation_history
    # Pass the prompt plus accumulated conversation history to the model
    full_prompt = conversation_history + "\n" + prompt

    def iter_tokens():
        for token in model.generate(
            prompt=full_prompt,
            antiprompt="Human:",
            n_threads=6,
            n_batch=1024,
            n_predict=256,
            n_keep=48,
            repeat_penalty=1.0,
        ):
            yield token.encode()

    return StreamingResponse(iter_tokens(), media_type="text/plain")
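For context on what I'm after: the "callback when generation completes" behavior can be approximated in plain Python by wrapping the token generator, without touching cpp_generate at all. This is only a workaround sketch, not the cpp_generate API; with_completion_callback and on_complete are names I made up for illustration.

```python
def with_completion_callback(token_iter, on_complete):
    """Yield tokens from token_iter, then call on_complete(n) once exhausted."""
    count = 0
    for token in token_iter:
        count += 1
        yield token
    # This line runs only after the underlying generator is fully consumed,
    # i.e. when generation has completed (or hit the anti-prompt / n_predict cap).
    on_complete(count)


# Usage sketch: wrap any token generator, e.g. model.generate(...)
tokens = with_completion_callback(
    iter(["Hello", " world"]),
    lambda n: print(f"generation done, {n} tokens"),
)
for t in tokens:
    pass
```

In the endpoint above, iter_tokens() could be wrapped the same way, so the callback fires when the StreamingResponse finishes draining the generator. It doesn't explain the anti_prompt error, but it decouples the completion hook from whichever generate variant ends up working.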