abdeladim-s / pyllamacpp

Python bindings for llama.cpp
https://abdeladim-s.github.io/pyllamacpp/
MIT License

Using model.cpp_generate #26

Open monks1975 opened 1 year ago

monks1975 commented 1 year ago

I'm trying to use cpp_generate instead of generate so I can run a callback when generation completes, but cpp_generate complains about the anti_prompt attribute. I can't seem to run generation at all with cpp_generate. Can anyone show me a working use case?

Here's where I am with model.generate; replacing it with cpp_generate fails. I tried both antiprompt and anti_prompt, since the docs show a difference:

@app.post("/chat")
async def chat(request: ChatRequest):
    prompt = request.prompt

    global conversation_history
    conversation_history += request.conversation_history

    # Pass prompt and conversation_history to model
    full_prompt = conversation_history + "\n" + prompt

    def iter_tokens():
        for token in model.generate(
            prompt=full_prompt,
            antiprompt="Human:",
            n_threads=6,
            n_batch=1024,
            n_predict=256,
            n_keep=48,
            repeat_penalty=1.0,
        ):
            yield token.encode()

    return StreamingResponse(iter_tokens(), media_type="text/plain")
abdeladim-s commented 1 year ago

Hi @monks1975,

I have fixed the typo. Could you please give the latest commit a try and see if that fixes the issue?
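
If you install with pip, one way to test it is to pull the package straight from the repo (this is just the standard pip git install, nothing pyllamacpp-specific):

pip install --upgrade git+https://github.com/abdeladim-s/pyllamacpp.git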