NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

RunnableRails performance weirdness #494

Open pechaut78 opened 6 months ago

pechaut78 commented 6 months ago

```python
res = guardrails.invoke({"input": "How do I cook meat"})  # 0.5s
```

I'm defining a chain but not using it! The LLM in the chain is local, while the LLM configured in the YAML file is OpenAI.

```python
chain = print_func | (guardrails | llm) | print_func | extract_output
res = guardrails.invoke({"input": "How do I cook meat"})  # 35s, and the answer is incorrect
```

```python
chain.invoke(...)  # 35s, same answer
```

After a restart:

```python
res = guardrails.invoke({"input": "How do I cook meat"})  # 0.5s

chain = print_func | llm | print_func | extract_output
chain2 = guardrails | llm

res = guardrails.invoke({"input": "How do I cook meat"})  # 35s, incorrect result
```

Note: it looks like, as soon as I attach the runnable to a chain, invoking `guardrails` triggers the local LLM, even though the chain itself is never used.

pechaut78 commented 6 months ago

Note:

```python
guardrails = RunnableRails(config=config, passthrough=False)
```

pechaut78 commented 6 months ago

Note:

If I do:

```python
chain = print_func | llm | print_func | extract_output
chain2 = guardrails | chain
```

there is no slowdown.

```python
chain = print_func | llm | print_func | extract_output
chain2 = guardrails | llm
```

there is a slowdown.

drazvan commented 6 months ago

Hi @pechaut78! You are correct that the behavior is weird. This is because the `|` operator actually mutates the `RunnableRails` instance (https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/nemoguardrails/integrations/langchain/runnable_rails.py#L86). This is a bug; it should create a new instance instead. We'll fix this for the next release. Thanks for reporting!
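To make the mutation issue concrete, here is a minimal, self-contained sketch (the class names and the `steps` attribute are illustrative assumptions, not the actual `RunnableRails` internals). A mutating `__or__` changes the shared instance as a side effect of building the chain, so later calls to `guardrails.invoke()` also run the attached LLM; the fixed version returns a fresh object and leaves the original untouched:

```python
class MutatingRails:
    """Buggy pattern: '|' mutates self, so building a chain changes
    the original guardrails object as a side effect."""
    def __init__(self):
        self.steps = []

    def __or__(self, other):
        self.steps.append(other)  # side effect on the shared instance
        return self


class ImmutableRails:
    """Fixed pattern: '|' returns a new instance; the original object
    is unaffected by chain composition."""
    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def __or__(self, other):
        return ImmutableRails(self.steps + [other])


buggy = MutatingRails()
chain = buggy | "local_llm"           # mutates `buggy` while composing
assert buggy.steps == ["local_llm"]   # original now carries the llm step

fixed = ImmutableRails()
chain2 = fixed | "local_llm"          # composition leaves `fixed` alone
assert fixed.steps == []              # original unchanged
assert chain2.steps == ["local_llm"]  # new instance holds the chain
```

This matches the symptom reported above: with the mutating variant, invoking the original `guardrails` object after defining the chain behaves as if the local LLM had been attached to it.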