NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

bug: Rails taking more time to execute #831

Open Deepaksd29 opened 3 weeks ago

Deepaksd29 commented 3 weeks ago

Did you check docs and existing issues?

Python version (python --version)

Python 3.12.0

Operating system/version

Linux

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.9.1.1

Describe the bug

We've observed that our AI Health Chatbot currently takes around 15-16 seconds per response, which affects user experience and engagement. After integrating NeMo Guardrails into the flow (as shown in the code below), the rails add significant execution time on top of the main chain.

Problem statement:

- Current response time: 15-16 seconds per interaction, which significantly impacts the user experience.
- Objective: reduce the response time while maintaining the accuracy and quality of the bot's responses.
- Impact: slow responses may lead to user frustration, drop-offs, and lower engagement.

Code: how I implemented NeMo Guardrails

```python
import nest_asyncio
from langchain_aws import ChatBedrock
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from nemoguardrails import RailsConfig
from nemoguardrails.integrations.langchain.runnable_rails import RunnableRails

llm = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    streaming=False,
    region_name="us-east-1",
    model_kwargs={
        "max_tokens": 500,
        "temperature": 0.2,
        "top_k": 250,
        "top_p": 0.5,
        "stop_sequences": ["\n\nHuman"],
    },
)

nest_asyncio.apply()
config = RailsConfig.from_path("./config")
guardrails = RunnableRails(config=config, llm=llm, input_key="input", output_key="answer")

# The following runs inside a class method (hence the self.* references);
# changed_prompt, history_retriever_chain, llm_connection, ref_id, and
# input_txt are defined elsewhere in our code.
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", changed_prompt),
    self.few_shot_prompt,
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])

chat_history = self.LLMConnection.redis_api.get_from_redis(ref_id)
chat_history = [serialize_message(msg) for msg in chat_history]

document_chain = create_stuff_documents_chain(llm_connection.llm, answer_prompt)
conversational_retrieval_chain = create_retrieval_chain(history_retriever_chain, document_chain)
rag_chain_with_guardrails = guardrails | conversational_retrieval_chain

response = rag_chain_with_guardrails.invoke({"chat_history": chat_history, "input": input_txt})
```
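To see how much of the 15-16 seconds the rails themselves account for, one option is to time the guardrails step and the RAG chain separately before chaining them together. A minimal, self-contained sketch of that idea (the two `fake_*` functions are hypothetical stand-ins for `guardrails.invoke` and `conversational_retrieval_chain.invoke` above):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print its wall-clock duration, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result

# Hypothetical stand-ins for the real chain steps; in the actual app these
# would be guardrails.invoke(...) and conversational_retrieval_chain.invoke(...).
def fake_guardrails(payload):
    time.sleep(0.05)  # simulated rails latency
    return payload

def fake_rag_chain(payload):
    time.sleep(0.02)  # simulated retrieval + LLM latency
    return {"answer": "ok"}

payload = {"chat_history": [], "input": "hello"}
payload = timed("rails", fake_guardrails, payload)
response = timed("rag_chain", fake_rag_chain, payload)
```

Timing each stage separately shows whether the latency comes from the rails' own LLM calls or from retrieval/generation in the main chain.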

Steps To Reproduce

Run the code above; the guardrails step adds significant latency to every request.

Expected Behavior

Rails execution time should be low enough that the overall response time is not significantly affected.

Actual Behavior

Rails execution time was too high, pushing the total response time to around 15-16 seconds.
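One thing worth noting: dialog rails can issue additional sequential LLM calls per user turn on top of the main chain's call, which by itself can multiply latency. Newer NeMo Guardrails releases expose a `single_call` option that merges those dialog-rail calls into one. A sketch of the relevant `config/config.yml` fragment (option names taken from the configuration docs; worth verifying they apply to the version in use):

```yaml
# config/config.yml -- sketch only; check the NeMo Guardrails
# configuration guide for the exact options in your version.
rails:
  dialog:
    single_call:
      enabled: True
```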