NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

bug: Rails taking more time to execute #831

Open Deepaksd29 opened 3 weeks ago

Deepaksd29 commented 3 weeks ago

Did you check docs and existing issues?

Python version (python --version)

Python 3.12.0

Operating system/version

Linux

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.9.1.1

Describe the bug

We've observed that our AI Health Chatbot currently takes around 15-16 seconds per response, which affects user experience and engagement. After integrating NeMo Guardrails into the flow (as shown in the code below), the rails add significant execution time on top of the main chain.

Problem statement:

- Current response time: 15-16 seconds per interaction, which significantly impacts the user experience.
- Objective: reduce the response time while maintaining the accuracy and quality of the bot's responses.
- Impact: slow responses may lead to user frustration, drop-offs, and lower engagement.

Code: how I implemented NeMo Guardrails

```python
import nest_asyncio
from langchain_aws import ChatBedrock
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from nemoguardrails import RailsConfig
from nemoguardrails.integrations.langchain.runnable_rails import RunnableRails

llm = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    streaming=False,
    region_name="us-east-1",
    model_kwargs={
        "max_tokens": 500,
        "temperature": 0.2,
        "top_k": 250,
        "top_p": 0.5,
        "stop_sequences": ["\n\nHuman"],
    },
)

nest_asyncio.apply()
config = RailsConfig.from_path("./config")
guardrails = RunnableRails(config=config, llm=llm, input_key="input", output_key="answer")

# The following runs inside a class method (hence the self.* references);
# changed_prompt, history_retriever_chain, llm_connection, ref_id, and
# input_txt are defined elsewhere in our code.
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", changed_prompt),
    self.few_shot_prompt,
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])

chat_history = self.LLMConnection.redis_api.get_from_redis(ref_id)
chat_history = [serialize_message(msg) for msg in chat_history]

document_chain = create_stuff_documents_chain(llm_connection.llm, answer_prompt)
conversational_retrieval_chain = create_retrieval_chain(history_retriever_chain, document_chain)
rag_chain_with_guardrails = guardrails | conversational_retrieval_chain

response = rag_chain_with_guardrails.invoke({"chat_history": chat_history, "input": input_txt})
```
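To see how much of the 15-16 seconds the rails themselves account for, one option is to time the guardrails step and the RAG chain separately before chaining them together. A minimal, self-contained sketch of that idea (the two `fake_*` functions are hypothetical stand-ins for `guardrails.invoke` and `conversational_retrieval_chain.invoke` above):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print its wall-clock duration, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result

# Hypothetical stand-ins for the real chain steps; in the actual app these
# would be guardrails.invoke(...) and conversational_retrieval_chain.invoke(...).
def fake_guardrails(payload):
    time.sleep(0.05)  # simulated rails latency
    return payload

def fake_rag_chain(payload):
    time.sleep(0.02)  # simulated retrieval + LLM latency
    return {"answer": "ok"}

payload = {"chat_history": [], "input": "hello"}
payload = timed("rails", fake_guardrails, payload)
response = timed("rag_chain", fake_rag_chain, payload)
```

Timing each stage separately shows whether the latency comes from the rails' own LLM calls or from retrieval/generation in the main chain.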

Steps To Reproduce

Run the code above; the guardrails step adds significant latency to every request.

Expected Behavior

Rails execution time should be low enough that the overall response time is not significantly affected.

Actual Behavior

Rails execution time was too high, pushing the total response time to around 15-16 seconds.
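One thing worth noting: dialog rails can issue additional sequential LLM calls per user turn on top of the main chain's call, which by itself can multiply latency. Newer NeMo Guardrails releases expose a `single_call` option that merges those dialog-rail calls into one. A sketch of the relevant `config/config.yml` fragment (option names taken from the configuration docs; worth verifying they apply to the version in use):

```yaml
# config/config.yml -- sketch only; check the NeMo Guardrails
# configuration guide for the exact options in your version.
rails:
  dialog:
    single_call:
      enabled: True
```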