NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

Error when using Flask, Gemini models (gemini-1.5-pro-001), and NemoGuardRails together #573

Open Badaloza opened 1 week ago

Badaloza commented 1 week ago

Description: I'm encountering an error while running a Flask application that integrates Gemini models with NemoGuardRails. The error seems to be related to asynchronous task handling within LLMRails.

Run the code below:

colang_content = """
# define limits
define user ask politics
    "what are your political beliefs?"
    "thoughts on the president?"
    "left wing"
    "right wing"

define bot answer politics
    "I'm a shopping assistant, I don't like to talk of politics."

define flow politics
    user ask politics
    bot answer politics
    bot offer help

# here we use the chatbot for anything else
define flow
    user ...
    $answer = execute custom_agent(user_message=$last_user_message)

    bot $answer
"""

yaml_content = """
models:
    - type: main
      engine: vertexai
      model: gemini-1.5-pro-001
"""

from nemoguardrails import LLMRails, RailsConfig
from google.cloud import aiplatform
import os
from flask import Flask, request

app = Flask(__name__)

# Vertex AI authentication
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'GOOGLE_APPLICATION_CREDENTIALS.json'
PROJECT_ID = "PROJECT_ID"
REGION = "us-central1"
aiplatform.init(project=PROJECT_ID, location=REGION)

class ModerationRails:
    def __init__(self):
        rail_config = RailsConfig.from_content(
            colang_content=colang_content,
            yaml_content=yaml_content
            )
        self.app = LLMRails(rail_config, verbose=False)
        print(self.app.llm)

    def custom_agent(self, user_message):
        return "Code is working fine ---- Output is : " + user_message

    def run(self, user_message):
        # Register the custom action and its parameter before each generation.
        self.app.register_action(self.custom_agent, name="custom_agent")
        self.app.register_action_param("user_message", user_message)
        bot_message = self.app.generate(prompt=user_message)
        return bot_message

moderation_rails = ModerationRails()

@app.route('/qna', methods=['GET', 'POST'])
def home():
    if request.method == 'POST':
        input_string = request.form['question']
        bot_message = moderation_rails.run(input_string)
        return bot_message
    # Placeholder response for GET requests.
    return '''
        else loop
    '''

if __name__ == '__main__':
    app.run(debug=False)

Error Message:

mylap@mypc-MacBook-Pro accelate % /usr/bin/python3 /Users/mylap/Desktop/accelate/nemo_issue.py
Fetching 7 files: 100%|█████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 19227.33it/s]
VertexAI
Params: {'model_name': 'gemini-1.5-pro-001', 'candidate_count': 1}
 * Serving Flask app 'nemo_issue' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit

Synchronous action `custom_agent` has been called.
127.0.0.1 - - [22/Jun/2024 20:01:55] "POST /qna HTTP/1.1" 200 -

Error while execution generate_user_intent: Task <Task pending name='Task-16' coro=<LLMRails.generate_async() running at /Users/mylap/Library/Python/3.8/lib/python/site-packages/nemoguardrails/rails/llm/llmrails.py:639> cb=[_run_until_complete_cb() at /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py:184]> got Future <Task pending name='Task-20' coro=<UnaryUnaryCall._invoke() running at /Users/mylap/Library/Python/3.8/lib/python/site-packages/grpc/aio/_call.py:568>> attached to a different loop
127.0.0.1 - - [22/Jun/2024 20:02:14] "POST /qna HTTP/1.1" 200 -

Error while execution generate_user_intent: Task <Task pending name='Task-24' coro=<LLMRails.generate_async() running at /Users/mylap/Library/Python/3.8/lib/python/site-packages/nemoguardrails/rails/llm/llmrails.py:639> cb=[_run_until_complete_cb() at /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py:184]> got Future <Task pending name='Task-28' coro=<UnaryUnaryCall._invoke() running at /Users/mylap/Library/Python/3.8/lib/python/site-packages/grpc/aio/_call.py:568>> attached to a different loop
127.0.0.1 - - [22/Jun/2024 20:02:15] "POST /qna HTTP/1.1" 200 -
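
For context, the message matches a generic asyncio failure mode: a Future created on one event loop is awaited by a Task running on a different loop. A minimal standalone sketch (no Flask, NemoGuardRails, or Vertex AI involved; purely illustrative) that raises the same "attached to a different loop" RuntimeError:

import asyncio

async def make_future():
    # The Future is bound to whichever event loop is running right now.
    return asyncio.get_running_loop().create_future()

# Create a Future on loop A...
loop_a = asyncio.new_event_loop()
fut = loop_a.run_until_complete(make_future())

async def await_it():
    await fut  # ...then await it from a Task on loop B.

# Raises RuntimeError: Task <...> got Future <...> attached to a different loop
loop_b = asyncio.new_event_loop()
loop_b.run_until_complete(await_it())

This would be consistent with the first request succeeding and later ones failing: the gRPC aio channel used by the Vertex AI client appears to stay bound to the loop of the first call, while each subsequent request ends up running on a different loop.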

Steps to Reproduce:

  1. Set up a Flask application with the provided code.
  2. Ensure integration with Gemini models and NemoGuardRails.
  3. Run the application.
  4. Send a POST request to http://127.0.0.1:5000/qna with form data containing the key question and a sample question as its value.
  5. The first request completes without error and returns a response.
  6. The second request fails with the error shown above.
  7. Every subsequent request fails with the same error. Example request:
    curl --location 'http://127.0.0.1:5000/qna' \
    --form 'question="how are you"'

Python version: 3.8.9

Flask version: 2.1.3

nemoguardrails version: 0.9.0

Additional Context:

With OpenAI models, this error does not occur. To use an OpenAI model instead, replace yaml_content in the code above with the following:

yaml_content = """
models:
    - type: main
      engine: openai
      model: gpt-3.5-turbo
"""

Note: This error only occurs when Flask, NemoGuardRails, and Gemini models are used together. Any two of the three work without error.

Any guidance or solution to fix this issue would be greatly appreciated.
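
One possible workaround (a sketch, not an official fix): keep a single long-lived event loop on a background thread and submit all rails calls to it through generate_async, so the gRPC futures created by the Vertex AI client always belong to the same loop. generate_on_shared_loop is a hypothetical helper name; generate_async is the coroutine that the traceback shows generate delegating to.

import asyncio
import threading

from nemoguardrails import LLMRails

# One long-lived event loop on a daemon thread, shared by all Flask requests.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

def generate_on_shared_loop(rails: LLMRails, user_message: str) -> str:
    # Schedule the coroutine on the shared loop and block for its result.
    future = asyncio.run_coroutine_threadsafe(
        rails.generate_async(prompt=user_message), _loop
    )
    return future.result()

ModerationRails.run could then call generate_on_shared_loop(self.app, user_message) instead of self.app.generate(prompt=user_message). Whether this fully avoids the loop mismatch depends on the Vertex AI client internals, so treat it as a starting point rather than a confirmed fix.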