Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Agent not working as expected #1158

Open · fletchsims opened this issue 4 months ago

fletchsims commented 4 months ago

System Info

Python version: 3.11
PandasAI: 2.0.40
OS: macOS Ventura 13.6.4

🐛 Describe the bug

I'm using the Agent example from the official documentation. I've modified it slightly: I changed France's "deals_opened" value from 70 to 180 and reworded the queries:

Input:

import os

import pandas as pd
from dotenv import load_dotenv
from pandasai import Agent
from pandasai.llm import OpenAI

def get_env(key: str):
    load_dotenv()
    return os.getenv(key)

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia",
                "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    "deals_opened": [142, 80, 180, 90, 60, 50, 40, 30, 110, 120],
    "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110]
})

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
openai = OpenAI(api_token=get_env('OPENAI_API_KEY'))

agent = Agent(sales_by_country, config={"llm": openai})
print(agent.chat('Which are the top 5 countries by sales?'))
print(agent.chat('And which country has the most deals?'))

Expected Output:

The top 5 countries by sales are: China, United States, Japan, Germany, United Kingdom
The country with the most deals is United States.

Actual Output:

The top 5 countries by sales are: China, United States, Japan, Germany, United Kingdom
The country with the most deals is France.

My understanding of the Agent is that queries after the first one should be treated as "follow-up" queries when the LLM finds them relevant to the conversation. Instead, the second query falls back to the original DataFrame and runs against all ten countries, rather than against the top-5 subset returned by the first query.
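To make the discrepancy concrete, both answers can be reproduced with plain pandas. This is a sketch, assuming "most deals" refers to the `deals_opened` column; that is the only column in which France has the maximum, which is consistent with the actual output. `country_with_most_deals` is a hypothetical helper for illustration, not part of PandasAI. It also shows why the modified value matters: with the documentation's original value of 70, the full-table answer and the follow-up answer are both United States, so the problem would not be visible.

```python
import pandas as pd

# Same data as the reproduction script (France's deals_opened raised to 180).
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    "deals_opened": [142, 80, 180, 90, 60, 50, 40, 30, 110, 120],
    "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110],
})


def country_with_most_deals(df: pd.DataFrame, column: str = "deals_opened") -> str:
    """Hypothetical helper: return the country with the highest value in `column`."""
    return df.loc[df[column].idxmax(), "country"]


# First question: top 5 countries by sales (nlargest keeps the original index labels).
top5 = sales_by_country.nlargest(5, "sales")

# Follow-up scoped to the first answer (the expected behaviour).
follow_up_answer = country_with_most_deals(top5)

# Same question against the whole table (what the Agent appears to do).
full_table_answer = country_with_most_deals(sales_by_country)

# With the documentation's original value of 70 for France, both readings agree.
original = sales_by_country.copy()
original.loc[original["country"] == "France", "deals_opened"] = 70
original_full_table_answer = country_with_most_deals(original)

print(list(top5["country"]))
print(follow_up_answer, full_table_answer, original_full_table_answer)
```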

manojdighe commented 4 months ago

Same experience here. I'm not sure whether PandasAI creates a fresh thread every time a new chat request is made on the agent.

DrDavidL commented 1 month ago

Agreed. Simple follow-up questions that test the memory are not answered correctly. For example, asking "What is the mean BMI?" followed by "Return twice the prior value" produces an answer that doesn't make sense, and neither does the generated explanation.
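For reference, the answer a working follow-up should produce is just the first result doubled. A minimal sketch in plain pandas, using hypothetical BMI values since the comment does not share its data:

```python
import pandas as pd

# Hypothetical BMI values for illustration only; not taken from the thread.
patients = pd.DataFrame({"bmi": [22.0, 27.5, 31.0, 24.5]})

# First question: "What is the mean BMI?"
mean_bmi = patients["bmi"].mean()

# Follow-up: "Return twice the prior value." The expected answer reuses the
# previous result rather than answering an unrelated question.
twice_prior = 2 * mean_bmi

print(mean_bmi, twice_prior)
```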