Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Not getting the correct answer #1206

Closed vijayproxima closed 2 weeks ago

vijayproxima commented 3 months ago

System Info

I installed the latest version of pandasai last week. Machine: Ubuntu 22.04, 32 GB RAM, using OpenAI. I have a table listing the 2024 holidays for 4 states [counties] of India. The columns are 'Date', 'day', 'Festival', 'Region'. When I ask the question "When is the next holiday in Pune?", Pune is a city that falls under one of the states mentioned in the CSV file, so the LLM should infer that the holiday date has to be picked from the state to which Pune belongs. This works well with Assistants AI. However, with pandasai it does not work: because the generated code explicitly looks for 'Pune' in the list, it cannot return the answer.

🐛 Describe the bug

Here are the logs:

dfs[0]:39x5
No,Date,Day,Occasion/Festival,Location/State
17,01/05/24,Monday,Good Friday,Telangana
3,19/02/24,Wednesday,Diwali Amavasya (Laxmi Pujan),Rest of India
6,29/03/24,Thursday,Independence Day,Karnataka

The user asked the following question:

QUERY

when is the next holiday in Pune?

You generated this python code:

pune_holidays = dfs[0][dfs[0]['Location/State'] == 'Pune']
pune_holidays['Date'] = pd.to_datetime(pune_holidays['Date'], format='%d/%m/%y')
pune_holidays = pune_holidays.sort_values(by='Date')
next_holiday = pune_holidays[pune_holidays['Date'] > pd.Timestamp.now()].iloc[0]
next_holiday_date = next_holiday['Date'].strftime('%d/%m/%y')
result = {'type': 'string', 'value': f"The next holiday in Pune is on {next_holiday_date}, which is {next_holiday['Occasion/Festival']}."}

2024-06-05 16:24:30 [INFO] Code generated:

import pandas as pd

pune_holidays = dfs[0][dfs[0]['Location/State'] == 'Pune']
pune_holidays['Date'] = pd.to_datetime(pune_holidays['Date'], format='%d/%m/%y')
pune_holidays = pune_holidays.sort_values(by='Date')

next_holiday = pune_holidays[pune_holidays['Date'] > pd.Timestamp.now()]

if not next_holiday.empty:
    next_holiday = next_holiday.iloc[0]
    next_holiday_date = next_holiday['Date'].strftime('%d/%m/%y')
    result = {'type': 'string', 'value': f"The next holiday in Pune is on {next_holiday_date}, which is {next_holiday['Occasion/Festival']}."}
else:
    result = {'type': 'string', 'value': "There are no upcoming holidays in Pune."}

result

2024-06-05 16:24:30 [INFO] Executing Step 2: CodeCleaning
2024-06-05 16:24:30 [INFO] Code running:

pune_holidays = dfs[0][dfs[0]['Location/State'] == 'Pune']
pune_holidays['Date'] = pd.to_datetime(pune_holidays['Date'], format='%d/%m/%y')
pune_holidays = pune_holidays.sort_values(by='Date')
next_holiday = pune_holidays[pune_holidays['Date'] > pd.Timestamp.now()]
if not next_holiday.empty:
    next_holiday = next_holiday.iloc[0]
    next_holiday_date = next_holiday['Date'].strftime('%d/%m/%y')
    result = {'type': 'string', 'value': f"The next holiday in Pune is on {next_holiday_date}, which is {next_holiday['Occasion/Festival']}."}
else:
    result = {'type': 'string', 'value': 'There are no upcoming holidays in Pune.'}
result

2024-06-05 16:24:30 [INFO] Executing Step 7: ResultValidation
2024-06-05 16:24:30 [INFO] Answer: {'type': 'string', 'value': 'There are no upcoming holidays in Pune.'}
2024-06-05 16:24:30 [INFO] Executing Step 8: ResultParsing
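
For reference, the logs show why the answer is empty: the generated code compares 'Location/State' to the literal string 'Pune', which matches no row, so the `else` branch produces "There are no upcoming holidays in Pune." One possible workaround, sketched in plain pandas, is to resolve the city to its state before filtering. Everything named here is an assumption for illustration and not part of pandasai: the `CITY_TO_REGION` lookup, the `next_holiday_for_city` helper, and the choice to fall back to 'Rest of India' when the state is not listed. The three rows are the sample rows from the log above, not the real 39-row file, so whether Pune's state (Maharashtra) is actually one of the listed regions is unknown.

```python
import io

import pandas as pd

# Sample rows copied from the log above (not the full 39-row file).
CSV_TEXT = """No,Date,Day,Occasion/Festival,Location/State
17,01/05/24,Monday,Good Friday,Telangana
3,19/02/24,Wednesday,Diwali Amavasya (Laxmi Pujan),Rest of India
6,29/03/24,Thursday,Independence Day,Karnataka
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Hypothetical lookup (assumption): city -> value used in 'Location/State'.
CITY_TO_REGION = {
    "Pune": "Maharashtra",
    "Hyderabad": "Telangana",
    "Bengaluru": "Karnataka",
}


def next_holiday_for_city(frame, city, today, fallback="Rest of India"):
    """Return the first holiday row after `today` for the city's region, or None."""
    # Resolve the city to a region; unknown names are tried as-is.
    region = CITY_TO_REGION.get(city, city)
    # Assumption: regions not listed in the file use the fallback rows.
    if region not in set(frame["Location/State"]):
        region = fallback
    # Copy the slice so assigning the parsed dates does not touch `frame`.
    rows = frame[frame["Location/State"] == region].copy()
    rows["Date"] = pd.to_datetime(rows["Date"], format="%d/%m/%y")
    upcoming = rows[rows["Date"] > today].sort_values("Date")
    return None if upcoming.empty else upcoming.iloc[0]


# The filter from the logs matches nothing, which explains the reported answer.
naive_match = df[df["Location/State"] == "Pune"]
print("Rows matching 'Pune' directly:", len(naive_match))

# With the lookup, the question resolves to a row instead of an empty result.
hit = next_holiday_for_city(df, "Pune", pd.Timestamp("2024-02-01"))
print(hit)
```

The reference date is passed in as `today` rather than read from `pd.Timestamp.now()` so the result is reproducible. Passing the same city-to-state hint in the prompt itself (for example, "Pune is in Maharashtra") may be enough for the LLM to generate the right filter, but that is not verified here.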