Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Incorrect response when querying products used for entertainment purposes #668

Closed wubba8lubba closed 10 months ago

wubba8lubba commented 10 months ago

System Info

🐛 Describe the bug

Code:

import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm.openai import OpenAI

product_data = [
    {'Product': 'Headphones', 'Units_Sold': 30, 'Price_Per_Unit': 50, 'Total_Sales': 1500},
    {'Product': 'Tablet', 'Units_Sold': 25, 'Price_Per_Unit': 200, 'Total_Sales': 5000},
    {'Product': 'Digital Camera', 'Units_Sold': 20, 'Price_Per_Unit': 250, 'Total_Sales': 5000},
    {'Product': 'Microwave Oven', 'Units_Sold': 35, 'Price_Per_Unit': 120, 'Total_Sales': 4200},
    {'Product': 'Blender', 'Units_Sold': 50, 'Price_Per_Unit': 60, 'Total_Sales': 3000},
    {'Product': 'Smartwatch', 'Units_Sold': 40, 'Price_Per_Unit': 100, 'Total_Sales': 4000},
    {'Product': 'Gaming Console', 'Units_Sold': 15, 'Price_Per_Unit': 300, 'Total_Sales': 4500},
    {'Product': 'Air Purifier', 'Units_Sold': 25, 'Price_Per_Unit': 150, 'Total_Sales': 3750},
    {'Product': 'Toaster', 'Units_Sold': 30, 'Price_Per_Unit': 40, 'Total_Sales': 1200},
    {'Product': 'Vacuum Cleaner', 'Units_Sold': 20, 'Price_Per_Unit': 180, 'Total_Sales': 3600},
    {'Product': 'Coffee Maker', 'Units_Sold': 30, 'Price_Per_Unit': 70, 'Total_Sales': 2100},
    {'Product': 'Dishwasher', 'Units_Sold': 18, 'Price_Per_Unit': 200, 'Total_Sales': 3600},
    {'Product': 'Bluetooth Speaker', 'Units_Sold': 40, 'Price_Per_Unit': 50, 'Total_Sales': 2000},
    {'Product': 'Hair Dryer', 'Units_Sold': 25, 'Price_Per_Unit': 30, 'Total_Sales': 750},
    {'Product': 'Rice Cooker', 'Units_Sold': 35, 'Price_Per_Unit': 80, 'Total_Sales': 2800},
    {'Product': 'Iron', 'Units_Sold': 30, 'Price_Per_Unit': 40, 'Total_Sales': 1200},
    {'Product': 'Blu-ray Player', 'Units_Sold': 15, 'Price_Per_Unit': 100, 'Total_Sales': 1500},
    {'Product': 'Portable Speaker', 'Units_Sold': 40, 'Price_Per_Unit': 70, 'Total_Sales': 2800},
    {'Product': 'Kettle', 'Units_Sold': 25, 'Price_Per_Unit': 30, 'Total_Sales': 750},
    {'Product': 'Handheld Vacuum', 'Units_Sold': 20, 'Price_Per_Unit': 60, 'Total_Sales': 1200},
    {'Product': 'Fitness Tracker', 'Units_Sold': 50, 'Price_Per_Unit': 80, 'Total_Sales': 4000},
    {'Product': 'Coffee Machine', 'Units_Sold': 25, 'Price_Per_Unit': 150, 'Total_Sales': 3750},
    {'Product': 'Desk Chair', 'Units_Sold': 30, 'Price_Per_Unit': 100, 'Total_Sales': 3000},
    {'Product': 'Smart Speaker', 'Units_Sold': 40, 'Price_Per_Unit': 70, 'Total_Sales': 2800},
    {'Product': 'Electric Toothbrush', 'Units_Sold': 35, 'Price_Per_Unit': 50, 'Total_Sales': 1750},
    {'Product': 'Car Vacuum Cleaner', 'Units_Sold': 20, 'Price_Per_Unit': 60, 'Total_Sales': 1200},
    {'Product': 'Wireless Earbuds', 'Units_Sold': 45, 'Price_Per_Unit': 90, 'Total_Sales': 4050},
    {'Product': 'External Hard Drive', 'Units_Sold': 15, 'Price_Per_Unit': 120, 'Total_Sales': 1800},
    {'Product': 'Air Fryer', 'Units_Sold': 25, 'Price_Per_Unit': 80, 'Total_Sales': 2000},
    {'Product': 'Robot Vacuum', 'Units_Sold': 30, 'Price_Per_Unit': 200, 'Total_Sales': 6000},
    {'Product': 'Electric Scooter', 'Units_Sold': 10, 'Price_Per_Unit': 300, 'Total_Sales': 3000},
    {'Product': 'Digital Thermometer', 'Units_Sold': 40, 'Price_Per_Unit': 20, 'Total_Sales': 800},
    {'Product': 'Hair Straightener', 'Units_Sold': 20, 'Price_Per_Unit': 50, 'Total_Sales': 1000},
    {'Product': 'Electric Kettle', 'Units_Sold': 25, 'Price_Per_Unit': 40, 'Total_Sales': 1000},
    {'Product': 'Smart Thermostat', 'Units_Sold': 15, 'Price_Per_Unit': 120, 'Total_Sales': 1800},
    {'Product': 'Wireless Mouse', 'Units_Sold': 35, 'Price_Per_Unit': 30, 'Total_Sales': 1050},
    {'Product': 'Portable Charger', 'Units_Sold': 30, 'Price_Per_Unit': 25, 'Total_Sales': 750},
    {'Product': 'Yoga Mat', 'Units_Sold': 20, 'Price_Per_Unit': 20, 'Total_Sales': 400},
    {'Product': 'Smart Lock', 'Units_Sold': 10, 'Price_Per_Unit': 150, 'Total_Sales': 1500},
    {'Product': 'Handheld Massager', 'Units_Sold': 25, 'Price_Per_Unit': 40, 'Total_Sales': 1000},
    {'Product': 'USB-C Hub', 'Units_Sold': 15, 'Price_Per_Unit': 30, 'Total_Sales': 450},
    {'Product': 'Playstation', 'Units_Sold': 20, 'Price_Per_Unit': 350, 'Total_Sales': 1000},
]

llm = OpenAI()
df = SmartDataframe(df=pd.DataFrame(product_data), config={"llm": llm})
response = df.clean_data().chat("Which of the products listed are primarily used for entertainment purposes?")
print(response)

Issue Summary:

When the chat method is asked which products are primarily used for entertainment purposes, it does not return the expected response: several products that should be included are missing from the actual response.

Current Behavior: The current behavior returns the following products:

Expected Behavior: The expected behavior is to return the following products:

gventuri commented 10 months ago

Hi @ghostwhistles, PandasAI only analyzes data based on information that can be deduced from the table. What you are asking for could in principle be deduced, but no field in the table actually supports it. You would get a proper response if you had, for example, a boolean field like "used_for_gaming" or something similar. Note that PandasAI doesn't send the whole df to the LLM, so the LLM cannot determine whether each element is used primarily for gaming.
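
One way to follow this suggestion is to label the rows yourself before building the dataframe, so the category is an explicit column rather than something the LLM has to guess. The sketch below is only an illustration: the field name Used_For_Entertainment, the helper add_entertainment_flag, and the set of products marked as entertainment are all assumptions made for this example and are not part of PandasAI or of the original report.

```python
# Hypothetical manual labels: deciding which products count as
# "entertainment" is a judgment call by the data owner, not something
# PandasAI infers from the table.
ENTERTAINMENT_PRODUCTS = {"Gaming Console", "Blu-ray Player", "Playstation"}


def add_entertainment_flag(rows, entertainment=ENTERTAINMENT_PRODUCTS):
    """Return copies of the rows with a boolean Used_For_Entertainment field.

    The input rows are not modified.
    """
    return [
        {**row, "Used_For_Entertainment": row["Product"] in entertainment}
        for row in rows
    ]


# A small subset of the product_data from the report, copied as-is.
sample_rows = [
    {'Product': 'Headphones', 'Units_Sold': 30, 'Price_Per_Unit': 50, 'Total_Sales': 1500},
    {'Product': 'Gaming Console', 'Units_Sold': 15, 'Price_Per_Unit': 300, 'Total_Sales': 4500},
    {'Product': 'Toaster', 'Units_Sold': 30, 'Price_Per_Unit': 40, 'Total_Sales': 1200},
    {'Product': 'Playstation', 'Units_Sold': 20, 'Price_Per_Unit': 350, 'Total_Sales': 1000},
]

labeled_rows = add_entertainment_flag(sample_rows)

# With an explicit column, the question becomes a plain filter.
entertainment_names = [
    row["Product"] for row in labeled_rows if row["Used_For_Entertainment"]
]
print(entertainment_names)
```

The labeled rows can then be passed to pd.DataFrame and SmartDataframe exactly as in the original snippet, at which point a question about entertainment products maps onto a filter over a real column.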