Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Unexpected results when generating graphs #1402

Open Freddeb opened 1 day ago

Freddeb commented 1 day ago

System Info

System info: pandasai version 2.2.15, Windows 10, Python 3.12.2, Azure OpenAI model gpt-4o-mini

Hi, I'm testing the library (by the way, great work!). I would appreciate your feedback on some tests I've done. Each test reads an Excel file of sales data (a free sample file I found on the internet).

Here are the questions I asked the library, followed by the output results:

Test 1: in_user_prompt="For the country Belgium, plot a graph with the total revenue on Y axis and with the ship date on X axis. Draw a line with the red color."

Test 2: in_user_prompt="For the country Albania, plot a graph with the total revenue on Y axis and with the ship date on X axis. Draw a line with the blue color."

Test 3: in_user_prompt="For the country Belgium and Albania, plot a graph with the total revenue on Y axis with the ship date on X axis in the ship date period between 2011 and 2016. For Albania draw the line with the color blue, for Belgium use the color red."
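For comparison, Tests 1 and 2 can be reproduced with plain pandas and matplotlib, which gives a deterministic baseline to hold the generated charts against. This is a minimal sketch, not the code PandasAI generates: the column names `Country`, `Ship Date` and `Total Revenue` are assumed from the prompts and should be checked against the actual sheet, `plot_country_revenue` is a hypothetical helper, and a few made-up rows stand in for `sample_sales_1000.xlsx` (in practice the frame would come from `pd.read_excel("sample_sales_1000.xlsx")`).

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_country_revenue(df: pd.DataFrame, country: str, color: str):
    """Plot total revenue per ship date for one country as a single line.

    Hypothetical helper; assumes columns "Country", "Ship Date", "Total Revenue".
    """
    subset = df[df["Country"] == country].copy()
    subset["Ship Date"] = pd.to_datetime(subset["Ship Date"])
    # Sum revenue per ship date and sort chronologically, so each date is
    # one point and the line runs left to right without zigzagging.
    series = subset.groupby("Ship Date")["Total Revenue"].sum().sort_index()
    fig, ax = plt.subplots()
    # marker=None falls back to matplotlib's default: a plain line, no dots.
    ax.plot(series.index, series.values, color=color, marker=None, label=country)
    ax.set_xlabel("Ship Date")
    ax.set_ylabel("Total Revenue")
    ax.legend()
    return fig, ax, series


# Made-up rows standing in for the Excel sheet (illustration only).
sales = pd.DataFrame(
    {
        "Country": ["Belgium", "Belgium", "Albania", "Belgium"],
        "Ship Date": ["2014-03-01", "2012-05-10", "2013-01-15", "2014-03-01"],
        "Total Revenue": [100.0, 250.0, 80.0, 50.0],
    }
)

fig, ax, series = plot_country_revenue(sales, "Belgium", "red")
fig.savefig("belgium_revenue.png")
```

Under matplotlib's default settings this draws a red line with no dots, so dots in a chart would have to come from an explicit marker setting in the plotting code.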

I've attached my script and the Excel file to this thread: sample_sales_1000.xlsx, excel_test.txt

Thank you for your feedback. Fred

🐛 Describe the bug

Test 1 results: [chart image]

Test 2 results: [chart image]
Remark: The graph shows a blue dot at each data point (no dots appear in the Test 1 graph).

Test 3 results: [chart image]
Remark: Albania's blue line is broken (discontinuous) and has no dot at each data point. Belgium appears only as a short segment (probably because of the 2011 to 2016 period restriction).
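One possible cause of the broken Albania line (unconfirmed; the generated code would need to be checked) is plotting both countries from one shared date index: on each date where only Belgium shipped, Albania's value is NaN, and matplotlib leaves a gap at every NaN. The sketch below avoids that by filtering the 2011 to 2016 window first and drawing each country from its own rows. It assumes columns named `Country`, `Ship Date` and `Total Revenue`; `plot_countries` is a hypothetical helper and the rows are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_countries(df, colors, start="2011-01-01", end="2016-12-31"):
    """Draw one revenue-vs-ship-date line per country within [start, end].

    `colors` maps country name to a matplotlib color, e.g. {"Albania": "blue"}.
    Hypothetical helper; assumes columns "Country", "Ship Date", "Total Revenue".
    """
    data = df.copy()
    data["Ship Date"] = pd.to_datetime(data["Ship Date"])
    # Keep only ship dates inside the window (both ends inclusive).
    data = data[data["Ship Date"].between(pd.Timestamp(start), pd.Timestamp(end))]
    fig, ax = plt.subplots()
    points = {}
    for country, color in colors.items():
        # Aggregate each country on its own dates only; a shared pivoted
        # index would insert NaNs that break the line.
        series = (
            data[data["Country"] == country]
            .groupby("Ship Date")["Total Revenue"]
            .sum()
            .sort_index()
        )
        points[country] = len(series)
        ax.plot(series.index, series.values, color=color, label=country)
    ax.set_xlabel("Ship Date")
    ax.set_ylabel("Total Revenue")
    ax.legend()
    return fig, ax, points


# Made-up rows for illustration; the 2018 Belgium row falls outside the window.
sales = pd.DataFrame(
    {
        "Country": ["Albania", "Belgium", "Albania", "Belgium", "Albania"],
        "Ship Date": ["2011-02-01", "2012-06-01", "2013-09-01", "2018-01-01", "2015-04-01"],
        "Total Revenue": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

fig, ax, points = plot_countries(sales, {"Albania": "blue", "Belgium": "red"})
print(points)
```

The returned `points` dict shows how many distinct ship dates each country has in the window. A country with a single date gives a line with nothing to connect, which is invisible without a marker, and a country with only a few dates gives a short segment, similar to the Belgium line in Test 3.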

dosubot[bot] commented 1 day ago

Hey @Freddeb! 👋 I'm here to help you out with any bugs or questions you have. Let's squash those bugs together!

It seems like you're experiencing some inconsistencies with the graph outputs when using pandas-ai to plot data from an Excel file. Here are a few points to consider that might help address these issues:

  1. Color Assignment: Check that the colors in your prompts are actually carried through to the plotting call. If a color is not applied as expected, the cause may lie in how the library translates that instruction into plotting code.

  2. Data Point Markers: Whether dots appear on the line may depend on default plotting settings or on how the library interprets each prompt. Unfortunately, I couldn't find specific information on how pandas-ai handles data point markers or what its defaults are.

  3. Date Range and Data Availability: In Test 3, Belgium's short segment may simply mean that few Belgium records fall within the requested range (2011 to 2016). Check that your Excel file covers this period adequately for both countries.

  4. Graph Discontinuity: The breaks in the Albania line in Test 3 may be caused by missing data points or by how the library handles gaps in the data. Check the data for missing values or gaps that could explain them.
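Points 3 and 4 can be checked directly on the data before involving the LLM. The sketch below, assuming columns named `Country`, `Ship Date` and `Total Revenue`, reports per country how many rows and distinct ship dates fall in the window and how many revenue values are missing there. It also counts the NaN cells that appear when both countries are pivoted onto one shared date index, a common shape for multi-line charts and one way a plotted line can end up broken. `window_report` and `pivot_gaps` are hypothetical helper names, and the rows are illustrative only.

```python
import pandas as pd


def window_report(df, countries, start="2011-01-01", end="2016-12-31"):
    """Per country: rows and distinct ship dates inside [start, end], plus
    missing "Total Revenue" values there. Assumes columns "Country",
    "Ship Date" and "Total Revenue"."""
    data = df[df["Country"].isin(countries)].copy()
    data["Ship Date"] = pd.to_datetime(data["Ship Date"])
    data = data[data["Ship Date"].between(pd.Timestamp(start), pd.Timestamp(end))]
    by_country = data.groupby("Country")
    report = pd.DataFrame(
        {
            "rows": by_country.size(),
            "distinct_dates": by_country["Ship Date"].nunique(),
            "missing_revenue": data["Total Revenue"].isna().groupby(data["Country"]).sum(),
        }
    )
    # Countries with no rows in the window would vanish from the groupby;
    # reindex so they show up with zeros instead.
    return report.reindex(countries, fill_value=0)


def pivot_gaps(df, countries):
    """Count NaN cells per country after putting all countries on one shared
    ship-date index, as a combined multi-line chart might do."""
    data = df[df["Country"].isin(countries)].copy()
    data["Ship Date"] = pd.to_datetime(data["Ship Date"])
    # sum() treats a missing revenue value as 0 here, so the NaNs counted
    # below come only from dates where a country has no row at all.
    wide = data.groupby(["Ship Date", "Country"])["Total Revenue"].sum().unstack("Country")
    # Each NaN would show up as a gap if `wide` were plotted as lines.
    return wide.isna().sum()


# Illustrative rows only; one Albania revenue value is deliberately missing.
sales = pd.DataFrame(
    {
        "Country": ["Albania", "Belgium", "Albania", "Belgium", "Albania"],
        "Ship Date": ["2011-02-01", "2012-06-01", "2013-09-01", "2018-01-01", "2015-04-01"],
        "Total Revenue": [10.0, 20.0, None, 40.0, 50.0],
    }
)

report = window_report(sales, ["Albania", "Belgium"])
gaps = pivot_gaps(sales, ["Albania", "Belgium"])
print(report)
print(gaps)
```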

If these suggestions don't resolve the issues, you may need to dig into the library's documentation or source code to see how it processes these specific plotting instructions.
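Since PandasAI answers a prompt by generating Python code and then running it, a practical way to dig in is to read that code and check which `color` and `marker` arguments it actually passes. The helper below is a minimal sketch using Python's standard `ast` module; `plotting_kwargs` is a hypothetical name, and how you obtain the code string is an assumption to confirm for your version (pandasai 2.x appears to expose the last generated or executed code through an attribute such as `last_code_executed`, but verify the name in the docs).

```python
import ast


def plotting_kwargs(code, names=("color", "c", "marker", "linestyle", "ls")):
    """Return (callee, keyword, value) for every call in `code` that passes one
    of the given keyword arguments. Literal values are evaluated; anything
    else (e.g. a variable) is returned as its source text."""
    found = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        callee = ast.unparse(node.func)
        for kw in node.keywords:
            # kw.arg is None for **kwargs, which never matches `names`.
            if kw.arg in names:
                try:
                    value = ast.literal_eval(kw.value)
                except (ValueError, TypeError):
                    value = ast.unparse(kw.value)
                found.append((callee, kw.arg, value))
    return found


# Hypothetical generated code; in practice pass the code string PandasAI ran
# (e.g. an attribute like `last_code_executed`; name assumed, check your version).
generated = (
    'belgium = df[df["Country"] == "Belgium"]\n'
    'belgium.plot(x="Ship Date", y="Total Revenue", color="red")\n'
    'plt.plot(dates, revenue, c=line_color, marker="o")\n'
)

for callee, keyword, value in plotting_kwargs(generated):
    print(f"{callee}: {keyword}={value!r}")
```

The code is only parsed, never executed, so undefined names such as `df` or `line_color` are harmless. If no `marker` argument shows up, matplotlib's defaults draw a plain line without dots, so dots like those in Test 2 would suggest an explicit marker in the generated code.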
