Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

plt.savefig() not getting called? #1112

Closed seanshanker closed 2 months ago

seanshanker commented 5 months ago

System Info

pandasai 2.0.32
python3 --version: Python 3.9.16

Static hostname: ip-172-31-12-35.us-east-2.compute.internal
Icon name: computer-vm
Chassis: vm 🖴
Machine ID: 3ba8b1dfc7c743b3944799f4ebe159ec
Boot ID: 854d6241cd2f4635ba5b7fec9a5025e5
Virtualization: xen
Operating System: Amazon Linux 2023.4.20240401
CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2023
Kernel: Linux 6.1.82-99.168.amzn2023.x86_64
Architecture: x86-64
Hardware Vendor: Xen
Hardware Model: HVM domU
Firmware Version: 4.11.amazon

🐛 Describe the bug

The image output of the plot is not being saved, so the pipeline fails when it looks for the chart file.

2024-04-15 13:31:42 [INFO] Executing Step 4: CachePopulation
2024-04-15 13:31:42 [INFO] Executing Step 5: CodeCleaning
2024-04-15 13:31:42 [INFO] Saving charts to /usr/local/first/exports/charts/1f1d1578-c952-4719-a403-982e6342c21c.png
2024-04-15 13:31:42 [INFO] Code running:

data = {'Name': ['Alice', 'Charlie', 'David'], 'Age': [25, 30, 40], 'Salary': [50000, 70000, 80000]}
df = dfs[0]
plt.figure(figsize=(8, 6))
plt.bar(df['Name'], df['Salary'], color='skyblue')
plt.xlabel('Name')
plt.ylabel('Salary')
plt.title('Salaries of Employees')
plt.show()
result = {'type': 'plot', 'value': '/usr/local/first/exports/charts/1f1d1578-c952-4719-a403-982e6342c21c.png'}

2024-04-15 13:31:42 [INFO] Executing Step 6: CodeExecution
2024-04-15 13:31:42 [ERROR] Pipeline failed on step 6: [Errno 2] No such file or directory: '/usr/local/first/exports/charts/1f1d1578-c952-4719-a403-982e6342c21c.png'
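
The log above shows the cleaned code ending in `plt.show()` with no `plt.savefig()` call, while `result` points at a chart path that is never written. As a stopgap, generated code could be patched before it runs, roughly as sketched below. This is only an illustration: `ensure_savefig` is a hypothetical helper, not part of PandasAI, and it assumes the generated code uses the `plt` alias seen in the log.

```python
import re


def ensure_savefig(code: str, chart_path: str) -> str:
    """Hypothetical helper (not a PandasAI API): make generated plotting
    code write the figure to chart_path instead of only showing it."""
    if "savefig(" in code:
        # The code already saves the figure; leave it untouched.
        return code

    save_call = f"plt.savefig({chart_path!r})"

    # Replace every plt.show() call with a save to the expected path.
    # A callable replacement avoids backslash-escape handling in re.subn.
    patched, count = re.subn(r"plt\.show\(\s*\)", lambda _m: save_call, code)

    if count == 0:
        # Nothing to replace, so append the save call at the end instead.
        patched = code.rstrip("\n") + "\n" + save_call + "\n"
    return patched


generated = (
    "plt.bar(df['Name'], df['Salary'], color='skyblue')\n"
    "plt.show()\n"
    "result = {'type': 'plot', 'value': '/tmp/chart.png'}\n"
)
print(ensure_savefig(generated, "/tmp/chart.png"))
```

The patched string would then be executed in place of the original, so the file named in `result` exists by the time it is read.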

seanshanker commented 5 months ago

Problem 1: it reads the data as if it contained a comma that is not in my file.

Problem 2: not sure if this is a hallucination issue. It thinks it saved a file, but it didn't, and then it has a hard time finding the file to render.

Problem 3: at times it does a bizarre unstack, which DOES NOT work, and then a df pivot, which WORKS.

**Problem 1**

data = dfs[0]
data['totalWeeklyShareQuantity'] = data['totalWeeklyShareQuantity'].str.replace(',', '').astype(int)
grouped_data = data.groupby(['MPID', 'securitytype'])['totalWeeklyShareQuantity'].sum().unstack()
plt.figure(figsize=(12, 6))
grouped_data.plot(kind='bar')
plt.ylabel('totalWeeklyShareQuantity')
plt.title('Total Weekly Share Quantity by MPID and Security Type')
plt.show()

2024-04-17 12:51:42 [ERROR] Failed with error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
    result = self.execute_code(input, code_context)
  File "/usr/local/lib/python3.9/site-packages/pandasai/pipelines/chat/code_execution.py", line 170, in execute_code
    exec(code, environment)
  File "<string>", line 2, in <module>
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: invalid literal for int() with base 10: '38,000'

In my input file I don't have commas, so I'm not sure what it is trying to do here.

totalWeeklyShareQuantity,totalWeeklyTradeCount,issueSymbolIdentifier,issueName,MPID,tierIdentifier,Ticker,Industry,Sector,securitytype,50-day Moving Average,bin_50-day Moving Average
38000,169,SPTI,SPDR Portfolio Intermediate Term Treasury ETF,INCR,T1,SPTI,,,ETF,28.190799980163575,10-50
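
To check the claim that the file itself has no commas in that field, the sample row above can be parsed with the standard `csv` module. The sketch below assumes the wrapped header and row join back into `Ticker` and `10-50`. `parse_quantity` is a hypothetical helper that accepts either form of the value, so the conversion would not depend on how the column was loaded.

```python
import csv
import io

# Header and first row copied from the sample above, with the wrapped
# lines joined back together (an assumption about the original file).
SAMPLE = (
    "totalWeeklyShareQuantity,totalWeeklyTradeCount,issueSymbolIdentifier,"
    "issueName,MPID,tierIdentifier,Ticker,Industry,Sector,securitytype,"
    "50-day Moving Average,bin_50-day Moving Average\n"
    "38000,169,SPTI,SPDR Portfolio Intermediate Term Treasury ETF,INCR,T1,"
    "SPTI,,,ETF,28.190799980163575,10-50\n"
)


def parse_quantity(value) -> int:
    """Hypothetical helper: accept ints or strings such as '38,000'."""
    if isinstance(value, int):
        return value
    return int(str(value).replace(",", "").strip())


row = next(csv.DictReader(io.StringIO(SAMPLE)))
raw = row["totalWeeklyShareQuantity"]

print(repr(raw))                 # '38000' (no comma in the raw field)
print(parse_quantity(raw))       # 38000
print(parse_quantity("38,000"))  # 38000
```

If the raw field prints without a comma, as here, then the `'38,000'` in the traceback was presumably introduced somewhere between loading the file and executing the generated code, not by the file itself.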

**Problem 2**

2024-04-17 18:27:04 [INFO] Executing Step 4: CachePopulation
2024-04-17 18:27:04 [INFO] Executing Step 5: CodeCleaning
2024-04-17 18:27:04 [INFO] Saving charts to /usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png
2024-04-17 18:27:04 [INFO] Code running:

data = dfs[0].groupby(['MPID', 'securitytype'])['totalWeeklyShareQuantity'].sum().unstack()
data.plot(kind='bar', figsize=(12, 6), width=0.8)
plt.xlabel('MPID')
plt.ylabel('Total Weekly Share Quantity')
plt.title('Total Weekly Share Quantity by MPID and Security Type')
plt.legend(title='Security Type')
plt.show()
result = {'type': 'plot', 'value': '/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png'}

2024-04-17 18:27:04 [INFO] Executing Step 6: CodeExecution
2024-04-17 18:27:04 [ERROR] Pipeline failed on step 6: [Errno 2] No such file or directory: '/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png'
2024-04-17 18:27:05 [INFO] 100.8.116.94 - - [17/Apr/2024 18:27:05] "POST /chat HTTP/1.1" 200 -
2024-04-17 18:27:05 [INFO] 100.8.116.94 - - [17/Apr/2024 18:27:05] "GET /Unfortunately,%20I%20was%20not%20able%20to%20answer%20your%20question,%20because%20of%20the%20following%20error:[Errno%202]%20No%20such%20file%20or%20directory:%20'/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png' HTTP/1.1" 404 -
2024-04-17 18:27:07 [INFO] 100.8.116.94 - - [17/Apr/2024 18:27:07] "GET / HTTP/1.1" 200 -
2024-04-17 18:27:08 [INFO] 100.8.116.94 - - [17/Apr/2024 18:27:08] "GET /static/shanklogo HTTP/1.1" 304 -
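
The `GET /Unfortunately,%20I%20was...` request in the log suggests the web page used the returned error message as if it were an image URL. A small guard on the server side could separate the two cases before rendering. The sketch below is an assumption about how such a check might look: `classify_response` is a hypothetical helper, not a PandasAI or Flask function.

```python
import os


def classify_response(response) -> dict:
    """Hypothetical helper: decide whether a chat response is a chart file
    that exists on disk or plain text (including error messages)."""
    text = str(response)
    # Treat the response as an image only if it names an existing .png file.
    if text.lower().endswith(".png") and os.path.isfile(text):
        return {"kind": "image", "path": text}
    return {"kind": "text", "message": text}


error_message = (
    "Unfortunately, I was not able to answer your question, because of the "
    "following error:[Errno 2] No such file or directory: "
    "'/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png'"
)
print(classify_response(error_message)["kind"])  # text
```

With a check like this, the front end would display the error as text instead of requesting it as a static file and getting a 404.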

**Problem 3**

# Prepare the data for plotting
data = dfs[0].groupby(['MPID', 'securitytype'])['totalWeeklyShareQuantity'].sum().unstack()

# Plot the data side by side
data.plot(kind='bar', figsize=(12, 6), width=0.8)
plt.xlabel('MPID')
plt.ylabel('Total Weekly Share Quantity')
plt.title('Total Weekly Share Quantity by MPID and Security Type')
plt.legend(title='Security Type')
plt.show()

# Declare result variable 
result = {"type": "plot", "value": "temp_chart.png"}

2024-04-17 18:27:04 [INFO] Executing Step 4: CachePopulation
2024-04-17 18:27:04 [INFO] Executing Step 5: CodeCleaning
2024-04-17 18:27:04 [INFO] Saving charts to /usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png
2024-04-17 18:27:04 [INFO] Code running:

data = dfs[0].groupby(['MPID', 'securitytype'])['totalWeeklyShareQuantity'].sum().unstack()
data.plot(kind='bar', figsize=(12, 6), width=0.8)
plt.xlabel('MPID')
plt.ylabel('Total Weekly Share Quantity')
plt.title('Total Weekly Share Quantity by MPID and Security Type')
plt.legend(title='Security Type')
plt.show()
result = {'type': 'plot', 'value': '/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png'}

2024-04-17 18:27:04 [INFO] Executing Step 6: CodeExecution
2024-04-17 18:27:04 [ERROR] Pipeline failed on step 6: [Errno 2] No such file or directory: '/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png'
2024-04-17 18:27:05 [INFO] 100.8.116.94 - - [17/Apr/2024 18:27:05] "POST /chat HTTP/1.1" 200 -
2024-04-17 18:27:05 [INFO] 100.8.116.94 - - [17/Apr/2024 18:27:05] "GET /Unfortunately,%20I%20was%20not%20able%20to%20answer%20your%20question,%20because%20of%20the%20following%20error:[Errno%202]%20No%20such%20file%20or%20directory:%20'/usr/local/first/static/ff56c1e8-d272-4298-b3fc-2c732b296144.png' HTTP/1.1" 404 -