Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Code Generated & Code Cleaning is not correct #1324

Open fazri02 opened 1 month ago

fazri02 commented 1 month ago

System Info

Version: 2.2.14

Code cleaning eliminates a library import when the generated line has a lot of leading spaces in front of it, even though the import is needed.

🐛 Describe the bug

2024-08-08 17:22:49 [INFO] Code generated:


"""            import seaborn as sns"""
import matplotlib.pyplot as plt

df = dfs[0][dfs[0]['year'] == 2023]
df = df.groupby(['direct_cause_category', 'equipment', 'plant'])['loss_pvc'].sum().reset_index()
plt.figure(figsize=(15, 10))
sns.barplot(data=df, x='direct_cause_category', y='loss_pvc', hue='equipment', palette='Set3')
plt.xticks(rotation=90)
plt.title('Total Loss by Direct Cause, Equipment, and Plant in 2023')
plt.xlabel('Direct Cause Category')
plt.ylabel('Loss (PVC)')
plt.legend(title='Equipment')
plt.tight_layout()
plt.savefig('exports/charts/46f15da9-d530-4955-ab0b-d0d564046f9e.png')
result = {'type': 'plot', 'value': 'exports/charts/46f15da9-d530-4955-ab0b-d0d564046f9e.png'}

2024-08-08 17:22:49 [ERROR] Failed with error: Traceback (most recent call last):
  File "/root/virtualenv/Streamlit-ai/lib64/python3.9/site-packages/pandasai/pipelines/chat/code_execution.py", line 85, in execute
    result = self.execute_code(code_to_run, code_context)
  File "/root/virtualenv/Streamlit-ai/lib64/python3.9/site-packages/pandasai/pipelines/chat/code_execution.py", line 171, in execute_code
    exec(code, environment)
  File "<string>", line 4, in <module>
NameError: name 'sns' is not defined

dosubot[bot] commented 1 month ago

@fazri02 Hello! I'm here to help you with any issues or questions you have. Let's work together to solve this!

The issue you're encountering is due to the import statement for the seaborn library being improperly formatted or commented out, leading to a NameError because sns is not defined. To resolve this, ensure that the import statement is correctly included and not commented out in your code.

Here's how you can fix it:

  1. Ensure the import statement for seaborn is correctly formatted and placed at the beginning of your script:
import seaborn as sns
import matplotlib.pyplot as plt
  2. If you are modifying the test_code_execution.py file, make sure the import statement is added at the top of the file:
import os
from typing import Optional
from unittest.mock import MagicMock, Mock, patch

import pandas as pd
import pytest
import seaborn as sns  # Add this line
import matplotlib.pyplot as plt

from pandasai.agent import Agent
from pandasai.exceptions import InvalidOutputValueMismatch, NoCodeFoundError
from pandasai.helpers.logger import Logger
from pandasai.helpers.optional import get_environment
from pandasai.helpers.skills_manager import SkillsManager
from pandasai.llm.fake import FakeLLM
from pandasai.pipelines.chat.code_execution import CodeExecution
from pandasai.pipelines.pipeline_context import PipelineContext

This will ensure that sns is defined and available for use in your code, preventing the NameError [1].

Additionally, the CodeCleaning class in the pandasai/pipelines/chat/code_cleaning.py file is responsible for cleaning the code and handling import statements. The _check_imports method ensures that only whitelisted imports are allowed. If an import is not whitelisted, it raises a BadImportError [2].
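To see why the logged code raises the NameError, it can help to parse its first two lines with Python's standard ast module. This is only a minimal sketch of the symptom, not PandasAI's own cleaning logic: the seaborn import sits inside a triple-quoted string, so Python treats it as a bare string expression and never binds sns.

```python
import ast

# First two lines of the generated code, copied from the log above.
generated = '"""            import seaborn as sns"""\nimport matplotlib.pyplot as plt\n'

tree = ast.parse(generated)

# The first statement is a bare string expression, not an import.
first = tree.body[0]
is_string_expr = isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)

# Collect the names that real import statements bind.
bound = set()
for node in ast.walk(tree):
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        for alias in node.names:
            bound.add(alias.asname or alias.name.split(".")[0])

print(is_string_expr)  # True
print(sorted(bound))   # ['plt']
```

Only plt is bound, which matches the traceback: the first use of sns fails.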

To continue talking to Dosu, mention @dosu.

fazri02 commented 1 month ago

Dear @dosu, in my case the code is generated by the LLM. What is your recommendation?

dosubot[bot] commented 1 month ago

To ensure that the import statements generated by the LLM are correctly included and not commented out, you can use the _polish_code method from the LLM class. This method processes the code to ensure that it is in the correct format and not commented out.

Here is an example of how you can use the _polish_code method to ensure that the import statements are correctly included:

from pandasai.llm import LLM

# Example code generated by the LLM
generated_code = "import seaborn as sns\nprint('Hello World')"

# Use the _polish_code method to process the code
polished_code = LLM()._polish_code(generated_code)

# Execute the polished code
exec(polished_code)

The _polish_code method will ensure that the import statements are not commented out, thus avoiding issues like the NameError for sns [1].

Additionally, you can configure the LLM to influence how it generates import statements by using the custom_whitelisted_dependencies setting in the config object. This allows you to specify which dependencies should be whitelisted and their corresponding aliases:

config = {
    "custom_whitelisted_dependencies": {
        "seaborn": "sns",
        "matplotlib.pyplot": "plt"
    }
}

df = SmartDataframe(sales_by_country, config=config)

By configuring the LLM and its options, you can have more control over how the LLM generates code, including import statements [2][3].


fazri02 commented 1 month ago

I can't use this config. It always returns an error: 1 validation error for Config custom_whitelisted_dependencies -> 0 str type expected (type=type_error.str)

config = { "custom_whitelisted_dependencies": { "seaborn": "sns", "matplotlib.pyplot": "plt" } }

dosubot[bot] commented 1 month ago

To resolve the validation error and correctly configure the LLM to whitelist the dependencies "seaborn" as "sns" and "matplotlib.pyplot" as "plt" using the custom_whitelisted_dependencies setting, you need to modify the configuration to use a list of module names as strings. Here's how you can do it:

from pandasai import SmartDataframe

config = {
    "custom_whitelisted_dependencies": ["seaborn", "matplotlib.pyplot"]
}

# Assuming you have a DataFrame `df`
df = SmartDataframe(df, config=config)

This configuration will whitelist the specified dependencies. Ensure that these modules are installed in the environment where PandasAI is running to avoid any errors. Additionally, make sure that other relevant settings in your configuration file, such as llm and llm_options, are correctly set according to your requirements [1][2].


fazri02 commented 1 month ago

How do I add import seaborn as sns?

dosubot[bot] commented 1 month ago

To add the import statement import seaborn as sns to your generated code, you can manually insert it at the beginning of the code string before executing it. Here's an example of how you can do this:

generated_code = """
import seaborn as sns
import matplotlib.pyplot as plt

df = dfs[0][dfs[0]['year'] == 2023]
df = df.groupby(['direct_cause_category', 'equipment', 'plant'])['loss_pvc'].sum().reset_index()
plt.figure(figsize=(15, 10))
sns.barplot(data=df, x='direct_cause_category', y='loss_pvc', hue='equipment', palette='Set3')
plt.xticks(rotation=90)
plt.title('Total Loss by Direct Cause, Equipment, and Plant in 2023')
plt.xlabel('Direct Cause Category')
plt.ylabel('Loss (PVC)')
plt.legend(title='Equipment')
plt.tight_layout()
plt.savefig('exports/charts/46f15da9-d530-4955-ab0b-d0d564046f9e.png')
result = {'type': 'plot', 'value': 'exports/charts/46f15da9-d530-4955-ab0b-d0d564046f9e.png'}
"""

# Execute the code
exec(generated_code)

This ensures that the necessary import statements are included and properly formatted, preventing the NameError for sns.
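When the code comes from the LLM and cannot be edited by hand, one possible workaround is to post-process the string before it is executed. The sketch below is an assumption-laden illustration, not part of PandasAI: unwrap_quoted_imports is a hypothetical helper that looks for top-level string-literal statements whose content consists only of import statements and replaces them with the imports themselves.

```python
import ast


def unwrap_quoted_imports(code: str) -> str:
    """Hypothetical helper: turn string statements that hold only imports into real imports."""
    tree = ast.parse(code)
    lines = code.splitlines()
    out = []
    consumed = 0  # index of the next source line not yet copied to `out`
    for stmt in tree.body:
        is_string_stmt = (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
        )
        if not is_string_stmt:
            continue
        candidate = stmt.value.value.strip()
        try:
            inner = ast.parse(candidate)
        except SyntaxError:
            continue  # the string is ordinary text, leave it alone
        only_imports = inner.body and all(
            isinstance(node, (ast.Import, ast.ImportFrom)) for node in inner.body
        )
        if only_imports:
            # Copy untouched lines, then emit the import instead of the string.
            out.extend(lines[consumed:stmt.lineno - 1])
            out.append(candidate)
            consumed = stmt.end_lineno
    out.extend(lines[consumed:])
    return "\n".join(out)


generated = '"""            import seaborn as sns"""\nimport matplotlib.pyplot as plt\n'
fixed = unwrap_quoted_imports(generated)
print(fixed.splitlines()[0])  # import seaborn as sns
```

The sketch assumes each statement starts on its own line, and it would also unwrap a genuine docstring that happens to contain nothing but an import, so treat it as a starting point rather than a drop-in fix.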
