Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

last_code_executed contains the sample data (dictionary). #1335

Open c3-yiminliu opened 2 months ago

c3-yiminliu commented 2 months ago

System Info

pandas ai 2.2.14

🐛 Describe the bug

last_code_executed often contains the sample data as a dictionary, which can be quite long and makes the code less readable.

Sometimes the data variable is not used in the main code. In that case, would it be possible to leave the data variable out of last_code_executed to keep it cleaner?

dosubot[bot] commented 2 months ago

Hey @c3-yiminliu! I'm here to help you with any bugs, questions, or contributions you have. I'm Dosu, a bot designed to assist you while awaiting a human maintainer.

Could you provide an example of the code where last_code_executed includes the sample data as a dictionary? This will help in understanding the context and providing a more accurate solution.

To continue talking to Dosu, mention @dosu.

c3-yiminliu commented 2 months ago

Sure, here is an example.

image

The example dataframe here is simple, so the data variable is fairly concise. In real examples, though, the data variable can be quite lengthy and makes the code hard to read.

dosubot[bot] commented 2 months ago

To address the issue of the data variable making the last_code_executed lengthy and less readable, you can modify the behavior of the SmartDataframe class to exclude the data variable from last_code_executed. Here's how you can do it:

  1. Override the last_code_executed Property: You can override the last_code_executed property in the SmartDataframe class to filter out the data variable.

Here's an example of how you can achieve this:

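import ast

from pandasai import SmartDataframe

class CustomSmartDataframe(SmartDataframe):
    @property
    def last_code_executed(self):
        code = self._agent.last_code_executed
        if not code:
            return code
        # Drop only top-level `data = ...` assignments. A plain
        # code.replace('data', '') would also mangle names such as `dataframe`.
        tree = ast.parse(code)
        tree.body = [
            stmt for stmt in tree.body
            if not (isinstance(stmt, ast.Assign)
                    and any(isinstance(t, ast.Name) and t.id == "data" for t in stmt.targets))
        ]
        return ast.unparse(tree)  # ast.unparse requires Python 3.9+

# Use CustomSmartDataframe instead of SmartDataframe
df = CustomSmartDataframe(your_dataframe, config=your_config)

This code snippet creates a subclass of SmartDataframe and overrides the last_code_executed property so that top-level assignments to the data variable are dropped from the last executed code [1]. Because the code is regenerated from its syntax tree, comments and the original formatting are not preserved.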

  2. Check for DataFrame Usage: Before excluding the data variable, make sure the main code does not use it. The _required_dfs method in the CodeExecution class can help determine whether the data variable is needed for execution [2].
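The usage check described above can also be done on the code string itself, without relying on PandasAI internals. The following is a minimal sketch using only the standard library; the function name strip_unused_assignment and the sample strings are hypothetical and not part of PandasAI. It assumes the sample data is assigned at the top level to a variable named data, and it leaves the code untouched whenever that variable is read anywhere.

```python
import ast


def strip_unused_assignment(code: str, name: str = "data") -> str:
    """Return `code` without top-level `name = ...` assignments, but only
    when `name` is never read anywhere in the code. (Hypothetical helper.)"""
    tree = ast.parse(code)

    # A Name node with Load context is a read of the variable.
    is_read = any(
        isinstance(node, ast.Name)
        and node.id == name
        and isinstance(node.ctx, ast.Load)
        for node in ast.walk(tree)
    )
    if is_read:
        return code  # the variable is needed, so leave the code untouched

    def assigns_name(stmt: ast.stmt) -> bool:
        # Match only simple statements of the form `name = <expr>`.
        return (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and stmt.targets[0].id == name
        )

    tree.body = [stmt for stmt in tree.body if not assigns_name(stmt)]
    # ast.unparse (Python 3.9+) regenerates the source, so comments and
    # the original formatting are not preserved.
    return ast.unparse(tree)


# Made-up code strings for illustration only.
unused = "data = {'a': [1, 2]}\nresult = dfs[0]['a'].sum()"
used = "data = {'a': [1, 2]}\nresult = len(data)"

print(strip_unused_assignment(unused))        # assignment is dropped
print(strip_unused_assignment(used) == used)  # code is returned unchanged
```

Working on the syntax tree rather than on raw text avoids touching identifiers that merely contain the substring "data", and it handles dictionaries that span several lines.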

By implementing these changes, you can make the last_code_executed cleaner and more readable without including unnecessary data.
