Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com
Other
12.55k stars 1.21k forks

How should I configure it so that it always uses the actual data instead of sample data? #1314

Open HappyGO2023 opened 1 month ago

HappyGO2023 commented 1 month ago

sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": False
})

Currently, with this configuration, sometimes the generated code uses the dataframe source, like:

# Assuming dfs is already declared and contains the dataframes
df = dfs[0]

while other times it uses sample data, with the comment:

# Sample data provided in the dataframe format

This seems to be random. How should I configure it so that it always uses the actual data instead of sample data?

dosubot[bot] commented 1 month ago

Hey there, @HappyGO2023! 👋 I'm here to help you with your coding questions and bug fixes while you wait for a human maintainer. I'm a bot designed to assist you with your technical issues.

To ensure that the SmartDataframe always uses the actual data instead of sample data, you need to set the custom_head parameter to None when initializing the SmartDataframe. This configuration ensures that the actual data is used for all operations.

Here's how you can modify your configuration:

sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": False
}, custom_head=None)

This will ensure that the actual data is always used instead of sample data [1][2].

To continue talking to Dosu, mention @dosu.

HappyGO2023 commented 1 month ago

I have modified my configuration as suggested, but it still does not work.

dosubot[bot] commented 1 month ago

To ensure that the SmartDataframe always uses the actual data instead of sample data, make sure that enforce_privacy is set to False and avoid setting the custom_head parameter. If you have already done this and it still does not work, some internal logic or condition may be causing the switch.

Here's a refined configuration to try:

sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": False
})

If the issue persists, it might be due to an internal condition or a bug. Make sure that no other part of your code modifies these settings after initialization, and check for known issues in the current version of pandas-ai that could affect this behavior [1][2].
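While investigating, it may help to detect automatically which runs fell back to sample data. The sketch below is not part of pandas-ai: the helper names (references_dfs, builds_literal_dataframe, looks_like_sample_data) are hypothetical, and it assumes you can get the generated code as a string, for example by copying it from the verbose log. It parses the code with the standard library ast module and flags it when it never reads the dfs variable or when it builds a DataFrame from a literal.

```python
import ast


def references_dfs(code: str) -> bool:
    """Return True if the code reads the `dfs` variable (comments do not count)."""
    return any(
        isinstance(node, ast.Name)
        and node.id == "dfs"
        and isinstance(node.ctx, ast.Load)
        for node in ast.walk(ast.parse(code))
    )


def builds_literal_dataframe(code: str) -> bool:
    """Return True if the code calls DataFrame(...) with a dict or list literal."""
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `pd.DataFrame(...)` and a bare `DataFrame(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == "DataFrame" and node.args and isinstance(node.args[0], (ast.Dict, ast.List)):
            return True
    return False


def looks_like_sample_data(code: str) -> bool:
    """Heuristic (an assumption, not a pandas-ai rule): flag code that
    ignores `dfs` or constructs its own DataFrame from literal values."""
    return builds_literal_dataframe(code) or not references_dfs(code)


# Two illustrative snippets modelled on the patterns described in this issue.
uses_real_data = (
    "# Assuming dfs is already declared and contains the dataframes\n"
    "df = dfs[0]\n"
    "total = df['sales'].sum()\n"
)
uses_sample_data = (
    "import pandas as pd\n"
    "# Sample data provided in the dataframe format\n"
    "data = {'sales': [1, 2, 3]}\n"
    "df = pd.DataFrame(data)\n"
    "total = df['sales'].sum()\n"
)

print(looks_like_sample_data(uses_real_data))
print(looks_like_sample_data(uses_sample_data))
```

This is only a heuristic check on the code text. It does not change what the LLM generates, but it can show how often the fallback happens and with which prompts.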
