Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com
Other
13.51k stars, 1.31k forks

How should I configure it so that it always uses the actual data instead of sample data? #1314

Closed HappyGO2023 closed 1 day ago

HappyGO2023 commented 3 months ago

sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": False
})

Currently, with this configuration, the generated code sometimes uses the dataframe source, like:

# Assuming dfs is already declared and contains the dataframes
df = dfs[0]

while other times it uses sample data, with the comment:

# Sample data provided in the dataframe format

This seems to be random. How should I configure it so that it always uses the actual data instead of sample data?

dosubot[bot] commented 3 months ago

Hey there, @HappyGO2023! 👋 I'm here to help you with your coding questions and bug fixes while you wait for a human maintainer. I'm a bot designed to assist you with your technical issues.

To ensure that the SmartDataframe always uses the actual data instead of sample data, you need to set the custom_head parameter to None when initializing the SmartDataframe. This configuration ensures that the actual data is used for all operations.

Here's how you can modify your configuration:

sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": False
}, custom_head=None)

This will ensure that the actual data is always used instead of sample data [1][2].

To continue talking to Dosu, mention @dosu.

HappyGO2023 commented 3 months ago

I have modified my configuration as suggested above, but it still does not work.

dosubot[bot] commented 3 months ago

To ensure that the SmartDataframe always uses the actual data instead of sample data, make sure enforce_privacy is set to False and leave the custom_head parameter unset. If you have already done this and it still does not work, some internal logic or condition may be causing the switch.

Here's a refined configuration to try:

sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "use_error_correction_framework": True,
    "verbose": True,
    "enforce_privacy": False
})

If the issue persists, it might be due to an internal condition or a bug. Ensure that no other part of your code is modifying these settings after initialization. If the problem continues, it might be worth checking for any known issues or bugs in the current version of pandas-ai that could affect this behavior [1][2].
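Since the switch appears to be random, one way to narrow it down is to inspect each piece of generated code and flag the runs that rebuilt sample rows instead of reading from dfs. The following is a minimal sketch, not part of pandas-ai: the helper name uses_real_dataframes is hypothetical, and the check is only a heuristic based on the two patterns quoted in the question (df = dfs[0] versus an inline DataFrame built from literal data).

```python
import ast


def uses_real_dataframes(code: str) -> bool:
    """Heuristic: True if the code reads from `dfs` and never builds a
    DataFrame from literal data. Hypothetical helper, not a pandas-ai API."""
    tree = ast.parse(code)
    reads_dfs = False
    builds_inline = False
    for node in ast.walk(tree):
        # A subscript such as dfs[0] counts as reading the real data.
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == "dfs"
        ):
            reads_dfs = True
        # A call such as pd.DataFrame({...}) or DataFrame([...]) with a
        # literal dict or list suggests hand-written sample rows.
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            values = [*node.args, *(kw.value for kw in node.keywords)]
            if name == "DataFrame" and any(
                isinstance(value, (ast.Dict, ast.List)) for value in values
            ):
                builds_inline = True
    return reads_dfs and not builds_inline
```

The string to check would come from wherever your pandas-ai version exposes the generated code, for example the verbose log output. If the check returns False, you could re-ask the question or log the prompt for a bug report. It will miss sample data that is first assigned to a variable and then passed to DataFrame, unless the code also never touches dfs.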