gDanzel closed this issue 3 months ago.
@gDanzel, can you check whether the data it sends matches the real data?
I tried to replicate this. As you mentioned, I also see that it passes 3 sample rows to the LLM. However, on closer inspection, the values in those rows are randomized for me: the numerical values don't match the real values I passed in. Perhaps this is how pandas-ai enforces privacy, by randomizing the values before they are passed to the LLM. Let me know your findings.
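One way to do the check asked for above, assuming you have copied the sample rows out of the logged prompt and have the real rows at hand as tuples in the same column order. `rows_present_in_data` is a hypothetical helper, not part of pandas-ai. Cells are compared as strings, so a value printed in the prompt as "6.94" matches a real float 6.94, but differently formatted numbers (e.g. scientific notation) would show up as mismatches.

```python
from typing import Iterable, Sequence


def rows_present_in_data(
    samples: Iterable[Sequence[object]],
    real_rows: Iterable[Sequence[object]],
) -> list[bool]:
    """Hypothetical helper: for each sample row copied from the prompt,
    report whether an identical row exists in the real data.

    Cells are compared via str(), so the string "1000" from a prompt
    matches the integer 1000 in the real data.
    """
    real = {tuple(str(cell) for cell in row) for row in real_rows}
    return [tuple(str(cell) for cell in row) in real for row in samples]


# Hypothetical data: real rows, and rows as they might appear in a prompt (as text).
real_rows = [("Country A", 1000, 6.94), ("Country B", 2000, 7.23)]
prompt_samples = [("Country A", "1000", "6.94"), ("Country B", "2917", "7.23")]

print(rows_present_in_data(prompt_samples, real_rows))  # [True, False]
```

If every entry is False, the samples were most likely randomized; a True suggests a real row reached the prompt (with small tables a coincidental match is possible).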
System Info
OS version: Windows 11
Python version: 3.10
pandasai version: v2.0.23
🐛 Describe the bug
According to the doc:
"enforce_privacy: whether to enforce privacy. Defaults to False. If set to True, PandasAI will not send any data to the LLM, but only the metadata. By default, PandasAI will send 5 samples that are anonymized to improve the accuracy of the results. "
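To make the documented contract concrete, here is a minimal model of it. This is a hypothetical sketch of what the docs describe, not pandasai's implementation: `build_llm_context` and `anonymize_row` are invented names, and the anonymization rule (random numbers of a similar size, masked strings) is an assumption. In pandasai v2 the real flag is, as far as I know, set as a key in the config dictionary, e.g. `config={"enforce_privacy": True}`.

```python
import random
from typing import Any, Optional, Sequence


def anonymize_row(row: Sequence[Any], rng: random.Random) -> list[Any]:
    """Assumed anonymization: replace numbers with random values of a
    similar magnitude and mask strings; other values pass through."""
    out: list[Any] = []
    for cell in row:
        if isinstance(cell, bool):
            out.append(cell)  # bool is a subclass of int, keep it as-is
        elif isinstance(cell, int):
            out.append(rng.randint(0, max(1, 2 * abs(cell))))
        elif isinstance(cell, float):
            out.append(round(rng.uniform(0.0, 2 * abs(cell)), 2))
        elif isinstance(cell, str):
            out.append("*" * len(cell))
        else:
            out.append(cell)
    return out


def build_llm_context(
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
    enforce_privacy: bool = False,
    n_samples: int = 5,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical model of the documented behaviour:
    enforce_privacy=True  -> metadata only (column names, row count);
    enforce_privacy=False -> metadata plus n_samples anonymized rows."""
    context: dict[str, Any] = {"columns": list(columns), "rows": len(rows)}
    if enforce_privacy:
        return context  # no cell values at all
    rng = random.Random(seed)
    picked = rng.sample(list(rows), min(n_samples, len(rows)))
    context["samples"] = [anonymize_row(r, rng) for r in picked]
    return context


rows = [("Country A", 1000, 6.94), ("Country B", 2000, 7.23)]
print(build_llm_context(["country", "gdp", "happiness_index"], rows, enforce_privacy=True))
# {'columns': ['country', 'gdp', 'happiness_index'], 'rows': 2}
```

Under this reading of the docs, a prompt that contains any sample rows while enforce_privacy is True does not match the documented behaviour.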
However, when I tried the code below, it seems to send the sample data as well.
The logs below show the prompts with the data included.
QUERY
Calculate the average of the gdp of north american countries
Variable dfs: list[pd.DataFrame] is already declared.
At the end, declare "result" variable as a dictionary of type and value.
If you are asked to plot a chart, use "matplotlib" for charts, save as png.
Generate python code and return full updated code:
2024-03-29 21:01:19 [INFO] Executing Step 3: CodeGenerator
2024-03-29 21:01:22 [INFO] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-29 21:01:22 [INFO] Prompt used:
dfs[0]:
name: null
description: null
type: pd.DataFrame
rows: 10
columns: 3
schema:
fields:
Update this initial code:
QUERY
Calculate the average of the gdp of north american countries
Variable dfs: list[pd.DataFrame] is already declared.
At the end, declare "result" variable as a dictionary of type and value.
If you are asked to plot a chart, use "matplotlib" for charts, save as png.
Generate python code and return full updated code:
2024-03-29 21:01:22 [INFO] Code generated:
2024-03-29 21:01:22 [INFO] Executing Step 4: CachePopulation
2024-03-29 21:01:22 [INFO] Executing Step 5: CodeCleaning
2024-03-29 21:01:22 [INFO] Code running:
2024-03-29 21:01:22 [INFO] Executing Step 6: CodeExecution
2024-03-29 21:01:22 [INFO] Executing Step 7: ResultValidation
2024-03-29 21:01:22 [INFO] Answer: {'type': 'number', 'value': 10450942230528.0}
2024-03-29 21:01:22 [INFO] Executing Step 8: ResultParsing
10450942230528.0
Tokens Used: 510
Prompt Tokens: 359
Completion Tokens: 151
Total Cost (USD): $ 0.000406
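The prompt is easier to inspect when it is pulled out of the log rather than read inline. A minimal sketch, assuming each log entry starts on its own line with a "YYYY-MM-DD HH:MM:SS [LEVEL]" prefix, as in the log above, and that multi-line messages continue on the following lines. `parse_log` and `prompts_sent` are hypothetical helpers, not pandasai functions.

```python
import re

# Assumed entry format: "2024-03-29 21:01:22 [INFO] message..."
ENTRY_START = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


def parse_log(text: str) -> list[dict]:
    """Split log text into entries; lines without a timestamp prefix are
    treated as continuations of the previous entry's message."""
    entries: list[dict] = []
    for line in text.splitlines():
        m = ENTRY_START.match(line)
        if m:
            entries.append({"time": m.group(1), "level": m.group(2), "message": m.group(3)})
        elif entries:
            entries[-1]["message"] += "\n" + line
    return entries


def prompts_sent(text: str) -> list[str]:
    """Return the body of every 'Prompt used:' entry."""
    prefix = "Prompt used:"
    return [
        e["message"][len(prefix):].strip()
        for e in parse_log(text)
        if e["message"].startswith(prefix)
    ]


# Abbreviated excerpt in the same format as the log above.
sample_log = """2024-03-29 21:01:19 [INFO] Executing Step 3: CodeGenerator
2024-03-29 21:01:22 [INFO] Prompt used:
dfs[0]:
rows: 10
2024-03-29 21:01:22 [INFO] Code generated:"""

print(prompts_sent(sample_log))  # ['dfs[0]:\nrows: 10']
```

The extracted prompt text can then be searched for real cell values to confirm whether actual data was sent.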