pesmeriz opened this issue 1 month ago (status: Open)
Hey @pesmeriz! 👋 I'm here to help you out with any issues or questions you have. If you need assistance, feel free to ask!
There is a known issue with the `enforce_privacy` setting not working as expected: the current implementation does not perform any data anonymization based on it. The `enforce_privacy` property is defined and can be set, but no logic uses it to modify or anonymize the data [1].
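For contrast, here is roughly what honoring the flag could look like at the serialization step. This is a minimal sketch with plain pandas, not pandasai's code: `serialize_for_prompt` is a hypothetical helper, the `dfs[i]:RxC` header line is copied from the snippet in this thread, and the assumption is that the shape plus column names is enough context for the model.

```python
import pandas as pd


def serialize_for_prompt(df: pd.DataFrame, index: int, enforce_privacy: bool) -> str:
    """Hypothetical helper: describe a dataframe for an LLM prompt.

    With enforce_privacy=True only the shape and the header row are emitted;
    no cell values are included.
    """
    rows, cols = df.shape
    # head(0) keeps the columns but no rows; otherwise send a small sample.
    sample = df.head(0) if enforce_privacy else df.head(3)
    return f"dfs[{index}]:{rows}x{cols}\n{sample.to_csv(index=False)}"


customers = pd.DataFrame(
    {"name": ["Ana", "Bruno"], "email": ["ana@example.com", "bruno@example.com"]}
)
out = serialize_for_prompt(customers, 0, enforce_privacy=True)
full = serialize_for_prompt(customers, 0, enforce_privacy=False)
print(out)
```

The key point is that the flag is actually read at the place where rows would otherwise be written into the prompt.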
A similar problem was discussed in an earlier issue, which is now closed. The suggested temporary fix modifies the `convert_df_to_csv` function in `pandasai/helpers/dataframe_serializer.py` so that it does not add the dataframe details when `enforce_privacy` is set to `True` [2].
Here is the suggested temporary fix. Replace the following code in `convert_df_to_csv`:
```python
# Add dataframe details
dataframe_info += f"\ndfs[{extras['index']}]:{df.rows_count}x{df.columns_count}\n{df.to_csv()}"
```
With this:
```python
# TEMP FIX: send only the column names, not the sample rows
# (requires `import pandas as pd` at the top of the module)
df_without_sample_data = pd.DataFrame(columns=df.pandas_df.columns)
dataframe_info += f"\ndfs[{extras['index']}]:{df.rows_count}x{df.columns_count}\n{df_without_sample_data.to_csv()}"
```
This change keeps the original rows out of the prompt. Note that, as written, it does not check `enforce_privacy`: the rows are omitted whether or not the setting is enabled.
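To see what the replacement line actually sends, the expression can be reproduced with plain pandas. In this sketch `original` is a made-up stand-in for `df.pandas_df`; the empty frame built from its columns serializes to a header row only. Because `to_csv()` is called with its defaults, the index column is kept, so the header starts with an empty index label.

```python
import pandas as pd

# Made-up stand-in for df.pandas_df inside convert_df_to_csv.
original = pd.DataFrame(
    {"customer_id": [101, 102], "email": ["ana@example.com", "bruno@example.com"]}
)

# Same construction as the temporary fix: keep the columns, drop every row.
df_without_sample_data = pd.DataFrame(columns=original.columns)
csv_text = df_without_sample_data.to_csv()

print(repr(csv_text))
```

The `dfs[i]:RxC` prefix built from `df.rows_count` and `df.columns_count` is still emitted by the fix, so the row and column counts are still shared with the model.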
I tried the proposed solution and the problem remains.
This should also be made clearer in the docs. Otherwise people start using the setting, run into a bug (or worse, don't notice it), and are left without a solution and with wasted time.
System Info
OS version: macOS Sequoia 15.0
My pyproject.toml
🐛 Describe the bug
Using `"enforce_privacy": True` does not anonymize the data. Even if you use `custom_head` on your `SmartDataframe`, the Agent always shares the data from the original dataframe. My example:
You can see this in /pandasai/llm/bamboo_llm.py, line 18.
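For reference, this is the kind of sample the reporter expects to be shared instead of real rows when a `custom_head` is supplied. It is a sketch under assumptions: `make_placeholder_head` is a hypothetical helper, and replacing every value with a per-column placeholder is only one possible masking scheme. Per the report, pandasai still sends the original data even when such a head is set, so this does not work around the bug.

```python
import pandas as pd


def make_placeholder_head(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: same columns as df, but synthetic values only."""
    n = min(n, len(df))
    placeholder = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric columns get 0, 1, 2, ... so they stay numeric.
            placeholder[col] = list(range(n))
        else:
            # Everything else gets "<column>_0", "<column>_1", ...
            placeholder[col] = [f"{col}_{i}" for i in range(n)]
    return pd.DataFrame(placeholder, columns=df.columns)


customers = pd.DataFrame(
    {
        "name": ["Ana", "Bruno"],
        "age": [34, 29],
        "email": ["ana@example.com", "bruno@example.com"],
    }
)
head = make_placeholder_head(customers)
print(head)
```

The column names and rough types survive, which is what a model needs to write queries, while none of the original cell values do.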