Open Tu-Zhenzhao opened 1 month ago
To address the Chinese characters not being decoded during the prompt generation stage in pandasai, first check the encoding of your original data source and make sure it is UTF-8 or compatible. It would also help to know which Python version you are using and whether the issue occurs with other non-ASCII characters. A minimal, reproducible example of the code that triggers the issue would make it much easier to pinpoint the problem.
System Info
pandasai == 2.0.43
python == 3.11
🐛 Describe the bug
I was trying to use the Field Descriptions feature to help LLMs understand my dataset. My approach is to write a data description function that builds a dictionary of information about the dataset, then pass it to pandasai through Field Descriptions like this:
Part of my `data` looks like this:
As you can see, there are some Chinese characters, but in the prompt_generation stage the Chinese characters were not decoded, so the prompt looks like this:
This makes the LLM much more confused, because it sees escape sequences such as "\u65F6\u95F4" instead of the original characters.
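For illustration, here is a minimal sketch of how escapes like this can arise and be avoided. It uses only the standard library `json` module and assumes that a similar ASCII-escaping serialization step happens somewhere in prompt generation; I have not confirmed which serializer pandasai actually uses. The dictionary below is a made-up example, not my real field descriptions.

```python
import json

# Hypothetical field descriptions; the key is the Chinese word for "time".
field_descriptions = {"时间": "timestamp of the record"}

# Default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(field_descriptions)
print(escaped)  # {"\u65f6\u95f4": "timestamp of the record"}

# ensure_ascii=False keeps the original characters in the output string.
readable = json.dumps(field_descriptions, ensure_ascii=False)
assert "时间" in readable

# The escaped form is lossless: parsing it restores the original text.
restored = json.loads(escaped)
assert restored == field_descriptions

# A literal escape sequence copied from a prompt can also be decoded directly.
literal = r"\u65F6\u95F4"
decoded = literal.encode("ascii").decode("unicode_escape")
assert decoded == "时间"
```

If the escaping comes from a different serializer, the equivalent switch would be needed there instead; for example, PyYAML's `yaml.dump` has an `allow_unicode` option that plays the same role.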
Is there any way to solve this problem? Any suggestions would be appreciated!