Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com
Other
11.7k stars 1.08k forks

fix: Introduce `_is_python_code` check into `_extract_code` #1140

Closed adamingas closed 1 month ago

adamingas commented 2 months ago

Motivation: With a dataframe that has a large number of columns, the `LLM()._execute_code` method would fail when asked how many rows the dataframe has. The failure occurred because the LLM did not surround the code with triple backticks.

If the LLM produces correct Python code but, for some reason, does not preface it with ```, the `LLM()._extract_code` method fails with `NoCodeFoundError`.

In this change, we first check whether the separator is included in the response and, if it is, take only the part in between. We then strip any `python`/`py` language tag from the code and check that what remains is valid Python. This check should pass; otherwise executing the code will fail anyway, and it is better to raise an error earlier than later.

With this change, if the LLM produced perfect code without the ``` separator, it would pass and return the code, whereas previously it would fail.
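The steps described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the merged code: the function names `extract_code` and `is_python_code`, the use of `ast.parse` for the validity check, and the local `NoCodeFoundError` class are assumptions made for the example.

```python
import ast
import re


class NoCodeFoundError(Exception):
    """Local stand-in for the library's exception of the same name."""


def is_python_code(text: str) -> bool:
    """Return True if `text` parses as Python source (assumed check via ast)."""
    try:
        ast.parse(text)
        return True
    except (SyntaxError, ValueError):
        return False


def extract_code(response: str, separator: str = "```") -> str:
    """Pull code out of an LLM response, with or without a fence."""
    code = response
    if separator in response:
        # Keep only the text between the first pair of separators.
        code = response.split(separator)[1]
        # Drop a leading language tag such as "python" or "py".
        code = re.sub(r"^(?:python|py)[ \t]*\n", "", code)
    code = code.strip()
    # Fail early: code that does not parse cannot be executed later.
    if not code or not is_python_code(code):
        raise NoCodeFoundError("No code found in the response")
    return code
```

Under these assumptions, both `extract_code("```python\nprint(1)\n```")` and the unfenced `extract_code("result = len(df)")` return the bare code, while a prose answer such as "The dataframe has 10 rows." raises `NoCodeFoundError`. One caveat of a parse-only check is that a single bare word is also syntactically valid Python, so it would pass.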

tushar-31093 commented 1 month ago

Guys, this is a genuine concern, could you confirm if this fix might solve this problem?

adamingas commented 1 month ago

@gventuri Can someone take a look at the PR? Happy to change stuff if need be.

gventuri commented 1 month ago

@adamingas merged, thanks a lot for the PR :D