fix:Introduce `_is_python_code` check into `_extract_code`

adamingas commented 2 months ago

Add tests to cover more cases for _extract_code and _is_python_code
[X] Addresses #1128
[X] Tests added and passed if fixing a bug or adding a new feature.
[X] All code checks passed.

Motivation: When using a dataframe with a large number of columns, the `LLM()._execute_code` method would fail when asked how many rows the dataframe has. The failure occurred because the LLM did not surround the code with triple backticks.

If the LLM produces correct Python code but for some reason does not preface it with ```, the `LLM()._extract_code` method fails with `NoCodeFoundError`.

In this change we first check whether the separator is included in the response and, if it is, take only the part in between. Then we strip any python/py language tag from the code. Finally we check that the result is valid Python. This check has to pass, since executing invalid code would fail anyway, and it's better to raise an error earlier than later.

With this change, if the LLM produced perfect code without the ``` separator, it would pass and return the code, whereas previously it would fail.
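
The steps described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this PR: the function names `extract_code` and `is_python_code`, the use of `ast.parse` for the validity check, and the locally defined `NoCodeFoundError` are all assumptions made for the example.

```python
import ast
import re


class NoCodeFoundError(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


def is_python_code(text: str) -> bool:
    """Return True if `text` parses as Python source (assumed check via ast)."""
    try:
        ast.parse(text)
        return True
    except (SyntaxError, ValueError):
        return False


def extract_code(response: str, separator: str = "```") -> str:
    """Pull Python code out of an LLM response, with or without fences."""
    code = response

    # Step 1: if the separator is present, keep only the part in between.
    if separator in response:
        parts = response.split(separator)
        if len(parts) > 1:
            code = parts[1]

    # Step 2: drop a leading "python" / "py" language tag on its own line.
    code = re.sub(r"\A\s*(?:python|py)[ \t]*\n", "", code).strip()

    # Step 3: fail early if what remains is empty or not valid Python.
    if not code or not is_python_code(code):
        raise NoCodeFoundError("Response does not contain valid Python code")

    return code


if __name__ == "__main__":
    print(extract_code("```python\nprint(df.shape[0])\n```"))
    print(extract_code("df.shape[0]"))
```

Note that a pure syntax check is permissive: a reply consisting of a single word such as `Hello` still parses as a valid Python expression, so this kind of check only rejects responses that are not syntactically valid.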

tushar-31093 commented 1 month ago

Guys, this is a genuine concern. Could you confirm whether this fix solves the problem?

adamingas commented 1 month ago

@gventuri Can someone take a look at the PR? Happy to change stuff if need be.

gventuri commented 1 month ago

@adamingas merged, thanks a lot for the PR :D

Sinaptik-AI / pandas-ai

fix:Introduce `_is_python_code` check into `_extract_code` #1140