Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

'NoneType' object has no attribute 'split' #1098

Closed seanshanker closed 2 months ago

seanshanker commented 2 months ago

System Info

```
System:
        python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
    executable: /usr/bin/python3
       machine: Linux-6.1.58+-x86_64-with-glibc2.35

Python dependencies:
        sklearn: 1.2.2
            pip: 23.1.2
     setuptools: 67.7.2
          numpy: 1.25.2
          scipy: 1.11.4
         Cython: 3.0.10
         pandas: 1.5.3
     matplotlib: 3.7.1
         joblib: 1.3.2
  threadpoolctl: 3.4.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 2
         prefix: libgomp
       filepath: /usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: /usr/local/lib/python3.10/dist-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Ha
```

🐛 Describe the bug

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/generate_chat_pipeline.py", line 283, in run
    output = (self.code_generation_pipeline | self.code_execution_pipeline).run(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 137, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 101, in run
    step_output = logic.execute(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/code_generator.py", line 33, in execute
    code = pipeline_context.config.llm.generate_code(input, pipeline_context)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 197, in generate_code
    return self._extract_code(response)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 118, in _extract_code
    if len(code.split(separator)) > 1:
AttributeError: 'NoneType' object has no attribute 'split'
```
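The traceback shows the crash: `_extract_code` assumes the LLM returned a string, but the response is `None`, so `code.split(separator)` raises `AttributeError`. The helper below is a plain-Python sketch of the kind of guard that prevents this; it is not pandasai's actual `_extract_code`, and the separator and language-tag handling are assumptions for illustration only.

```python
def extract_code(response, separator="```"):
    """Pull a fenced code block out of an LLM response string.

    Guards against the response being None, which is what the
    traceback above shows happening with BambooLLM.
    """
    if response is None:
        # Fail with a clear message instead of an AttributeError.
        raise ValueError("LLM returned no response; cannot extract code")
    code = response
    # If the response contains fenced blocks, keep the first fenced segment.
    parts = code.split(separator)
    if len(parts) > 1:
        code = parts[1]
    # Strip an optional language tag such as "python\n" after the fence.
    if code.startswith("python"):
        code = code[len("python"):].lstrip()
    return code.strip()
```

With a guard like this, a `None` response surfaces as a readable error rather than `'NoneType' object has no attribute 'split'`.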

rjaswal9 commented 2 months ago

Running into the same error on my end, using the databricks connector.

seanshanker commented 2 months ago

It was working until last night.

gventuri commented 2 months ago

Can you share the logs? @rjaswal9 @seanshanker

seanshanker commented 2 months ago

I pasted the logs above... here they are again:

```
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/generate_chat_pipeline.py", line 283, in run
    output = (self.code_generation_pipeline | self.code_execution_pipeline).run(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 137, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 101, in run
    step_output = logic.execute(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/code_generator.py", line 33, in execute
    code = pipeline_context.config.llm.generate_code(input, pipeline_context)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 197, in generate_code
    return self._extract_code(response)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 118, in _extract_code
    if len(code.split(separator)) > 1:
AttributeError: 'NoneType' object has no attribute 'split'
```

melvinmt commented 2 months ago

I think there's an issue with BambooLLM; it works if you switch to OpenAI:

```python
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

llm = OpenAI(api_token="my-openai-api-key")
pandas_ai = SmartDataframe("data.csv", config={"llm": llm})
```
seanshanker commented 2 months ago

OpenAI doesn't let you develop and test freely, given the constraints on how many queries you can run. BambooLLM at least gives you a little time and space to learn without spending money.

gventuri commented 2 months ago

@seanshanker I'd need more detailed logs (i.e. running with verbose=True). As an alternative, can you share agent.last_code_generated or sdf.last_code_generated, depending on which feature you are using?

@melvinmt this is actually weird. Are you sure it's using BambooLLM and not OpenAI? If so, can you raise a ticket? I'll look into it ASAP!

seanshanker commented 2 months ago

All I have is the script and logs:

```python
import os
from pandasai import SmartDataframe
import pandas as pd

# pandas dataframe
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://pandabi.ai
# (you can also configure it in your .env file).
os.environ["PANDASAI_API_KEY"] = "Nqhzu"

# convert to SmartDataframe
sdf = SmartDataframe(sales_by_country, config={"conversational": True, "verbose": True})

response = sdf.chat('Which are the top 5 countries by sales?')
print(response)
```

Output: China, United States, Japan, Germany, Australia


```
ERROR:pandasai.helpers.logger:Pipeline failed on step 3: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/generate_chat_pipeline.py", line 283, in run
    output = (self.code_generation_pipeline | self.code_execution_pipeline).run(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 137, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 101, in run
    step_output = logic.execute(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/code_generator.py", line 33, in execute
    code = pipeline_context.config.llm.generate_code(input, pipeline_context)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 197, in generate_code
    return self._extract_code(response)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 118, in _extract_code
    if len(code.split(separator)) > 1:
AttributeError: 'NoneType' object has no attribute 'split'
```

Unfortunately, I was not able to answer your question, because of the following error:

'NoneType' object has no attribute 'split'

seanshanker commented 2 months ago

When I use GooglePalm, it times out:

```
HTTPConnectionPool(host='localhost', port=42017): Read timed out. (read timeout=60.0)
```

BambooLLM works much better than GooglePalm, so that's not much of an option either.
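Read timeouts like the one above are often transient. While debugging, a generic retry wrapper can paper over them; the sketch below is plain Python, not part of pandas-ai, and the attempt counts and delays are arbitrary choices.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage around a chat call that sometimes times out:
# answer = with_retries(lambda: sdf.chat("Which are the top 5 countries by sales?"))
```

This does not fix rate-limit timeouts, of course; it only smooths over occasional network hiccups.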

seanshanker commented 2 months ago

How do I get agent.last_code_generated? Do I print it, or is it a function? Thank you.

```
ERROR:pandasai.helpers.logger:Pipeline failed on step 3: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/generate_chat_pipeline.py", line 283, in run
    output = (self.code_generation_pipeline | self.code_execution_pipeline).run(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 137, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/pipeline.py", line 101, in run
    step_output = logic.execute(
  File "/usr/local/lib/python3.10/dist-packages/pandasai/pipelines/chat/code_generator.py", line 33, in execute
    code = pipeline_context.config.llm.generate_code(input, pipeline_context)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 197, in generate_code
    return self._extract_code(response)
  File "/usr/local/lib/python3.10/dist-packages/pandasai/llm/base.py", line 118, in _extract_code
    if len(code.split(separator)) > 1:
AttributeError: 'NoneType' object has no attribute 'split'
```

```
TypeError                                 Traceback (most recent call last)
in <cell line: 32>()
     30
     31 agent.chat("Plot salaries against name")
---> 32 agent.last_code_generated()

TypeError: 'NoneType' object is not callable
```

seanshanker commented 2 months ago

This does look like a BambooLLM issue. Any update, please? Thank you.

huaji23 commented 2 months ago

+1

credelosa2022 commented 2 months ago

+1

gventuri commented 2 months ago

@seanshanker you need to print agent.last_code_generated (it's an attribute, not a function). We need to check the code being generated to help!

credelosa2022 commented 2 months ago

Hi @gventuri, it says None for last_code_generated (screenshot attached).

gventuri commented 2 months ago

Which llm are you using @credelosa2022? Did you declare the API key?

credelosa2022 commented 2 months ago

Hi @gventuri, everything is default, so I'm guessing this is BambooLLM. Yup, I declared the API key, which I got from pandabi.ai.

seanshanker commented 2 months ago

Same. None for agent.last_code_generated. This was working Monday evening EST; something changed between Monday evening and Tuesday morning. When was the last code checked in, please?

seanshanker commented 2 months ago

Does anyone use GooglePalm with a reasonably modest dataset? I keep getting timed out, which didn't use to happen with BambooLLM.

```
HTTPConnectionPool(host='localhost', port=33723): Read timed out. (read timeout=60.0)
```

credelosa2022 commented 2 months ago

Is GooglePalm free, @seanshanker? Have you tried the loan dataset they provide in their examples?

seanshanker commented 2 months ago

Small datasets work; anything slightly bigger times out. I thought Google allows 60 queries or so per minute... maybe I'm misunderstanding something?

credelosa2022 commented 2 months ago

Can you post the code you're running with GooglePalm on a small dataset? Thanks.

seanshanker commented 2 months ago

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import GooglePalm
from IPython.display import display, HTML

# URL of the text file
url = "https://www.nasdaqtrader.com/dynamic/SymDir/nasdaqlisted.txt"

# Read the data into a DataFrame
df = pd.read_csv(url, delimiter="|")

llm = GooglePalm(api_key="xxx")

df1 = SmartDataframe(df, config={"llm": llm})
response = df1.chat("Show me top 3 rows in dataframe where ETF is Y")
display(HTML(response.to_html()))
```

(I dropped the duplicate SmartDataframe import and the unused GoogleVertexAI import from my original paste.)
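For comparison, the filter that chat query asks for can be written in plain pandas, with no LLM in the loop. The DataFrame below is a toy stand-in for the NASDAQ listing file; the `Symbol` and `ETF` column names match that feed, but the rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for nasdaqlisted.txt (real feed uses "Y"/"N" in the ETF column).
df = pd.DataFrame({
    "Symbol": ["AAPL", "QQQ", "MSFT", "SPY", "VTI", "GOOG"],
    "ETF":    ["N",    "Y",   "N",    "Y",   "Y",   "N"],
})

# Top 3 rows where ETF is Y: boolean mask, then head().
top3_etfs = df[df["ETF"] == "Y"].head(3)
print(top3_etfs)
```

Doing the deterministic filtering in pandas and reserving the LLM for genuinely open-ended questions also sidesteps the consistency issues discussed later in this thread.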

credelosa2022 commented 2 months ago

Thanks @seanshanker.

Hopefully @gventuri can help us with our issue hehe

seanshanker commented 2 months ago

I sure hope so. I've been stalled since Tuesday morning, and I wanted to demo this to someone!

credelosa2022 commented 2 months ago

Same here!

seanshanker commented 2 months ago

Does anyone use Gemini with PandasAI and know if it's supported? Thank you!

rshoaib1190 commented 2 months ago

I face the same issue using BambooLLM, and the output of sdf.last_code_generated is None.

wenger9 commented 2 months ago

Just want to echo what others have mentioned here.

last_code_generated shows as None. My first attempt was Tuesday midday (EST).

I've had the same error using the Databricks connector, and also with the first examples listed in the README.

gventuri commented 2 months ago

OK, a few questions:

Please let's try to keep the conversation ordered, so that we can move faster.

In the meantime, while you keep experiencing this issue, please consider downgrading temporarily.

credelosa2022 commented 2 months ago

(screenshot attached)

pandasai-2.0.32, BambooLLM (default), query "How many loans are from men and have been paid off?", dataset loans payment data.csv (from your examples on GitHub).

@gventuri which version is working right now? Thanks.

seanshanker commented 2 months ago

Using BambooLLM:

```
!pip install --force-reinstall -v "pandasai==2.0.29"
```

I went even lower, still no luck... so the issue is not with pandasai; the issue is with BambooLLM.

JmovJerry commented 2 months ago

@gventuri You can reproduce the issue on this colab example.

https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing

DaniloCardace commented 2 months ago

+1

gventuri commented 2 months ago

Hey everyone, the issue has finally been fixed. Thanks a lot for your patience! :D

seanshanker commented 2 months ago

Looks good, thank you @gventuri!

I have a question on BambooLLM: how can I get the results to be consistent? Set temperature=0?

gventuri commented 2 months ago

@seanshanker unfortunately, even with temperature 0, it's not guaranteed that results will be consistent. One option would be setting a seed (which we currently don't support), but it would also be risky, because if it's wrong the first time, it will then be wrong every time.

As an alternative, to make it more consistent you can try the train method (https://docs.pandas-ai.com/en/latest/train/#train-with-your-own-settings).

seanshanker commented 2 months ago

Got it, thank you! When you train, does it only become more deterministic on the exact Q/A pairs, or does it also learn from those few Q/As and get better in general?

seanshanker commented 2 months ago

It's one thing for long generative text to be hit-or-miss, but non-deterministic results are dangerous when you're working with data, no?