Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Unnecessary datatype mismatch error | Pandas AI #1231

Open usmanbinimran-octopus opened 1 week ago

usmanbinimran-octopus commented 1 week ago

System Info

OS version: Windows 11 pro
Python version: 3.9
The current version of pandasai being used: 2.2

πŸ› Describe the bug

Hi @gventuri, hope this message finds you well.

I am running into unexpected behaviour that I have been experiencing for a long time. Please look into it:

PROMPT show last 10 records

RESPONSE Unfortunately, I was not able to answer your question, because of the following error: Value type <class 'list'> must match with type dataframe

EXPLANATION I looked at the previous conversation and saw that you wanted to see the last 10 records. So, I wrote some code that takes the last 10 records from each dataset and puts them into a new dataframe. This way, you can easily see the most recent data without having to look through everything.

CLARIFICATION QUESTIONS ['Which dataframe should the last 10 records be shown from?', 'Are the records supposed to be sorted in any particular order before showing the last 10?', 'Is there a specific column that should be used for sorting the records before displaying the last 10?']

REVISED PROMPT: Display the most recent 10 entries.

EXECUTED CODE:
dfs[0] = dfs[0].tail(10)
dfs[1] = dfs[1].tail(10)
result = {'type': 'dataframe', 'value': dfs}

OPEN AI TOKEN CB:
Tokens Used: 3350
Prompt Tokens: 2994
Completion Tokens: 356
Total Cost (USD): $ 0.002031

ERROR ON COMMAND PROMPT:
Traceback (most recent call last):
  File "W:\OctopusTasks\GenInsight\GenInsight-Phase2\GenInsight-Phase2\env-geninsight-phase2\lib\site-packages\pandasai\pipelines\chat\generate_chat_pipeline.py", line 308, in run
    output = (
  File "W:\OctopusTasks\GenInsight\GenInsight-Phase2\GenInsight-Phase2\env-geninsight-phase2\lib\site-packages\pandasai\pipelines\pipeline.py", line 137, in run
    raise e
  File "W:\OctopusTasks\GenInsight\GenInsight-Phase2\GenInsight-Phase2\env-geninsight-phase2\lib\site-packages\pandasai\pipelines\pipeline.py", line 101, in run
    step_output = logic.execute(
  File "W:\OctopusTasks\GenInsight\GenInsight-Phase2\GenInsight-Phase2\env-geninsight-phase2\lib\site-packages\pandasai\pipelines\chat\code_execution.py", line 113, in execute
    raise e
  File "W:\OctopusTasks\GenInsight\GenInsight-Phase2\GenInsight-Phase2\env-geninsight-phase2\lib\site-packages\pandasai\pipelines\chat\code_execution.py", line 97, in execute
    raise InvalidOutputValueMismatch(
pandasai.exceptions.InvalidOutputValueMismatch: Value type <class 'list'> must match with type dataframe
Exception in APILogger: <!DOCTYPE html>

Error
Cannot POST /api/log/add

@gventuri, to the best of my knowledge, everything except the response is working fine. I think there is a minor case that still needs to be handled, because judging by the Explanation and the Executed Code, everything else seems to run smoothly.
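For illustration, the mismatch can be reproduced outside PandasAI. The executed code assigns the whole list dfs to 'value' while declaring the type 'dataframe', so a type check on the value sees a list. The sketch below assumes a hypothetical check_result helper that mimics such a check; it is not PandasAI's actual implementation, and the sample frames are invented.

```python
import pandas as pd


def check_result(result: dict) -> None:
    """Hypothetical stand-in for an output type check (not PandasAI's code)."""
    value = result["value"]
    if result["type"] == "dataframe" and not isinstance(value, pd.DataFrame):
        raise TypeError(
            f"Value type {type(value)} must match with type {result['type']}"
        )


# Invented sample data: two frames with different columns.
dfs = [
    pd.DataFrame({"id": range(20), "amount": range(20)}),
    pd.DataFrame({"name": list("abcdefghijkl")}),
]

# Same shape of result as the executed code: a list of tails as the value.
tails = [df.tail(10) for df in dfs]

message = ""
try:
    check_result({"type": "dataframe", "value": tails})
except TypeError as err:
    message = str(err)

print(message)

# A single DataFrame as the value passes the same check.
check_result({"type": "dataframe", "value": tails[0]})
```

Under these assumptions the failure comes from the container type of the value, not from the dtypes of the columns inside each frame.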

Looking forward to hearing from you!

Kind regards, Usman Bin Imran +92-310-480-160-1


dosubot[bot] commented 1 week ago

Hey @usmanbinimran-octopus! 👋 I'm here to help you with any bugs, questions, or contributions you have in mind. Let's tackle this together!

To resolve the datatype mismatch error when using Pandas AI to display the last 10 records from multiple dataframes, you need to ensure that the data types of the columns in the dataframes are consistent before concatenating or performing operations on them. Here is an example of how you can achieve this:

import pandas as pd
from pandasai import Agent  # pandasai 2.x entry point; the PandasAI class is from 1.x

# Example dataframes
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c']
})

df2 = pd.DataFrame({
    'A': [4, 5, 6],
    'B': ['d', 'e', 'f']
})

# Ensure the data types are consistent
df1 = df1.astype({'A': 'int', 'B': 'str'})
df2 = df2.astype({'A': 'int', 'B': 'str'})

# Concatenate dataframes into a single frame with a fresh index
df_combined = pd.concat([df1, df2], ignore_index=True)

# Initialize the agent (assumes an LLM is already configured,
# for example through the config argument or an API key in the environment)
agent = Agent([df_combined])

# Display the last 10 records using PandasAI
print(agent.chat("Show the last 10 records"))

This code ensures that the columns 'A' and 'B' in both dataframes df1 and df2 have the same data types before concatenating them. Additionally, it uses the PandasAI library to handle the dataframes and display the last 10 records, which should help in resolving the datatype mismatch error [1][2].
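Separately from dtype alignment, one way to keep a multi-dataframe answer as a single dataframe is to concatenate the per-frame tails. The following is a minimal plain-pandas sketch, assuming two invented frames named orders and customers with different columns. Whether PandasAI's generated code would take this route is not confirmed anywhere in this thread.

```python
import pandas as pd

# Invented sample frames with different columns (assumptions for illustration).
orders = pd.DataFrame({
    "order_id": range(1, 16),
    "total": [10.0 * i for i in range(1, 16)],
})
customers = pd.DataFrame({
    "customer": [f"c{i}" for i in range(12)],
    "city": ["X"] * 12,
})

# keys= adds an outer index level naming the source frame;
# columns that a frame lacks are filled with NaN by pd.concat.
combined = pd.concat(
    [orders.tail(10), customers.tail(10)],
    keys=["orders", "customers"],
    names=["source", "row"],
)

# The value is now one DataFrame rather than a list of DataFrames.
result = {"type": "dataframe", "value": combined}
print(combined.shape)
```

The rows of each source stay addressable through the outer index level, for example combined.loc["orders"].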


usmanbinimran-octopus commented 1 week ago

So it will not work for multiple datasets having different attributes?

Please confirm @gventuri

Best, Usman Bin Imran +92-310-480-160-1