Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com
Other
11.69k stars 1.08k forks source link

Pandasai Multiple Dataframe: sqlalchemy.exc.DatabaseError: (databricks.sql.exc.ServerOperationError) [UNRESOLVED_COLUMN.WITH_SUGGESTION] #1179

Open ssling0817 opened 1 month ago

ssling0817 commented 1 month ago

System Info

Pandasai version: 2.0.42 Python version: 3.10.12

🐛 Describe the bug

I tried to pass in 2 tables into pandasai Agent and join a column which has different column name in 2 tables.

agent = Agent([table_userinfo, table_ticketinfo], config={"llm": llm})
answer = agent.chat(prompt)

I want to get the user info of a specific ticket. It can produce the correct code until step 5. [INFO] Code generated:

result_df = dfs[1].merge(dfs[0], left_on='reporter', right_on='name')
result_df = result_df[result_df['issue_id'] == 1][['LegalLastName', 'LegalFirstName', 'Email']]
result = {'type': 'dataframe', 'value': result_df}

However, it cannot execute the code successfully on step 6. [INFO] Executing Step 6: CodeExecution [ERROR] Failed with error: Traceback (most recent call last): databricks.sql.exc.ServerOperationError: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name issue_id cannot be resolved. Did you mean one of the following? [Email, ...].; line 3 pos 6 [SQL: SELECT * FROM table_userinfo WHERE issue_id = %(value_0)s] [parameters: {'value_0': 1}] (Background on this error at: https://sqlalche.me/e/14/4xp6)

I don't understand why the code is correct about getting issue_id from the merged table, but then the SQL code try to get issue_id from the userinfo table. What might be the suitable input prompt for multiple table scenarios, or how to solve the issue?

dosubot[bot] commented 1 month ago

To address the issue of unresolved columns when joining tables with different column names in PandasAI, you can follow these steps:

  1. Ensure Column Names are Correct: Make sure that the column names used in the join operation are correct and exist in the respective tables.

  2. Use Aliases for Columns: If the column names are different, you can use aliases to rename the columns before performing the join.

  3. Validate Column Names: Use the _validate_column_name method to ensure that the column names conform to the expected pattern.

Here is an example of how you can modify your code to handle different column names:

import pandas as pd
from pandasai import Agent

# Example DataFrames
table_userinfo = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'LegalLastName': ['Smith', 'Jones', 'Brown'],
    'LegalFirstName': ['Alice', 'Bob', 'Charlie'],
    'Email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']
})

table_ticketinfo = pd.DataFrame({
    'reporter': ['Alice', 'Bob', 'Charlie'],
    'issue_id': [1, 2, 3]
})

# Ensure column names are valid
def _validate_column_name(column_name):
    import re
    regex = r"^[a-zA-Z0-9_]+$"
    if not re.match(regex, column_name):
        raise ValueError(f"Invalid column name: {column_name}")

_validate_column_name('reporter')
_validate_column_name('name')

# Initialize the Agent
agent = Agent([table_userinfo, table_ticketinfo], config={"llm": "your_llm_here"})

# Define the prompt
prompt = "Get the user info of the ticket with issue_id 1."

# Execute the chat
answer = agent.chat(prompt)
print(answer)

In this example, the _validate_column_name method ensures that the column names are valid before performing the join operation. This should help resolve the UNRESOLVED_COLUMN.WITH_SUGGESTION error.

Additionally, make sure that the column names used in the generated code are correct and exist in the respective DataFrames. If the column names are different, you can rename them before performing the join operation.

For more details on handling SQL execution and common errors related to unresolved columns, you can refer to the relevant code snippets and explanations in the PandasAI repository [1][2].

#### Details **About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.