Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Pandasai ordering custom order not correct #1222

Open ssling0817 opened 2 weeks ago

ssling0817 commented 2 weeks ago

System Info

pandasai version 2.0.42

🐛 Describe the bug

The table I use has a priority column with these values, from high to low: "P0 - Critical, P1 - High, P2 - Medium, P3 - Low". I asked the agent to order from highest to lowest, but it always generates code with .sort_values('priority', ascending=False), when it should be .sort_values('priority'). Sometimes it even generates code with nlargest, which fails with TypeError: Column 'priority' has dtype object, cannot use method 'nlargest' with this dtype. I have tried putting this order in the input and in field_descriptions, but neither works. How can I solve this issue?
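The behavior described above can be reproduced with plain pandas, without the agent. This is a minimal sketch on a made-up four-row table (the column names and rows are assumptions for illustration, not the reporter's data); it shows that an ascending string sort already puts P0 first, and that nlargest rejects a string column.

```python
import pandas as pd

# Illustrative data only; the real table is not shown in the report.
df = pd.DataFrame({
    "task": ["Task1", "Task2", "Task3", "Task4"],
    "priority": ["P2 - Medium", "P0 - Critical", "P3 - Low", "P1 - High"],
})

# The labels start with P0..P3, so a plain ascending string sort
# already puts the highest priority (P0) first.
highest_first = df.sort_values("priority")

# ascending=False reverses that and puts the lowest priority (P3) first.
lowest_first = df.sort_values("priority", ascending=False)

# nlargest only accepts numeric-like columns, so a string column raises TypeError.
try:
    df.nlargest(2, "priority")
    nlargest_error = None
except TypeError as exc:
    nlargest_error = str(exc)

print(highest_first)
print(lowest_first)
print(nlargest_error)
```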

dosubot[bot] commented 2 weeks ago

To sort a custom priority column in a DataFrame with values "P0 - Critical", "P1 - High", "P2 - Medium", and "P3 - Low" from highest to lowest priority using pandasai version 2.0.42, you can use the following code:

import pandas as pd

# Sample DataFrame
data = {
    'Task': ['Task1', 'Task2', 'Task3', 'Task4'],
    'Priority': ['P2 - Medium', 'P0 - Critical', 'P3 - Low', 'P1 - High']
}
df = pd.DataFrame(data)

# Define the custom priority order
priority_order = ['P0 - Critical', 'P1 - High', 'P2 - Medium', 'P3 - Low']

# Convert the Priority column to a categorical type with the specified order
df['Priority'] = pd.Categorical(df['Priority'], categories=priority_order, ordered=True)

# Sort the DataFrame by the Priority column
df_sorted = df.sort_values('Priority')

print(df_sorted)

This code will sort the DataFrame df by the Priority column according to the custom order defined in priority_order. The sorted DataFrame df_sorted will have the tasks ordered from highest to lowest priority [1].
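Once the column is an ordered categorical, the same order can also drive a "top N" selection or a threshold filter. The following is a small sketch that reuses the sample data above; the variable names top_two and high_or_above are only illustrative.

```python
import pandas as pd

priority_order = ["P0 - Critical", "P1 - High", "P2 - Medium", "P3 - Low"]

df = pd.DataFrame({
    "Task": ["Task1", "Task2", "Task3", "Task4"],
    "Priority": ["P2 - Medium", "P0 - Critical", "P3 - Low", "P1 - High"],
})
df["Priority"] = pd.Categorical(df["Priority"], categories=priority_order, ordered=True)

# An ascending sort follows the category order, so head() keeps the highest priorities.
top_two = df.sort_values("Priority").head(2)

# Ordered categoricals can be compared against one of their category labels.
# "<=" here means "at least as high a priority as P1", because P0 comes first.
high_or_above = df[df["Priority"] <= "P1 - High"]

print(top_two)
print(high_or_above)
```

Because "P0 - Critical" is the first category, it is the smallest value in the ordering, which is why the ascending sort and the "<=" comparison select the highest priorities.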

To continue talking to Dosu, mention @dosu.

ssling0817 commented 2 weeks ago

@dosu I am trying to use pandasai Agent, not interacting with df directly.

response = agent.chat("Return 5 tasks which have higher priority")

dosubot[bot] commented 2 weeks ago

To ensure that the pandasai Agent respects your custom priority order when processing queries, you can extend the agent with a custom pipeline that includes your sorting logic. Here's how you can do it:

  1. Create a Custom Pipeline: Define a custom pipeline that includes your sorting logic.
  2. Integrate the Custom Pipeline with the Agent: Use this custom pipeline with the pandasai Agent.

Here's an example:

from pandasai.agent.base import BaseAgent
from pandasai.pipelines.chat.generate_chat_pipeline import GenerateChatPipeline
import pandas as pd

class CustomSortPipeline(GenerateChatPipeline):
    def __init__(self, context, logger, **callbacks):
        super().__init__(context, logger, **callbacks)

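    def custom_sort(self, df: pd.DataFrame, priority_order: list):
        # Build an ordered categorical sort key without modifying the caller's DataFrame
        sort_key = pd.Categorical(df['Priority'], categories=priority_order, ordered=True)
        # assign() returns a copy; sort on the temporary column, then drop it again
        df = df.assign(_priority_key=sort_key).sort_values('_priority_key')
        return df.drop(columns=['_priority_key'])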

# Sample DataFrame
data = {
    'Task': ['Task1', 'Task2', 'Task3', 'Task4'],
    'Priority': ['P2 - Medium', 'P0 - Critical', 'P3 - Low', 'P1 - High']
}
df = pd.DataFrame(data)

# Define the custom priority order
priority_order = ['P0 - Critical', 'P1 - High', 'P2 - Medium', 'P3 - Low']

# Create an instance of the custom pipeline
custom_pipeline = CustomSortPipeline(context=None, logger=None)

# Create an instance of the agent and assign the custom pipeline
agent = BaseAgent(dfs=df)
agent.pipeline = custom_pipeline

# Use the custom sort method
sorted_df = custom_pipeline.custom_sort(df, priority_order)
print(sorted_df)

In this example, the CustomSortPipeline class is a subclass of GenerateChatPipeline, and its custom_sort method implements the custom sorting logic. The BaseAgent is then configured to use this custom pipeline. Note that the snippet calls custom_sort directly on the DataFrame; nothing shown here makes agent.chat apply the sort automatically, so the sort has to be invoked explicitly as in the last lines of the example [1][2].
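Another option that might help, independent of any pipeline changes, is to add a numeric rank column before handing the DataFrame to the agent, so that generated code using numeric methods such as nsmallest has something valid to operate on. This is only a sketch: the column name priority_rank is hypothetical, and whether the agent's generated code actually picks it up is not guaranteed.

```python
import pandas as pd

priority_order = ["P0 - Critical", "P1 - High", "P2 - Medium", "P3 - Low"]

df = pd.DataFrame({
    "Task": ["Task1", "Task2", "Task3", "Task4"],
    "Priority": ["P2 - Medium", "P0 - Critical", "P3 - Low", "P1 - High"],
})

# Hypothetical helper column: 0 is the highest priority, 3 the lowest.
rank_by_label = {label: rank for rank, label in enumerate(priority_order)}
df["priority_rank"] = df["Priority"].map(rank_by_label)

# A numeric column works with nsmallest, unlike the original string column.
top_two = df.nsmallest(2, "priority_rank")
print(top_two)

# The prepared df would then be passed to the agent in place of the raw table
# (not shown here, since it needs a configured LLM).
```

Describing the new column in the prompt or in field_descriptions (for example, "lower priority_rank means higher priority") may make it more likely that the generated code sorts in the intended direction.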
