Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Used skills are not found #1296

Open WojtAcht opened 1 month ago

WojtAcht commented 1 month ago

System Info

OS version: macOS 14.5
Python version: Python 3.10.7
The current version of pandasai being used: 2.2.12

🐛 Describe the bug

Bug: find_function_calls Fails to Detect Skills Used as Arguments in Higher-Order Functions

Description

The find_function_calls method in the CodeCleaning class is failing to detect skills that are used as arguments in higher-order functions, particularly when the skill produces a list or DataFrame as output.

Current Behavior

When a skill is passed as an argument to a method like pandas.DataFrame.apply(), the current implementation of find_function_calls does not recognize it as a used skill.

Expected Behavior

The method should detect and record all uses of skills, including when they are passed as arguments to higher-order functions.

Example

Consider a skill that produces a list as output:

@skill
def calculate_salary_percentiles(salaries: list[int]) -> list[float]:
    """
    Calculates the 25th, 50th and 75th percentiles of salaries.
    Args:
        salaries (list[int]): List of employee salaries
    Returns:
        list[float]: A list containing the 25th, 50th, and 75th percentiles
    """
    import numpy as np
    percentiles = np.percentile(salaries, [25, 50, 75])
    return percentiles.tolist()

When this skill is used in a DataFrame operation like:

df["salaries"].apply(calculate_salary_percentiles)

The current find_function_calls method fails to detect calculate_salary_percentiles as a used skill.
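
To see why the skill is missed, it helps to look at how Python parses that expression. The sketch below uses only the standard `ast` module (no pandasai code) and shows that the skill name is a positional argument of the call, while the callee is the `.apply` attribute on a subscript expression. Neither branch of the current check looks there.

```python
import ast

source = 'df["salaries"].apply(calculate_salary_percentiles)'
call = ast.parse(source, mode="eval").body  # the outermost ast.Call node

# The callee is the attribute `.apply`, not a bare function name.
callee_is_attribute = isinstance(call.func, ast.Attribute)
callee_attr = call.func.attr

# The object `.apply` is looked up on is df["salaries"], an ast.Subscript,
# so a check that requires `node.func.value` to be an ast.Name does not match.
receiver_type = type(call.func.value).__name__

# The skill appears only as a positional argument of the call.
arg_names = [arg.id for arg in call.args if isinstance(arg, ast.Name)]

print(callee_attr, receiver_type, arg_names)
```

Any check that inspects only `node.func` therefore never sees `calculate_salary_percentiles`.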

Current Implementation

def find_function_calls(self, node: ast.AST, context: CodeExecutionContext):
    if isinstance(node, ast.Call):
        if isinstance(node.func, ast.Name):
            if context.skills_manager.skill_exists(node.func.id):
                context.skills_manager.add_used_skill(node.func.id)
        elif isinstance(node.func, ast.Attribute) and isinstance(
            node.func.value, ast.Name
        ):
            context.skills_manager.add_used_skill(
                f"{node.func.value.id}.{node.func.attr}"
            )
    for child_node in ast.iter_child_nodes(node):
        self.find_function_calls(child_node, context)

Proposed Fix

To address this issue, we need to extend the method to check function arguments:

for arg in node.args:
    if isinstance(arg, ast.Name) and context.skills_manager.skill_exists(arg.id):
        context.skills_manager.add_used_skill(arg.id)
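
A minimal, self-contained sketch of the idea, so it can be tried without pandasai installed. `StubSkillsManager` and `find_skill_references` are hypothetical names introduced here for illustration; only `skill_exists` and `add_used_skill` mirror the methods quoted above. The attribute branch of the original method is left out, and checking keyword arguments (for example `apply(func=skill)`) is an extra assumption beyond the snippet above.

```python
import ast


class StubSkillsManager:
    """Hypothetical stand-in for the skills manager referenced in the issue."""

    def __init__(self, known):
        self.known = set(known)
        self.used = []

    def skill_exists(self, name):
        return name in self.known

    def add_used_skill(self, name):
        # Record each skill once, in first-seen order.
        if name not in self.used:
            self.used.append(name)


def find_skill_references(node, manager):
    """Record skills that are called directly or passed as call arguments."""
    if isinstance(node, ast.Call):
        if isinstance(node.func, ast.Name) and manager.skill_exists(node.func.id):
            manager.add_used_skill(node.func.id)
        # Positional and keyword arguments: apply(skill) or apply(func=skill).
        candidates = list(node.args) + [kw.value for kw in node.keywords]
        for arg in candidates:
            if isinstance(arg, ast.Name) and manager.skill_exists(arg.id):
                manager.add_used_skill(arg.id)
    # Generic traversal reaches nested calls, assignments, and so on.
    for child in ast.iter_child_nodes(node):
        find_skill_references(child, manager)


manager = StubSkillsManager({"calculate_salary_percentiles"})
tree = ast.parse('result = df["salaries"].apply(calculate_salary_percentiles)')
find_skill_references(tree, manager)
print(manager.used)
```

With the argument check in place, the stub manager records `calculate_salary_percentiles` for the `apply` example from this issue.
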
dosubot[bot] commented 1 month ago

To address the issue where the find_function_calls method fails to detect skills used as arguments in higher-order functions, you can modify the method to recursively check for function calls within the arguments of other function calls. Here is the updated implementation:

def find_function_calls(self, node: ast.AST, context: CodeExecutionContext):
    if isinstance(node, ast.Call):
        if isinstance(node.func, ast.Name):
            if context.skills_manager.skill_exists(node.func.id):
                context.skills_manager.add_used_skill(node.func.id)
        elif isinstance(node.func, ast.Attribute) and isinstance(node.func.value, ast.Name):
            context.skills_manager.add_used_skill(f"{node.func.value.id}.{node.func.attr}")

        # Check for skills used as arguments in higher-order functions
        for arg in node.args:
            if isinstance(arg, ast.Name) and context.skills_manager.skill_exists(arg.id):
                context.skills_manager.add_used_skill(arg.id)
            elif isinstance(arg, ast.Call):
                self.find_function_calls(arg, context)

    for child_node in ast.iter_child_nodes(node):
        self.find_function_calls(child_node, context)

This modification ensures that the method recursively checks for function calls within the arguments of other function calls, thereby detecting and recording skills used as arguments in higher-order functions [1].
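
One observation on the snippet above: the explicit `elif isinstance(arg, ast.Call)` branch looks redundant, because the closing loop over `ast.iter_child_nodes(node)` already descends into every argument of a call. The short check below uses only the standard `ast` module to illustrate this; `outer`, `inner`, and `my_skill` are placeholder names, not part of pandasai.

```python
import ast

call = ast.parse("outer(inner(my_skill))", mode="eval").body

# Direct children of the outer call: the callee name and the nested call argument.
child_types = [type(child).__name__ for child in ast.iter_child_nodes(call)]

# ast.walk visits every descendant, so nested calls are reached
# without any special handling of ast.Call arguments.
call_names = [
    n.func.id
    for n in ast.walk(call)
    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
]

print(child_types, call_names)
```

If that reading is right, the extra recursion only causes nested calls to be visited more than once, which should be harmless as long as recording a used skill is idempotent. The `ast.Name` argument check is the part that changes behavior.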
