Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Last code generated never used for prompt generation #1235

Open sujeendran opened 1 week ago

sujeendran commented 1 week ago

System Info

OS version: Windows 10
Python version: 3.10
PandasAI version: 2.0.40

🐛 Describe the bug

https://github.com/Sinaptik-AI/pandas-ai/blob/64e6dcd8cbded4228eb1fc9382df71e80ccd9c1e/pandasai/pipelines/chat/prompt_generation.py#L64

While using this library I noticed that last_code_generated is always None when the prompt is generated, because of this line: https://github.com/Sinaptik-AI/pandas-ai/blob/64e6dcd8cbded4228eb1fc9382df71e80ccd9c1e/pandasai/pipelines/chat/generate_chat_pipeline.py#L288

Is there any case where it should actually be populated? I ask because, during an ongoing chat session, if I want the LLM to make a small correction to the existing result, it cannot do so without regenerating the code from scratch. I feel that passing this value, together with a check for whether the last conversation is still relevant, would give better results. (I just realized that the agent function for this was removed in the last release.)

Is there a way to pass the agent.last_code_generated to the prompt generation somehow?
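
For context, the behavior I am hoping for looks something like this (a sketch only; per this issue it does not currently work, since the pipeline never reads last_code_generated and resets intermediate values each turn, but it assumes Agent exposes .context as a PipelineContext and .last_code_generated as in 2.0.40):

from pandasai import Agent

agent = Agent(dfs)  # dfs: your dataframe(s)
agent.chat("Plot total sales by region")

# Desired: seed the pipeline context with the previous code before the
# follow-up question, so the next prompt could reference it.
agent.context.add("last_code_generated", agent.last_code_generated)
agent.chat("Same chart, but change the title to 'Q2 Sales'")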

dosubot[bot] commented 1 week ago

Hello, @sujeendran! I'm here to assist you with any bugs, questions, or contributions you have for the pandas-ai project. Let me know how I can help you!

Yes, there is a way to pass agent.last_code_generated into prompt generation. The PipelineContext object can store and retrieve the last_code_generated value, and you can modify the PromptGeneration step to read it from the context when building the prompt.

Here is an example of how you can modify the PromptGeneration step to include last_code_generated:

from typing import Any, Union

from pandasai.pipelines.logic_unit_output import LogicUnitOutput
from ...helpers.logger import Logger
from ...prompts.base import BasePrompt
from ...prompts.generate_python_code import GeneratePythonCodePrompt
from ...prompts.generate_python_code_with_sql import GeneratePythonCodeWithSQLPrompt
from ..base_logic_unit import BaseLogicUnit
from ..pipeline_context import PipelineContext

class PromptGeneration(BaseLogicUnit):
    """
    Code Prompt Generation Stage
    """

    def execute(self, input: Any, **kwargs) -> Any:
        """
        Build the code-generation prompt for this pipeline step.

        :param input: Your input data.
        :param kwargs: A dictionary of keyword arguments.
            - 'logger' (any): The logger for logging.
            - 'config' (Config): Global configuration for the pipeline.
            - 'context' (any): The execution context.

        :return: The result of the execution.
        """
        self.context: PipelineContext = kwargs.get("context")
        self.logger: Logger = kwargs.get("logger")

        prompt = self.get_chat_prompt(self.context)
        self.logger.log(f"Using prompt: {prompt}")

        return LogicUnitOutput(
            prompt,
            True,
            "Prompt Generated Successfully",
            {"content_type": "prompt", "value": prompt.to_string()},
        )

    def get_chat_prompt(self, context: PipelineContext) -> Union[str, BasePrompt]:
        # set matplotlib as the default library
        viz_lib = "matplotlib"
        if context.config.data_viz_library:
            viz_lib = context.config.data_viz_library

        output_type = context.get("output_type")

        return (
            GeneratePythonCodeWithSQLPrompt(
                context=context,
                last_code_generated=context.get("last_code_generated"),
                viz_lib=viz_lib,
                output_type=output_type,
            )
            if context.config.direct_sql
            else GeneratePythonCodePrompt(
                context=context,
                last_code_generated=context.get("last_code_generated"),
                viz_lib=viz_lib,
                output_type=output_type,
            )
        )
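
As a quick smoke check of this change (hypothetical; it assumes BasePrompt stores arbitrary keyword arguments as template props, that the code-generation templates render a last_code_generated section, and that context is an already-populated PipelineContext):

# Hypothetical smoke check: seed the context, render the prompt, and confirm
# the previous code shows up in the rendered text.
context.add("last_code_generated", "df.groupby('region')['sales'].sum()")
prompt = PromptGeneration().get_chat_prompt(context)
assert "groupby" in prompt.to_string()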

In the GenerateChatPipeline class, ensure that the last_code_generated is stored in the context after code generation:

def run_generate_code(self, input: ChatPipelineInput) -> dict:
    self._logger.log(f"Executing Pipeline: {self.__class__.__name__}")

    # Reset intermediate values
    self.context.reset_intermediate_values()

    # Start New Tracking for Query
    self.query_exec_tracker.start_new_track(input)

    self.query_exec_tracker.add_skills(self.context)

    self.query_exec_tracker.add_dataframes(self.context.dfs)

    # Add Query to memory
    self.context.memory.add(input.query, True)

    self.context.add_many(
        {
            "output_type": input.output_type,
            "last_prompt_id": input.prompt_id,
        }
    )
    try:
        output = self.code_generation_pipeline.run(input)

        # Store the last code generated in the context
        self.context.add("last_code_generated", output.get("value"))

        self.query_exec_tracker.success = True

        self.query_exec_tracker.publish()

        return output

    except Exception as e:
        import traceback
        traceback.print_exc()

        self.last_error = str(e)
        self.query_exec_tracker.success = False
        self.query_exec_tracker.publish()

        return (
            "Unfortunately, I was not able to answer your question, "
            "because of the following error:\n"
            f"\n{e}\n"
        )
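
One caveat on the snippet above: depending on how the code generation pipeline is assembled, code_generation_pipeline.run may hand back the generated code as a bare string rather than a dict, in which case output.get("value") would raise. (Note also that the except branch returns an error string even though the method is annotated -> dict.) A small defensive helper (hypothetical, not part of the library) covers both shapes:

def extract_generated_code(output):
    """Return the generated code whether the pipeline emitted a dict
    (with the code under "value") or a bare string. Hypothetical helper;
    the actual output shape depends on the pipeline configuration."""
    if isinstance(output, dict):
        return output.get("value")
    return output

The call site then becomes self.context.add("last_code_generated", extract_generated_code(output)).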

Additionally, you can use the Callbacks class to handle the last_code_generated variable. This class provides methods that are called at different stages of the code generation process, ensuring that the last_code_generated is always up-to-date:

from ..prompts import BasePrompt

class Callbacks:
    def __init__(self, agent):
        self.agent = agent

    def on_prompt_generation(self, prompt: BasePrompt) -> None:
        """
        A method to be called after prompt generation.

        Args:
            prompt (BasePrompt): The generated prompt
        """
        self.agent.last_prompt = str(prompt)

    def on_code_generation(self, code: str):
        """
        A method to be called after code generation.

        Args:
            code (str): The generated Python code
        """
        self.agent.last_code_generated = code

    def before_code_execution(self, code: str):
        """
        A method to be called before code execution.

        Args:
            code (str): The Python code about to be executed
        """
        self.agent.last_code_executed = code

    def on_result(self, result):
        """
        A method to be called after code execution.

        Args:
            result (Any): The execution result
        """
        self.agent.last_result = result
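
For completeness, here is roughly how an agent could hand these hooks to the pipeline (a sketch only; the keyword argument names are assumed from the 2.x source of GenerateChatPipeline and should be verified against your installed version):

from pandasai.pipelines.chat.generate_chat_pipeline import GenerateChatPipeline

def build_pipeline(agent):
    # Wire each callback so every stage reports back to the agent.
    # Keyword names assumed from the 2.x source; verify before relying on them.
    callbacks = Callbacks(agent)
    return GenerateChatPipeline(
        agent.context,
        agent.logger,
        on_prompt_generation=callbacks.on_prompt_generation,
        on_code_generation=callbacks.on_code_generation,
        before_code_execution=callbacks.before_code_execution,
        on_result=callbacks.on_result,
    )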

By making these modifications, the last_code_generated variable will be included in prompt generation, allowing the LLM to make small corrections to an existing result without regenerating the code from scratch. Additionally, the BaseAgent class in the PandasAI library already includes attributes like last_code_generated and methods to handle the context and callbacks, which can be leveraged to ensure last_code_generated is properly managed [1][2][3][4].

To continue talking to Dosu, mention @dosu.