Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, Polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5 / GPT-4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com
13.39k stars · 1.3k forks

How to use PandasAI agent with local LLM #1370

Closed vsneelesh closed 1 month ago

vsneelesh commented 1 month ago

I'm currently working with PandasAI and I'm interested in utilizing agents for my projects. However, I'm facing some challenges in integrating a local LLM instead of relying on the default Bamboo LLM.

I would like to know how to effectively set up and use PandasAI agents with my local LLM. Specifically, I'm looking for guidance on:

  • The steps needed to integrate a local LLM with PandasAI.
  • Any examples or code snippets that demonstrate how to create a custom LLM class for this purpose.
  • Best practices for ensuring the local LLM interacts smoothly with the PandasAI agent.

Thanks in advance for your help!

dosubot[bot] commented 1 month ago

Hey @vsneelesh! I'm here to help you with any bugs, questions, or contributions you have regarding PandasAI. Let's tackle this challenge together!

To set up and use PandasAI agents with a local LLM, follow these steps:

Integration Steps:

  1. Install Dependencies: Ensure you have the necessary dependencies installed, such as openai and pandasai.
  2. Create a Custom LLM Class: Extend the LLM base class to create your custom LLM class.
  3. Implement Required Methods: Implement methods for chat completion and calling the LLM.

Creating a Custom LLM Class:

Here's an example of how to create a custom LocalLLM class:

from __future__ import annotations
from typing import TYPE_CHECKING
from openai import OpenAI
from pandasai.helpers.memory import Memory
from pandasai.prompts.base import BasePrompt
from pandasai.llm.base import LLM

if TYPE_CHECKING:
    from pandasai.pipelines.pipeline_context import PipelineContext

class LocalLLM(LLM):
    def __init__(self, api_base: str, model: str = "", api_key: str = "", **kwargs):
        if not api_key:
            api_key = "dummy"

        self.model = model
        self.client = OpenAI(base_url=api_base, api_key=api_key).chat.completions
        self._invocation_params = kwargs

    def chat_completion(self, value: str, memory: Memory) -> str:
        messages = memory.to_openai_messages() if memory else []

        # adding current prompt as latest query message
        messages.append(
            {
                "role": "user",
                "content": value,
            }
        )

        params = {"model": self.model, "messages": messages, **self._invocation_params}
        response = self.client.create(**params)

        return response.choices[0].message.content

    def call(self, instruction: BasePrompt, context: PipelineContext = None) -> str:
        self.last_prompt = instruction.to_string()

        memory = context.memory if context else None

        return self.chat_completion(self.last_prompt, memory)

    @property
    def type(self) -> str:
        return "local"

Using the Agent with DataFrames:

  1. Create DataFrames: Prepare your data in DataFrames.
  2. Initialize the Agent: Set up the Agent with the DataFrames and configure the memory size.
  3. Interact with the Agent: Use chat, clarification questions, and training methods to interact with the agent.

Example:

import os
import pandas as pd
from pandasai import Agent

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# By default, unless you choose a different LLM, the agent will use BambooLLM.
# You can get a free API key by signing up at https://pandabi.ai (you can also configure it in your .env file)
os.environ["PANDASAI_API_KEY"] = "your-api-key"

agent = Agent([employees_df, salaries_df], memory_size=10)

# Chat with the agent
response = agent.chat("Who gets paid the most?")
print(response)

# Get Clarification Questions
questions = agent.clarification_questions("Who gets paid the most?")
for question in questions:
    print(question)

# Explain how the chat response is generated
response = agent.explain()
print(response)

# Train with data
queries = [
    "Display the distribution of ages in the population.",
    "Visualize the distribution of product ratings.",
    "Show the distribution of household incomes in a region.",
]

codes = [
    "display_age_distribution()",
    "visualize_product_ratings_distribution()",
    "show_household_incomes_distribution_in_region()",
]

agent.train(queries, codes)

print("Done")

Best Practices:

  • Ensure correct configuration of api_base, model, and api_key.
  • Handle memory efficiently to maintain context across interactions.
  • Use appropriate prompts and context to get accurate responses from the LLM.

By following these steps and using the provided code, you can effectively set up and use PandasAI agents with a local LLM [1] https://github.com/Sinaptik-AI/pandas-ai/blob/main/pandasai/llm/local_llm.py [2] https://github.com/Sinaptik-AI/pandas-ai/blob/main/pandasai/llm/base.py [3] https://github.com/Sinaptik-AI/pandas-ai/blob/main/examples/agent.py [4] https://github.com/Sinaptik-AI/pandas-ai/blob/main/README.md

To continue talking to Dosu, mention @dosu.

vsneelesh commented 1 month ago

Hey, thanks for the reply. I would like to use local LLMs such as Llama 3.1 through Ollama, which I have running on a remote server. I would like to use this for agents rather than an API call, since I am working in a controlled environment that won't give me access to external APIs.


dosubot[bot] commented 1 month ago

To use local LLMs like Llama 3.1 with PandasAI agents in a controlled environment without API access, you can host the model on a local inference server that adheres to the OpenAI API. Here’s how you can set it up:

Integration Steps:

  1. Install Dependencies:

    pip install pandasai
  2. Create a Custom LLM Class: You can use the LocalLLM class provided by PandasAI to connect to your local inference server. Here’s an example:

    from pandasai import SmartDataframe
    from pandasai.llm.local_llm import LocalLLM
    
    # Assuming you have a local server running that hosts the Llama 3.1 model
    llm = LocalLLM(api_base="http://your-remote-server:1234/v1", model="llama-3.1")
    df = SmartDataframe("data.csv", config={"llm": llm})
    
    response = df.chat("Calculate the sum of the GDP of North American countries")
    print(response)
  3. Using the Agent with DataFrames: Prepare your data in DataFrames and initialize the agent with these DataFrames. Here’s an example:

    import pandas as pd
    from pandasai import Agent
    
    employees_data = {
       "EmployeeID": [1, 2, 3, 4, 5],
       "Name": ["John", "Emma", "Liam", "Olivia", "William"],
       "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
    }
    
    salaries_data = {
       "EmployeeID": [1, 2, 3, 4, 5],
       "Salary": [5000, 6000, 4500, 7000, 5500],
    }
    
    employees_df = pd.DataFrame(employees_data)
    salaries_df = pd.DataFrame(salaries_data)
    
    agent = Agent([employees_df, salaries_df], memory_size=10)
    
    # Chat with the agent
    response = agent.chat("Who gets paid the most?")
    print(response)
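
As written, this agent would still fall back to BambooLLM. To make it use the local model, pass the LocalLLM instance in the agent's config as well (a minimal sketch, reusing llm from step 2):

agent = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10)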

By following these steps, you can effectively set up and use PandasAI agents with a local LLM like Llama 3.1 hosted on a remote server [1][2].

To continue talking to Dosu, mention @dosu.

vsneelesh commented 1 month ago

I have tried this already, but it is giving an error. For your information, this is the code I am trying:

import os
import pandas as pd
from pandasai.llm.local_llm import LocalLLM
from pandasai import SmartDataframe
from pandasai.llm import BambooLLM
from pandasai import Agent

os.environ["PANDASAI_API_KEY"] = ""

model = LocalLLM(api_base="http://localhost:11434/v1", model="llama3.1")
data = pd.read_csv("/home/neelesh/sample_file3_gas.csv")
agent = Agent(data, memory_size=10)

response = agent.chat("What is the data about")
print(response)

If I do not use an agent, I get a proper reply from Llama. But once I initialize an agent and run it, this is the output I am getting:

pandasai.exceptions.PandasAIApiKeyError: The api_key client option must be set either by passing api_key to the client or by setting the PANDASAI_API_KEY environment variable. To get the key follow below steps:

  1. Go to https://www.pandabi.ai and sign up
  2. From settings go to API keys and copy
  3. Set environment variable like os.environ['PANDASAI_API_KEY'] = '$2a$10$flb7....'
vsneelesh commented 1 month ago

Got it. The code goes like this:

import pandas as pd
from pandasai.llm.local_llm import LocalLLM
from pandasai import SmartDataframe
from pandasai.llm import BambooLLM
from pandasai import Agent

model = LocalLLM(api_base="http://localhost:11434/v1", model="llama3.1")
data = pd.read_csv("/home/neelesh/sample_file3_gas.csv")

agent = Agent(data, config={"llm": model})

response = agent.chat("What is the data about")
print(response)

This is how to use agents with a local LLM.
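
(For reference, memory_size can still be passed alongside the config if you want the agent to keep more conversation history, e.g. Agent(data, config={"llm": model}, memory_size=10).)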