Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

how to train using GooglePalm - MissingVectorStoreError: No vector store provided. Please provide a vector store to train the agent. #1356

Open anilmadishetty2498 opened 2 months ago

anilmadishetty2498 commented 2 months ago

System Info

python==3.11.7

🐛 Describe the bug

from pandasai import Agent
from pandasai.llm import GooglePalm

llm = GooglePalm(api_key=GOOGLE_API_KEY)

db = Agent([scm_vc, scm_rev], config={"llm": llm})

# Example #2: train the model with Q/A
query = "identify top 5 customers by revenue for Indonesia"
code = """
import pandas as pd
top_5 = scm_rev[scm_rev['Country']=='Indonesia'].groupby('Customers').agg({'Revenue':'sum'}).sort_values(by='Revenue', ascending=False).head(5)

# declare results
result = {
    "type": "dataframe",
    "value": top_5
}
"""

db.train(queries=[query], codes=[code])

MissingVectorStoreError: No vector store provided. Please provide a vector store to train the agent.

dosubot[bot] commented 2 months ago

To resolve the MissingVectorStoreError: No vector store provided error when training using GooglePalm, you need to ensure that you have set up a vector store. Here is how you can do it:

  1. Set the PandasAI API key:

    import os
    
    os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY"
  2. Instantiate the vector store and pass it to the Agent:

    from pandasai import Agent
    from pandasai.ee.vectorstores import ChromaDB
    
    # Instantiate the vector store
    vector_store = ChromaDB()
    
    # Instantiate the agent with the custom vector store
    db = Agent([scm_vc, scm_rev], config={"llm": llm}, vectorstore=vector_store)
  3. Train the model:

    query = "identify top 5 customers by revenue for Indonesia"
    code = """
    import pandas as pd
    top_5 = scm_rev[scm_rev['Country']=='Indonesia'].groupby('Customers').agg({'Revenue':'sum'}).sort_values(by='Revenue', ascending=False).head(5)
    
    #declare results
    result = {
       "type" : "dataframe",
       "value" : top_5
    }
    """
    
    db.train(queries=[query], codes=[code])

By following these steps, you should be able to resolve the MissingVectorStoreError and successfully train your model [1].
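Because the training code is passed as a plain string, it can be worth confirming that it parses and assigns `result` before handing it to `train`. This is a minimal standard-library sketch, assuming the string is stored as text and not executed at training time. The helper name `check_training_code` is hypothetical and not part of PandasAI.

```python
import ast
import textwrap


def check_training_code(code: str) -> bool:
    """Return True if `code` parses and assigns a top-level `result` variable.

    Hypothetical helper for illustration; not part of PandasAI.
    """
    # ast.parse raises SyntaxError if the snippet is not valid Python.
    tree = ast.parse(textwrap.dedent(code))
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "result":
                    return True
    return False


code = """
import pandas as pd
top_5 = scm_rev[scm_rev['Country']=='Indonesia'].groupby('Customers').agg({'Revenue':'sum'}).sort_values(by='Revenue', ascending=False).head(5)

# declare results
result = {
    "type": "dataframe",
    "value": top_5
}
"""

print(check_training_code(code))
```

The check only inspects the syntax tree, so names such as `scm_rev` do not need to exist when it runs.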

To continue talking to Dosu, mention @dosu.

anilmadishetty2498 commented 2 months ago

db.train(docs="if asked about variable cost return information from scm_vc table")

ValueError: Number of documents 1 must match number of ids 65

dosubot[bot] commented 2 months ago

To resolve the ValueError: Number of documents 1 must match number of ids 65 when using db.train with the docs parameter, you need to ensure that the number of documents matches the number of IDs provided. The error indicates that you are passing a single document but 65 IDs, which is a mismatch.

Here's how you can fix it:

  1. Ensure that the number of documents (docs) matches the number of IDs (ids).
  2. If you are not providing IDs, the function will generate them for you, but the number of generated IDs will match the number of documents.

Here is an example:

docs = ["if asked about variable cost return information from scm_vc table"]  # a list of strings, not a bare string

# Pass docs as a list so that one id is generated per document
db.train(docs=docs)

If you only have one document but multiple IDs, you need to adjust your input to match the expected format [1][2].
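One likely explanation for the count of 65, assuming the vector store generates one id per element of `docs`: the string passed in the failing call is exactly 65 characters long, and iterating over a bare string yields characters, not documents. A small sketch of that mismatch follows; `make_ids` is a hypothetical stand-in, not PandasAI's code.

```python
import uuid

doc = "if asked about variable cost return information from scm_vc table"


def make_ids(docs):
    """Generate one id per element of `docs` (hypothetical stand-in)."""
    return [f"{uuid.uuid4()}-doc" for _ in docs]


# A bare string is iterated character by character.
print(len(make_ids(doc)))    # 65
# Wrapped in a list, it counts as a single document.
print(len(make_ids([doc])))  # 1
```

If that assumption holds, passing `docs=[...]` as a list of strings is enough to make the counts agree.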


dosubot[bot] commented 2 months ago

To resolve the ValueError: Number of documents 1 must match number of ids 65 when using db.train with the docs parameter, you need to ensure that the number of documents matches the number of IDs provided. The error indicates that you are passing a single document but 65 IDs, which is a mismatch.

Here's how you can fix it:

  1. Ensure that the number of documents (docs) matches the number of IDs (ids).
  2. If you are not providing IDs, the function will generate them for you, but the number of generated IDs will match the number of documents.

Here is an example:

docs = ["if asked about variable cost return information from scm_vc table"]
ids = ["id1"]  # Ensure the number of ids matches the number of docs

# This will work because the number of docs matches the number of ids
db.add_docs(docs=docs, ids=ids)

If you only have one document but multiple IDs, you need to adjust your input to match the expected format [1][2].


anilmadishetty2498 commented 2 months ago

AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 db.add_docs(docs=docs, ids=ids)

AttributeError: 'Agent' object has no attribute 'add_docs'

dosubot[bot] commented 2 months ago

To resolve the AttributeError: 'Agent' object has no attribute 'add_docs', ensure that you are calling the add_docs method on an instance of the LanceDB class, not on an Agent object. The add_docs method is defined within the LanceDB class.

Here is an example of how you should instantiate the LanceDB class and call the add_docs method:

from pandasai.ee.vectorstores import LanceDB

# Create an instance of LanceDB
db = LanceDB(
    table_name="your_table_name",
    embedding_function=your_embedding_function,  # Optional, if you have a custom embedding function
    persist_path="/path/to/persist",
    max_samples=10,
    similary_threshold=1.5,
    logger=your_logger  # Optional, if you have a custom logger
)

# Prepare your documents and ids
docs = ["Document 1", "Document 2", "Document 3"]
ids = ["id1", "id2", "id3"]

# Add documents to the database
db.add_docs(docs=docs, ids=ids)

If you are working with the Agent class, you should ensure that the vector store associated with the Agent instance has the appropriate methods. Here is an example of how the Agent class interacts with the vector store:

from pandasai import Agent
from unittest.mock import MagicMock

# Create an instance of Agent
agent = Agent(sample_df, config, vectorstore=MagicMock())

# Prepare your documents
docs = ["Document 1", "Document 2", "Document 3"]

# Add documents to the vector store through the agent
agent._vectorstore.add_docs(docs)

Make sure that the vectorstore object associated with the Agent instance has the add_docs method. This should help you avoid the AttributeError [1][2][3].
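The relationship described above can be sketched with stand-in classes: the agent holds a vector store, and `add_docs` lives on the store, not on the agent. `InMemoryStore` and `ToyAgent` are hypothetical and only mimic the shape of the objects discussed here; they are not PandasAI classes.

```python
import uuid
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical vector-store stand-in that only records documents."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add_docs(self, docs: List[str], ids: Optional[List[str]] = None) -> List[str]:
        # Generate one id per document when none are supplied.
        if ids is None:
            ids = [str(uuid.uuid4()) for _ in docs]
        if len(docs) != len(ids):
            raise ValueError(
                f"Number of documents {len(docs)} must match number of ids {len(ids)}"
            )
        self.docs.update(zip(ids, docs))
        return ids


class ToyAgent:
    """Hypothetical agent stand-in that delegates training to its store."""

    def __init__(self, vectorstore: InMemoryStore) -> None:
        self._vectorstore = vectorstore

    def train(self, docs: List[str]) -> None:
        self._vectorstore.add_docs(docs)


agent = ToyAgent(InMemoryStore())
agent.train(docs=["Document 1", "Document 2", "Document 3"])

print(hasattr(agent, "add_docs"))               # False: the method is on the store
print(hasattr(agent._vectorstore, "add_docs"))  # True
print(len(agent._vectorstore.docs))             # 3
```

In this sketch, calling `agent.add_docs(...)` would raise the same kind of `AttributeError` as in the traceback, while `agent.train(...)` reaches the store through the public method.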
