Sinaptik-AI / pandas-ai

Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
https://pandas-ai.com

Questions about the train function #1343

Open mrgreen3325 opened 2 months ago

mrgreen3325 commented 2 months ago

Thanks for the great work. I have several questions about the instruct train function:

  1. May I know what the vector DB does during training? Does it act as a RAG?
  2. After training, is there any way to save the trained model or its state, or does the train function need to be called for the prompt every time?
  3. The cache seems to be regenerated when I restart the kernel. Does it store the previous prompts and responses? Thank you very much.
dosubot[bot] commented 2 months ago

Hello @mrgreen3325! I'm here to help you with any questions you have about the train function. I'm a bot designed to assist with bug fixes, answering questions, and facilitating contributions. Let me know how I can assist you!

  1. The instruct_train function in PandasAI uses the vectorDB to store the training data. This allows the model to persist the training data and use it for generating responses to similar queries in the future. It acts as a Retrieval-Augmented Generation (RAG) system by storing and retrieving relevant information to enhance the model's responses [1].

  2. To save the trained model, you simply need to train it once, as the training data is persisted automatically. Here is an example of how to train the model:

    import os

    from pandasai import Agent

    # Set your PandasAI API key
    os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY"

    agent = Agent("data.csv")
    agent.train(docs=["The fiscal year starts in April"])

    response = agent.chat("What is the total sales for the fiscal year?")
    print(response)
    # The model will use the information provided during training to generate a response

    The training data itself lives in the vector store, which can be either the default BambooVectorStore or a local vector store such as ChromaDB, Qdrant, or Pinecone [1].

  3. The cache stores the prompts and responses of the current session only. If you restart the kernel, the cache is regenerated from scratch, but the training data persisted in the vector store remains available for generating responses to similar queries [1].
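The retrieval and persistence behaviour described above can be sketched in a toy example. This is an illustrative sketch, not PandasAI's actual implementation: `embed` is a stand-in bag-of-words embedding rather than a learned model, and `VectorStore` and `Session` are hypothetical names.

```python
# Toy sketch: how a vector store enables RAG, and how it differs from the
# per-session cache. All names (embed, VectorStore, Session) are stand-ins.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words term counts instead of a learned model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Persists training docs; contents survive a 'kernel restart'."""

    def __init__(self):
        self.docs = []

    def add(self, doc: str):
        self.docs.append((embed(doc), doc))

    def retrieve(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda p: cosine(q, p[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


class Session:
    """The cache is exact-match and rebuilt empty on every restart."""

    def __init__(self, store: VectorStore):
        self.store = store
        self.cache = {}

    def chat(self, prompt: str) -> str:
        if prompt in self.cache:                     # exact-match cache hit
            return self.cache[prompt]
        context = self.store.retrieve(prompt, k=1)   # the RAG step
        answer = f"(LLM answer using context: {context[0]})"
        self.cache[prompt] = answer
        return answer


store = VectorStore()                         # persisted across restarts
store.add("The fiscal year starts in April")
store.add("Revenue is reported in USD")

s1 = Session(store)
print(s1.chat("What is the total sales for the fiscal year?"))

s2 = Session(store)                           # simulated kernel restart
print(s2.cache)                               # empty: cache was regenerated
print(store.retrieve("fiscal year sales"))    # training data still retrievable
```

The key design point mirrored here is that retrieval is similarity-based (the query need not match a stored doc exactly), while the cache only short-circuits repeated identical prompts within one session.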

To continue talking to Dosu, mention @dosu.

samosun commented 4 days ago

@dosu can you show the specific code where the RAG acts?