ObrienlabsDev / rag

Retrieval-Augmented Generation for LLMs
Apache License 2.0

Implement RAG for CSP LLMs or Local LLaMa LLMs #1

Open obriensystems opened 2 weeks ago

obriensystems commented 2 weeks ago

follow https://python.langchain.com/v0.2/docs/tutorials/rag/

pip install langchain langchain_community langchain_chroma

[notice] A new release of pip is available: 23.0.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip

sign up for LangSmith https://smith.langchain.com/o/71626d73-7ad4-57ab-8c7d-82cc7d004740/

pip install -qU langchain-google-vertexai
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opentelemetry-proto 1.26.0 requires protobuf<5.0,>=3.19, but you have protobuf 5.28.0 which is incompatible.

[notice] A new release of pip is available: 23.0.1 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip

michael@13900d MINGW64 /c/wse_github/obrienlabsdev/rag/src/rag (main)
$ pip install protobuf==4.25.4
Collecting protobuf==4.25.4
  Using cached protobuf-4.25.4-cp310-abi3-win_amd64.whl (413 kB)
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 5.28.0
    Uninstalling protobuf-5.28.0:
      Successfully uninstalled protobuf-5.28.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.66.1 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.4 which is incompatible.
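
The two pins are circular: opentelemetry-proto 1.26.0 wants protobuf<5.0 while grpcio-status 1.66.1 wants protobuf>=5.26.1. One escape hatch, assuming a newer opentelemetry-proto release relaxes its protobuf<5.0 pin, is to upgrade both sides together instead of downgrading protobuf:

$ pip install --upgrade protobuf opentelemetry-proto grpcio-status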

switch to OpenAI


michael@13900d MINGW64 /c/wse_github/obrienlabsdev/rag/src/rag (main)
$ pip install -qU langchain-openai

get token https://platform.openai.com/api-keys

the langchain tutorial page is missing the bs4 dependency

pip3 install BeautifulSoup4

add billing to openai

python app.py
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

$10 credit

michael@13900d MINGW64 /c/wse_github/obrienlabsdev/rag/src/rag (main)
$ python app.py
USER_AGENT environment variable not set, consider setting it to identify your requests.
C:\opt\Python310\lib\site-packages\langsmith\client.py:5431: LangChainBetaWarning: The function `loads` is in beta. It is actively being worked on, so the API may change.
  prompt = loads(json.dumps(prompt_object.manifest))
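
The USER_AGENT warning can be silenced by identifying the client; a minimal sketch, assuming the variable is set before the langchain_community imports (the agent string is arbitrary):

import os
os.environ["USER_AGENT"] = "obrienlabsdev-rag/0.1"  # hypothetical identifier; any descriptive string works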
obriensystems commented 2 weeks ago

hitting rate limit https://platform.openai.com/settings/organization/limits

| Model | TPM | RPM | RPD | TPD |
| --- | --- | --- | --- | --- |
| gpt-4o-mini | 60,000 | 3 | 200 | 200,000 |

openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

wait 20 sec
![image](https://github.com/user-attachments/assets/a4cb041a-eedf-46ea-90ca-31db6deefcf7)
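
Since the per-minute 429s clear after a short wait, one option is to wrap the chain invocation in a retry loop; a minimal sketch with a hypothetical invoke_with_backoff helper (the 20 s wait mirrors the note above; a true insufficient_quota error still needs a billing fix):

import time
import openai

def invoke_with_backoff(chain, query, retries=3, wait_s=20):
    # retry on 429 rate limits, sleeping wait_s seconds between attempts
    for attempt in range(retries):
        try:
            return chain.invoke(query)
        except openai.RateLimitError:
            if attempt == retries - 1:
                raise  # retries exhausted; surface the error
            time.sleep(wait_s)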
obriensystems commented 2 weeks ago

working with RAG prompt to OpenAI


# from https://python.langchain.com/v0.2/docs/tutorials/rag/
import getpass
import os
import bs4

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

os.environ["OPENAI_API_KEY"] = "sk-..A"
#getpass.getpass()

llm = ChatOpenAI(model="gpt-4o-mini")

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "lsv...9"
#getpass.getpass()

# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

print(f"docs content size: %s" % (len(docs[0].page_content)))
print(docs[0].page_content[:500])

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200)  # add_start_index=True
splits = text_splitter.split_documents(docs)
print(f"splits: {len(splits)}")
print(f"page_content: {len(splits[0].page_content)}")
print(f"metadata: {len(splits[10].metadata)}")

vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever() # search_type="similarity", search_kwargs={"k": 6}

print ("test")
#def skip():

prompt = hub.pull("rlm/rag-prompt")
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

retrieved = rag_chain.invoke("What is Task Decomposition?")
print(f"retrieved: %s" % (len(retrieved)))
print(f"retrieved content: %s" % (retrieved))#[0].page_content))

michael@13900d MINGW64 /c/wse_github/obrienlabsdev/rag/src/rag (main)
$ python app.py
USER_AGENT environment variable not set, consider setting it to identify your requests.
docs content size: 43131

      LLM Powered Autonomous Agents

Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
splits: 66
page_content: 969
metadata: 1
test
C:\opt\Python310\lib\site-packages\langsmith\client.py:5431: LangChainBetaWarning: The function `loads` is in beta. It is actively being worked on, so the API may change.
  prompt = loads(json.dumps(prompt_object.manifest))
retrieved: 343
retrieved content: Task Decomposition is the process of breaking down complex tasks into smaller, manageable steps. This can be achieved through various methods, including prompting models to think step by step or providing task-specific instructions. It enhances performance by allowing easier handling of complicated tasks and clarifying the reasoning process.
obriensystems commented 2 weeks ago
michaelobrien@mbp7 rag % python app.py
USER_AGENT environment variable not set, consider setting it to identify your requests.
docs content size: 43131

      LLM Powered Autonomous Agents

Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
splits: 66
page_content: 969
metadata: 1
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
retriever: tags=['Chroma', 'OpenAIEmbeddings'] vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x1226373d0> search_kwargs={'k': 6}
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
vectorstore retrieved: 6
vectorstore retrieved content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
/Users/michaelobrien/Library/Python/3.9/lib/python/site-packages/langsmith/client.py:5431: LangChainBetaWarning: The function `loads` is in beta. It is actively being worked on, so the API may change.
  prompt = loads(json.dumps(prompt_object.manifest))
example_messages: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
rag_chain retrieved: 396
rag_chain retrieved content: Task decomposition is the process of breaking down a complex task into smaller, more manageable steps. This can be achieved through techniques like Chain of Thought (CoT) prompting, which encourages step-by-step reasoning, or by using task-specific instructions and human inputs. The goal is to simplify the execution of difficult tasks and enhance understanding of the model's reasoning process.
obriensystems commented 1 week ago

refactor rag_chain

# from https://python.langchain.com/v0.2/docs/tutorials/rag/
import getpass
import os
import bs4

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)

def set_environment_variables():
    # keys redacted; set OPENAI_API_KEY and LANGCHAIN_API_KEY here or in the shell
    # os.environ["OPENAI_API_KEY"] = ""
    # os.environ["LANGCHAIN_API_KEY"] = ""
    pass
def initialize_llm():
    return ChatOpenAI(model="gpt-4o-mini")

def load_documents():
    loader = WebBaseLoader(
        web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(
                class_=("post-content", "post-title", "post-header")
            )
        ),
    )
    return loader.load()

def split_documents(docs):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200
    )
    return text_splitter.split_documents(docs)

def create_vectorstore(splits):
    return Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

def retrieve_documents(vectorstore, query, k=6):
    retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": k})
    return retriever.invoke(query)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

## refactor out rag_chain setup to allow for a non-local retriever from the vectorstore
def setup_rag_chain(retriever, prompt, llm):
    """
    Sets up the RAG (Retrieval-Augmented Generation) chain.

    Parameters:
    - retriever: The retriever object for document retrieval.
    - prompt: The prompt object for generating responses.
    - llm: The language model object.

    Returns:
    - The configured RAG chain.
    """
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

def main():
    set_environment_variables()
    llm = initialize_llm()

    try:
        docs = load_documents()
        logging.info(f"Docs content size: {len(docs[0].page_content)}")
        logging.info(docs[0].page_content[:500])

        splits = split_documents(docs)
        logging.info(f"Splits: {len(splits)}")
        logging.info(f"Page content size: {len(splits[0].page_content)}")
        logging.info(f"Metadata size: {len(splits[10].metadata)}")

        vectorstore = create_vectorstore(splits)
        # retrieved_docs are only logged here; rag_chain below does its own retrieval
        retrieved_docs = retrieve_documents(vectorstore, "What are the approaches to Task Decomposition?")
        logging.info(f"Vectorstore retrieved: {len(retrieved_docs)}")

        prompt = hub.pull("rlm/rag-prompt")
        example_messages = prompt.invoke(
            {"context": "filler context", "question": "filler question"}
        ).to_messages()
        logging.info(f"Example messages: {example_messages[0].content}")

        # unroll retriever
        retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
        rag_chain = setup_rag_chain(retriever, prompt, llm)

        # Uncomment to use the RAG chain
        # for chunk in rag_chain.stream("What is Task Decomposition?"):
        #     logging.info(chunk)
        retrieved = rag_chain.invoke("What is Task Decomposition?")
        logging.info(f"rag_chain retrieved: {len(retrieved)}")
        logging.info(f"rag_chain retrieved content: {retrieved}")

    except Exception as e:
        logging.error(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

output

michael@13900d MINGW64 /c/wse_github/obrienlabsdev/rag/src/rag (main)
$ python app.py
USER_AGENT environment variable not set, consider setting it to identify your requests.
INFO:root:Docs content size: 43131
INFO:root:

      LLM Powered Autonomous Agents

Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In
INFO:root:Splits: 66
INFO:root:Page content size: 969
INFO:root:Metadata size: 1
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:root:Vectorstore retrieved: 6
C:\opt\Python310\lib\site-packages\langsmith\client.py:5431: LangChainBetaWarning: The function `loads` is in beta. It is actively being worked on, so the API may change.
  prompt = loads(json.dumps(prompt_object.manifest))
INFO:root:Example messages: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question
Context: filler context
Answer:
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:root:rag_chain retrieved: 349
INFO:root:rag_chain retrieved content: Task Decomposition is a technique used in planning complex tasks by breaking them down into smaller, more manageable steps. This process enhances model performance by allowing the agent to think step-by-step and utilize more computational resources effectively. It can be achieved through simple prompts, task-specific instructions, or human inputs.
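
A possible follow-up: each run re-embeds all 66 splits, spending embedding quota against the same per-minute limits above. Chroma can persist the index to disk so later runs skip create_vectorstore entirely; a sketch, assuming the persist_directory path is free to choose:

# first run: build and persist the index
vectorstore = Chroma.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(), persist_directory="./chroma_db"
)

# later runs: reload without re-embedding
vectorstore = Chroma(
    persist_directory="./chroma_db", embedding_function=OpenAIEmbeddings()
)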