Open YunchaoYang opened 8 months ago
SentenceTransformers is a Python library that provides pre-trained models for generating dense vector representations of sentences or text. These dense vector representations, also known as embeddings, capture the semantic meaning of the input text. These embeddings can then be used for various natural language processing (NLP) tasks such as semantic similarity calculation, clustering, classification, and information retrieval.
SentenceTransformers is needed because it simplifies the process of generating high-quality sentence embeddings. Before the advent of SentenceTransformers and similar libraries, generating effective embeddings for sentences required significant expertise in NLP, deep learning, and access to large-scale compute resources for training models from scratch. SentenceTransformers provides pre-trained models that have been trained on large text corpora, which saves time and resources for practitioners and researchers. By using pre-trained models, users can obtain meaningful sentence representations without the need for extensive training data or computational resources.
Furthermore, SentenceTransformers offers a user-friendly interface that allows developers to easily integrate sentence embedding functionality into their NLP applications, making it accessible to a wider range of users. Overall, SentenceTransformers simplifies the process of leveraging advanced NLP techniques for tasks that require understanding the semantic similarity or meaning of sentences.
SentenceTransformers primarily uses transformer-based architectures, such as BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly optimized BERT approach), for generating embeddings of sentences or text. These transformer models are pre-trained on large text corpora with self-supervised objectives such as masked language modeling (BERT additionally uses next sentence prediction, which RoBERTa drops). SentenceTransformers models are then typically fine-tuned on sentence pairs so that semantically similar sentences map to nearby vectors.
pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
# Our sentences to encode
sentences = [
    "This framework generates embeddings for each input sentence",
    "Sentences are passed as a list of string.",
    "The quick brown fox jumps over the lazy dog.",
]
# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)
# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
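Once sentences are encoded, semantic similarity is usually measured as the cosine of the angle between two embeddings. The following is a minimal sketch of that calculation in plain Python. The three toy vectors are made-up stand-ins for the output of `model.encode()` (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and `cosine_similarity` is a helper written for this note, not a library function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (assumption).
emb_cat = [0.9, 0.1, 0.0]
emb_kitten = [0.8, 0.2, 0.1]
emb_car = [0.0, 0.1, 0.9]

# Related sentences should score higher than unrelated ones.
print("cat vs kitten:", cosine_similarity(emb_cat, emb_kitten))
print("cat vs car:   ", cosine_similarity(emb_cat, emb_car))
```

With real embeddings the same comparison applies: a higher score means the two sentences are closer in meaning.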
A LangChain chatbot consists of four high-level components: chat models, prompt templates, chat history, and retrievers.
A chatbot interface is built on messages rather than raw text, so chat models suit the conversational tone and message interface better than text LLMs, although a raw LLM can serve as a chatbot as well.
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0.2)
# Invoke the chat model; the output is an `AIMessage`
from langchain_core.messages import HumanMessage
chat.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to French: I love programming."
        )
    ]
)
# Pass the entire conversation history into the model
from langchain_core.messages import AIMessage
chat.invoke(
    [
        HumanMessage(
            content="Translate this sentence from English to French: I love programming."
        ),
        AIMessage(content="J'adore la programmation."),
        HumanMessage(content="What did you just say?"),
    ]
)
# The output is
# AIMessage(content='I said "J\'adore la programmation," which means "I love programming" in French.')
Prompt templates automate building the messages sent to the LLM: a user-defined template keeps the reusable boilerplate text fixed and exposes the key components as input variables.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
chain = prompt | chat  # pipe the prompt into the previously defined chat model
# Then we can invoke the chain with the same conversation
chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate this sentence from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
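To make the role of the placeholder concrete, here is a minimal sketch in plain Python of what such a template conceptually produces: the fixed system message followed by whatever messages are supplied at call time. `render_prompt` and the `(role, content)` tuples are hypothetical illustrations and not how LangChain implements `ChatPromptTemplate`.

```python
def render_prompt(system_text, messages):
    """Return the fixed system message followed by the variable messages."""
    return [("system", system_text)] + list(messages)


# The variable part: the conversation supplied at invocation time.
conversation = [
    ("human", "Translate this sentence from English to French: I love programming."),
    ("ai", "J'adore la programmation."),
    ("human", "What did you just say?"),
]

rendered = render_prompt(
    "You are a helpful assistant. Answer all questions to the best of your ability.",
    conversation,
)
for role, content in rendered:
    print(f"{role}: {content}")
```

The boilerplate lives in one place, and only the `messages` variable changes between calls.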
Chat history allows the chatbot to remember past interactions and take them into account when responding to follow-up questions.
from langchain.memory import ChatMessageHistory
demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("hi!")
demo_ephemeral_chat_history.add_ai_message("whats up?")
demo_ephemeral_chat_history.messages  # the list of all messages so far
demo_ephemeral_chat_history.add_user_message(
    "Translate this sentence from English to French: I love programming."
)
response = chain.invoke({"messages": demo_ephemeral_chat_history.messages})
# AIMessage(content='The translation of "I love programming" in French is "J\'adore la programmation."')
demo_ephemeral_chat_history.add_ai_message(response)  # add the response to the chat history
demo_ephemeral_chat_history.add_user_message("What did you just say?")  # ask another question
chain.invoke({"messages": demo_ephemeral_chat_history.messages})  # get the AI response using the chat history
# AIMessage(content='I said "J\'adore la programmation," which is the French translation for "I love programming."')
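The pattern above amounts to keeping an append-only list of messages and resending the whole list on every call. The sketch below illustrates that idea without any LLM. `SimpleChatHistory` and `echo_model` are hypothetical stand-ins written for this note, not LangChain classes.

```python
class SimpleChatHistory:
    """Hypothetical stand-in for a chat history: an append-only message list."""

    def __init__(self):
        self.messages = []

    def add_user_message(self, content):
        self.messages.append(("human", content))

    def add_ai_message(self, content):
        self.messages.append(("ai", content))


def echo_model(messages):
    """Hypothetical model stub that reports how many messages it received."""
    return f"I can see {len(messages)} message(s)."


history = SimpleChatHistory()
history.add_user_message("hi!")
history.add_ai_message(echo_model(history.messages))  # model saw 1 message
history.add_user_message("What did you just say?")
reply = echo_model(history.messages)  # model now sees the full history
print(reply)
```

Because the full list is passed each time, the model can resolve a follow-up such as "What did you just say?" against earlier turns.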
Retrievers supply a domain-specific knowledge dataset as context to augment the LLM's response and let it cite sources; this is the retrieval step of retrieval-augmented generation (RAG). A retriever is an interface that returns documents given an unstructured query, so it is more general than a vector store.
Here we use the LangSmith documentation as source material and store it in a vector store for later retrieval.
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/overview")
data = loader.load()
Next, we split it into smaller chunks that the LLM’s context window can handle:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
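To show what `chunk_size` and `chunk_overlap` control, here is a simplified sketch that cuts a string into fixed-width windows. This is an assumption-laden illustration: `RecursiveCharacterTextSplitter` tries to split on separators such as paragraphs and sentences first, whereas the hypothetical `chunk_text` below ignores text structure entirely.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4))                   # no overlap
print(chunk_text(sample, chunk_size=4, chunk_overlap=1))  # windows share 1 character
```

A non-zero overlap repeats a little text at chunk boundaries, which can help when a relevant sentence would otherwise be cut in half.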
Then we embed and store those chunks in a vector database:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
And finally, let’s create a retriever from our initialized vectorstore:
# k is the number of chunks to retrieve; it is passed through search_kwargs
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("how can langsmith help with testing?")
docs
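Conceptually, a vector-store retriever scores every stored chunk against the query embedding and returns the `k` best matches. The sketch below shows that ranking step with toy two-dimensional vectors. The vectors, the chunk labels, and the `top_k` helper are all made up for illustration, and real vector stores typically use approximate nearest-neighbor indexes rather than a full scan.

```python
def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors with the highest dot-product score."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy chunks and toy embeddings (assumptions, not real model output).
chunk_labels = ["chunk about testing", "chunk about billing", "chunk about debugging"]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4]]
query_vec = [1.0, 0.0]

best = top_k(query_vec, doc_vecs, k=2)
print([chunk_labels[i] for i in best])
```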
Combine the retriever with the components above to build a RAG chain:
from langchain.chains.combine_documents import create_stuff_documents_chain
chat = ChatOpenAI(model="gpt-3.5-turbo-1106")
question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user's questions based on the below context:\n\n{context}",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
# Combine the chat model with the prompt
document_chain = create_stuff_documents_chain(chat, question_answering_prompt)
Invoke the document chain with the raw documents we retrieved above:
from langchain.memory import ChatMessageHistory
demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("how can langsmith help with testing?")
# Pass the chat messages and the retrieved documents to the document chain
document_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
        "context": docs,
    }
)
# 'LangSmith can assist with testing by providing the capability to quickly edit examples and add them to datasets. This allows for the expansion of evaluation sets or fine-tuning of a model for improved quality or reduced costs. Additionally, LangSmith simplifies the construction of small datasets by hand, providing a convenient way to rigorously test changes in the application.'
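The "stuff" strategy simply concatenates the retrieved chunks and inserts them into the `{context}` slot of the system prompt. The following sketch reproduces that idea with plain string formatting. `stuff_documents` is a hypothetical helper and the two chunk strings are placeholders, so this only approximates what `create_stuff_documents_chain` does internally.

```python
def stuff_documents(documents, system_template):
    """Join all document texts and insert them into the {context} slot."""
    context = "\n\n".join(documents)
    return system_template.format(context=context)


template = "Answer the user's questions based on the below context:\n\n{context}"
chunks = ["first retrieved chunk", "second retrieved chunk"]  # placeholder text

system_message = stuff_documents(chunks, template)
print(system_message)
```

Because every chunk is placed in the prompt, this approach only works while the combined chunks fit in the model's context window.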
Next, integrate the retriever so that it fetches information relevant to the last message. The retrieved context is then passed, together with the previous messages, to the document chain to generate a final answer.
from typing import Dict
from langchain_core.runnables import RunnablePassthrough
# Get the content of the last message
def parse_retriever_input(params: Dict):
    return params["messages"][-1].content
# Use RunnablePassthrough.assign() to pass intermediate steps through at each invocation
retrieval_chain = RunnablePassthrough.assign(
    context=parse_retriever_input | retriever,
).assign(
    answer=document_chain,
)
Get a response based on the last message and the retrieved context:
response = retrieval_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    }
)
Continue the conversation by adding the response back to the chat history:
demo_ephemeral_chat_history.add_ai_message(response["answer"])
demo_ephemeral_chat_history.add_user_message("tell me more about that!")
retrieval_chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    },
)
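The chained `.assign()` calls explain why the result contains `messages`, `context`, and `answer`: each step adds one key to the input dictionary and passes everything else through. Here is a minimal sketch of that behavior using plain dictionaries. The `assign` and `fake_retrieve` functions are hypothetical and only mimic the shape of the data, not LangChain's runnable machinery.

```python
def assign(state, **steps):
    """Return a copy of state with each new key computed from the input state."""
    new_state = dict(state)
    for key, step in steps.items():
        new_state[key] = step(state)
    return new_state


def fake_retrieve(query):
    """Hypothetical retriever stub returning one document per query."""
    return [f"document about: {query}"]


state = {"messages": ["how can langsmith help with testing?"]}
# First assign: add the retrieved context, keep the messages.
state = assign(state, context=lambda s: fake_retrieve(s["messages"][-1]))
# Second assign: add the answer, keep messages and context.
state = assign(state, answer=lambda s: f"answer built from {len(s['context'])} document(s)")
print(sorted(state))
```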
If you don’t want to return all the intermediate steps, you can define your retrieval chain by piping directly into the document chain instead of using the final .assign() call:
retrieval_chain_with_only_answer = (
    RunnablePassthrough.assign(
        context=parse_retriever_input | retriever,
    )
    | document_chain
)
retrieval_chain_with_only_answer.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    },
)
For the follow-up `tell me more about that!`, the query does not directly include any information about testing, because we pass `tell me more about that!` verbatim as the query to the retriever. The output of the retrieval chain is still acceptable because the document chain can generate an answer based on the chat history, but we could retrieve more relevant documents if the query took the chat history into account.
To get around the problem, let’s add a query transformation step that resolves such references in the input. We wrap our old retriever in a chain that first transforms the conversation into a standalone search query:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableBranch
# We need a prompt that we can pass into an LLM to generate a transformed search query
chat = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0.2)
# This is the new prompt template for transforming the query
query_transform_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder(variable_name="messages"),
        (
            "user",
            "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation. Only respond with the query, nothing else.",
        ),
    ]
)
query_transforming_retriever_chain = RunnableBranch(
    (
        lambda x: len(x.get("messages", [])) == 1,
        # If there is only one message, pass that message's content straight to the retriever
        (lambda x: x["messages"][-1].content) | retriever,
    ),
    # Otherwise, pass the inputs to the LLM chain to transform the query, then pass the result to the retriever
    query_transform_prompt | chat | StrOutputParser() | retriever,
).with_config(run_name="chat_retriever_chain")
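The branching logic can be summarized as: with a single message, search with it directly; with a longer conversation, ask the LLM to rewrite it into a standalone query first. The sketch below captures only that decision. `build_search_query` and `fake_rewrite` are hypothetical names, and `fake_rewrite` replaces the real LLM call with a fixed string.

```python
def build_search_query(messages, rewrite_query):
    """Pick the search query: the lone message, or a rewrite of the conversation."""
    if len(messages) == 1:
        return messages[-1]
    return rewrite_query(messages)


def fake_rewrite(messages):
    """Hypothetical stand-in for the LLM call that condenses the conversation."""
    return f"standalone query derived from {len(messages)} messages"


single_turn = ["how can langsmith help with testing?"]
multi_turn = single_turn + ["(assistant answer)", "tell me more about that!"]

print(build_search_query(single_turn, fake_rewrite))
print(build_search_query(multi_turn, fake_rewrite))
```

Skipping the rewrite for a single message avoids an extra LLM call when the query is already self-contained.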
With this query_transforming_retriever_chain, the inputs are first passed to the LLM chain to transform the query, and the transformed query is then passed to the retriever.
document_chain = create_stuff_documents_chain(chat, question_answering_prompt)
conversational_retrieval_chain = RunnablePassthrough.assign(
    context=query_transforming_retriever_chain,
).assign(
    answer=document_chain,
)
demo_ephemeral_chat_history = ChatMessageHistory()
And finally, let’s invoke it!
demo_ephemeral_chat_history.add_user_message("how can langsmith help with testing?")
response = conversational_retrieval_chain.invoke(
    {"messages": demo_ephemeral_chat_history.messages},
)
demo_ephemeral_chat_history.add_ai_message(response["answer"])
response
Then ask a follow-up question:
```python
demo_ephemeral_chat_history.add_user_message("tell me more about that!")
conversational_retrieval_chain.invoke(
    {"messages": demo_ephemeral_chat_history.messages}
)
```
https://python.langchain.com/docs/use_cases/chatbots/quickstart
What is LangChain
LangChain is a framework for developing applications powered by language models. This framework consists of LangChain Libraries, LangChain Templates, LangServe, and LangSmith.