RAG Steps - Githubissues

Text File Upload
- Function: Upload a text file containing the external knowledge base.
- Process: Converts the file into a Document format.
Recursive Character Text Splitter
- Function: Splits the text file into smaller, manageable chunks.
- Parameters:
  - Chunk Size: Controls the size of each chunk.
  - Chunk Overlap: Controls the overlap between chunks to maintain context.
OpenAI Embeddings
- Function: Generates vector representations for each text chunk using OpenAI's embedding model.
- Purpose: Measures similarity between text chunks and queries.
Pinecone
- Function: Stores the embedding vectors in a Pinecone vector database.
- Purpose: Optimized for fast similarity searches on vector data.
Conversational Retrieval QA Chain
- Function: Core of the RAG system; retrieves relevant chunks from the knowledge base.
- Process: Uses Pinecone to fetch the most relevant chunks based on semantic similarity.
ChatOpenAI
- Function: Generates a response using OpenAI's Chat API.
- Process: Combines the retrieved context from Pinecone with the user's query to generate a comprehensive answer.

User Query: The user asks a question.
Query Vectorization: RAG system converts the question into a vector using OpenAI embeddings.
Chunk Retrieval: Pinecone retrieves the most similar chunks from the knowledge base.
Response Generation: ChatOpenAI uses the retrieved context and the query to generate an accurate response.

Parameters:
- Chunk Size (chunkSize): Number
- Chunk Overlap (chunkOverlap): Number
- Custom Separators (separators): String

Parameters:
- Txt File (files): .txt, .html, .aspx, .asp, .cpp, .c, .cs, .css, .go, .h, .java, .js, .less, .ts, .php, .proto, .python, .py, .rst, .ruby, .rb, .rs, .scala, .sc, .scss, .sol, .sql, .swift, .markdown, .md, .tex, .ltx, .vb, .xml
- Metadata (metadata): JSON

Parameters:
- OpenAI API Key (openAIApiKey): String
- Model Name (modelName): gpt-4, gpt-4-turbo-preview, gpt-4-0125-preview, gpt-4-1106-preview, gpt-4-vision-preview, gpt-4-0613, gpt-4-32k, gpt-4-32k-0613, gpt-3.5-turbo, gpt-3.5-turbo-1106, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k, gpt-3.5-turbo-16k-0613
- Temperature (temperature): Number
- Max Tokens (maxTokens): Number
- Top Probability (topP): Number
- Frequency Penalty (frequencyPenalty): Number
- Presence Penalty (presencePenalty): Number
- Timeout (timeout): Number
- BasePath (basepath): String
- BaseOptions (baseOptions): JSON

Parameters:
- OpenAI API Key (openAIApiKey): String
- Model Name (modelName): text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
- Strip New Lines (stripNewLines): Boolean

Parameters:
- Return Source Documents (returnSourceDocuments): Boolean
- Rephrase Prompt (rephrasePrompt): String
- Response Prompt (responsePrompt): String

Parameters:
- Pinecone API Key (pineconeApiKey): String
- Pinecone Index (pineconeIndex): String
- Pinecone Namespace (pineconeNamespace): String
- Pinecone Metadata Filter (pineconeMetadataFilter): JSON
- Top K (topK): Number
- Search Type (searchType): Similarity, MMR
- Fetch K (for MMR Search) (fetchK): Number
- Lambda (for MMR Search) (lambda): Number

gauravpandeyDL / Feature-List