aws-samples / aws-genai-llm-chatbot

A modular and comprehensive solution to deploy a Multi-LLM and Multi-RAG powered chatbot (Amazon Bedrock, Anthropic, HuggingFace, OpenAI, Meta, AI21, Cohere, Mistral) using AWS CDK on AWS
https://aws-samples.github.io/aws-genai-llm-chatbot/
MIT No Attribution

Indexing Q-A in workspace for RAG #582

Open nkay28 opened 2 weeks ago

nkay28 commented 2 weeks ago

Hi, what is the best way/format to index a set of questions and answers (Q-A) into a workspace?
Also, can we add Q-A data into the same workspace alongside a set of PDFs? Or is it recommended to add them in separate workspaces only? I tried them together, and the RAG doesn't seem to pick up the Q-A content while chatting, so I'm trying to figure out whether my indexing is correct. Thank you.

charles-marion commented 1 week ago

Hi @nkay28 ,

> Also, can we add Q-A data into the same workspace alongside a set of PDFs? Or is it recommended to add them in separate workspaces only?

I would say it depends on the use case.

It is possible to add Q-As and PDFs together, but the workspace query will only return the chunks of text relevant to the query, re-ranked by the cross-encoder model (only 3 results are added to the context).
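The retrieval flow described above can be sketched as follows. This is an illustrative toy, not the project's actual code: `rerank_top_k` and `toy_score` are hypothetical names, and the word-overlap scorer stands in for the real cross-encoder model, which is a transformer that scores each (query, chunk) pair.

```python
def rerank_top_k(query, chunks, score_fn, k=3):
    """Score every candidate chunk against the query and keep the k best.

    Mirrors the behavior described above: whatever was indexed (Q-A or
    PDF chunks), only the top-k re-ranked chunks reach the LLM context.
    """
    scored = [(score_fn(query, chunk), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def toy_score(query, chunk):
    """Stand-in scorer: counts shared lowercase words (NOT a real cross-encoder)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


chunks = [
    "Q: How do I reset my password? A: Use the account settings page.",
    "PDF excerpt: quarterly revenue grew by 12 percent.",
    "Q: How do I delete my account? A: Contact support.",
    "PDF excerpt: onboarding checklist for new employees.",
]

top = rerank_top_k("how do I reset my password", chunks, toy_score)
print(top)
```

This also illustrates why Q-A content may not surface: if the PDF chunks score higher for a given query, the Q-A chunks never make it into the 3-result context window.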

I would recommend checking which documents are returned (in the playground, the cog icon can be used to show the metadata). Alternatively, you can use the semantic search page to test.

Regarding the Q-A content: is it a fixed list of questions? If it is, it might be a set of examples you would like to send as part of the prompt every time.

If yes, a possible option is to update the system prompt and list them there so they are always sent as examples: https://github.com/aws-samples/aws-genai-llm-chatbot/blob/main/lib/model-interfaces/langchain/functions/request-handler/adapters/bedrock/base.py#L53
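As a rough sketch of that option, a fixed Q-A list can be rendered into the system prompt so the examples accompany every request. All names here (`build_system_prompt`, `QA_EXAMPLES`, the base prompt text) are illustrative assumptions, not the actual prompt in `base.py`:

```python
# Hypothetical fixed Q-A list to embed as few-shot examples.
QA_EXAMPLES = [
    ("What is the refund window?", "Refunds are accepted within 30 days."),
    ("How do I contact support?", "Email the support address listed on our site."),
]

BASE_SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer using these examples when relevant:"
)


def build_system_prompt(base, qa_pairs):
    """Append each Q-A pair to the base system prompt, one Q/A per line."""
    lines = [base]
    for question, answer in qa_pairs:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    return "\n".join(lines)


print(build_system_prompt(BASE_SYSTEM_PROMPT, QA_EXAMPLES))
```

The trade-off versus indexing: examples in the system prompt are always visible to the model but consume context tokens on every request, whereas indexed Q-A content is only retrieved when it ranks in the top results.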

Note that if you follow this path, there is a pending change refactoring this part: #576