Open bp3000bp opened 3 weeks ago
Doing RAG on tabular data can be trickier than RAG on paragraphs, lists, etc. You might find some ideas by searching the issue tracker for excel/csv/tabular.
As for figuring out where the answer quality breaks down, please step through the process in the customization guide (https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/customization.md#improving-answer-quality). You can see what the retrieved chunks are and whether they contain the answer. If they do contain the answer, then it's likely that the LLM doesn't understand the formatting, and you may want to try uploading the Excel file directly or even converting the data to a Markdown table. If they don't contain the answer, then the problem is with the search step.
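To illustrate the Markdown table suggestion, here is a minimal sketch that turns CSV text (for example, a spreadsheet exported as CSV) into a Markdown pipe table using only the Python standard library. The function name `csv_to_markdown` and the sample columns are made up for illustration; this is not part of the sample repo, and whether the LLM reads this format better still has to be checked against your own data.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells):
        # Escape pipes so a cell value cannot break the table layout.
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    # Pad short rows so every line has the same number of columns.
    lines += [fmt(r + [""] * (len(header) - len(r))) for r in body]
    return "\n".join(lines)


# Hypothetical sample data mirroring the contact list discussed in this thread.
sample = (
    "name,phone,email,title,department\n"
    "First Last,555-555-5555,firstlast@company.com,accounting coordinator,Accounting\n"
)
print(csv_to_markdown(sample))
```

The resulting text can be saved as a `.md` file and ingested in place of the original document, so that each chunk carries explicit column separators.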
The issue is not in retrieval: the retrieval brings back the correct information, so it must be in the formatting? That is confusing, because in my past issues the problem was usually retrieval. I will try uploading the Excel document itself, but I am afraid it's just not going to work with this particular document and information.
@pamelafox Instead of using Excel, would a different strategy work better? For example, making a separate document for each department and, instead of tabular data, using a list formatted like this:
One docx file converted to PDF
Accounting department contacts:
name: first last, phone: 555-555-5555, email: firstlast@company.com, title: accounting coordinator
name: first last, phone: 555-555-5555, email: firstlast@company.com, title: accounting coordinator
name: first last, phone: 555-555-5555, email: firstlast@company.com, title: accounting coordinator

Second docx file converted to PDF
Marketing department contacts:
name: first last, phone: 555-555-5555, email: firstlast@company.com, title: marketing coordinator
name: first last, phone: 555-555-5555, email: firstlast@company.com, title: marketing coordinator
name: first last, phone: 555-555-5555, email: firstlast@company.com, title: marketing coordinator
Would presenting the information this way make it easier for the LLM to retrieve it and give accurate responses? This strategy would separate the info into several documents for different departments, containing several hundred employees. Or is this type of data just not going to work for this application?
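For reference, a rough sketch of how the per-department lists described above could be generated from tabular rows rather than typed by hand. It assumes each row is a dict with `name`, `phone`, `email`, `title`, and `department` keys (hypothetical column names), and the function name `contacts_by_department` is invented for this example.

```python
from collections import defaultdict


def contacts_by_department(rows):
    """Group contact rows by department and render one text block per department."""
    grouped = defaultdict(list)
    for row in rows:
        # One self-describing line per employee, matching the format proposed above.
        grouped[row["department"]].append(
            f"name: {row['name']}, phone: {row['phone']}, "
            f"email: {row['email']}, title: {row['title']}"
        )
    return {
        dept: f"{dept} department contacts:\n" + "\n".join(lines)
        for dept, lines in grouped.items()
    }


# Hypothetical rows; in practice these could come from csv.DictReader.
rows = [
    {"name": "first last", "phone": "555-555-5555", "email": "firstlast@company.com",
     "title": "accounting coordinator", "department": "Accounting"},
    {"name": "first last", "phone": "555-555-5555", "email": "firstlast@company.com",
     "title": "marketing coordinator", "department": "Marketing"},
]
for dept, text in contacts_by_department(rows).items():
    print(text, end="\n\n")
```

Each value in the returned dict could then be written to its own file, one per department, before ingestion.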