Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

Feature request: Respond with whether the model knows about a file #493

Open nickroseth opened 1 year ago

nickroseth commented 1 year ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Uploaded new documents, ran prepdocs, then asked the chat to find one of the documents.

Any log messages given by the failure

Expected/desired behavior

The response confirms that it has access to the data in that document.

OS and Version?

macOS Ventura, GitHub Codespaces

azd version?

run azd version and copy paste here.

Versions

Mention any other details that might be useful

The attached screenshot shows a conversation in which I ask whether the chat has access to a recently uploaded document, and it says that it does not. Then I ask whether it has access to specific text in that document, and it says yes and references the document. It seems to suddenly become 'aware' of the document, but only once the text is referenced. Is this normal behavior? Is there a different prompt that should be written? The responses seem very inconsistent when I ask about different parts of a single document, or about which documents it has access to. (Note: I have tried increasing the number of documents in the settings up to 15, which is about where I hit the token limit.) Appreciate any insights!


Thanks! We'll be in touch soon.

[Image: Screenshot 2023-08-02 at 11 58 57 AM]

pamelafox commented 1 year ago

I think you would need to change the prompt if this is a desired feature. Another approach would be to query the Cognitive Search index for which files are available and show a list of them in the UI; that way you wouldn't need to ask.
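A minimal sketch of that second suggestion, assuming the index has a facetable string field that stores each chunk's source file name. The field name `sourcefile` and the sample file names are assumptions for illustration, not something confirmed in this thread. The sketch only builds the body for a Search Documents REST call and parses a facet response; it does not call the service.

```python
import json

# Assumption: a facetable string field holding the source file name.
SOURCE_FIELD = "sourcefile"  # hypothetical field name


def build_facet_query(max_files: int = 1000) -> dict:
    """Body for a docs/search POST: match every chunk, return no hits, only facet counts."""
    return {
        "search": "*",
        "top": 0,
        "facets": [f"{SOURCE_FIELD},count:{max_files}"],
    }


def files_from_response(response: dict) -> list[str]:
    """Pull the distinct file names out of the '@search.facets' section of a response."""
    buckets = response.get("@search.facets", {}).get(SOURCE_FIELD, [])
    return sorted(bucket["value"] for bucket in buckets)


# Hand-written sample response (made-up values) in the shape the service uses for facets.
sample_response = {
    "@search.facets": {
        SOURCE_FIELD: [
            {"value": "OtherDoc.pdf", "count": 12},
            {"value": "NewBill-0.pdf", "count": 4},
        ]
    },
    "value": [],
}

print(json.dumps(build_facet_query()))
print(files_from_response(sample_response))
```

The resulting list could be rendered in the UI, so the user can see which files are indexed without asking the model.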

nickroseth commented 1 year ago

Thanks, @pamelafox. Ultimately, the question I have is: why is the response so inconsistent about what data it can search? Why does it appear to be aware of information in a document only sometimes? If I wanted to pull data points from two documents, I assume I would need to reference document A and document B, but it seems it's not aware of either document at any given point. Perhaps it can't be used in that manner?

pamelafox commented 1 year ago

@nickroseth I am more of a Python expert than an LLM expert, but my suggestion would be to look at the various approaches and see what happens along the way. For example, for the /chat approach, it first tries to turn your question into a query for Cognitive Search, then gathers the data, then asks the question of that data. You might find it better to tweak that approach, or try one of the ask approaches. You can look at the Thought process tab in the UI to see how each approach tackles it.
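The three steps described for the /chat approach can be sketched with toy stand-ins, which may help explain the behavior in the original question. Everything here is an assumption for illustration: the in-memory `INDEX`, the keyword matcher, and the sample text are made up, and no Azure or OpenAI calls are made. The point is only that retrieval matches chunk content, so a question that mentions nothing but a file name may retrieve nothing.

```python
# Toy stand-ins only: made-up chunks, no Azure or OpenAI calls.
INDEX = [
    {"sourcepage": "NewBill-0.pdf", "content": "The new bill raises the monthly fee by 40 dollars."},
    {"sourcepage": "Handbook-3.pdf", "content": "Employees accrue vacation days every month."},
]
STOPWORDS = {"the", "to", "a", "is", "do", "you", "what", "by"}


def tokens(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation removed, minus stopwords."""
    words = {w.strip("?.,\"'") for w in text.lower().split()}
    return words - STOPWORDS


def generate_search_query(question: str) -> str:
    """Step 1 stand-in: the real app asks the model to rewrite the question into a query."""
    return question.lower()


def search(query: str, top: int = 3) -> list[dict]:
    """Step 2 stand-in: rank chunks by keyword overlap with their content; file names are never matched."""
    scored = []
    for chunk in INDEX:
        overlap = len(tokens(query) & tokens(chunk["content"]))
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top]]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Step 3 stand-in: the model only ever sees the chunks that retrieval returned."""
    sources = "\n".join(f"{c['sourcepage']}: {c['content']}" for c in chunks)
    return f"Sources:\n{sources}\n\nQuestion: {question}"


for question in ["Do you have access to NewBill.pdf?", "What is the monthly fee?"]:
    hits = search(generate_search_query(question))
    print(question, "->", [h["sourcepage"] for h in hits])
```

In this toy version the first question retrieves no chunks, so the model would have no sources to cite, while the second retrieves the chunk and its file name along with it.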

mhalomari commented 1 year ago

Hi @nickroseth, I guess this app does not search file names; instead it searches the text content and uses file names only to cite where that content came from. It is not entirely clear to me, but reviewing the code I found three types of approaches:

    export const enum Approaches {
        RetrieveThenRead = "rtr",
        ReadRetrieveRead = "rrr",
        ReadDecomposeAsk = "rda"
    }

I tried your request using the /ask API and it worked using the right prompt:

Endpoint: https://app-backend-YOURDEPLOYMENT.azurewebsites.net/ask

Request:

    {
      "question": [
        { "user": "Show any result from document \"NewBill?\"" }
      ],
      "approach": "rda",
      "overrides": {
        "retrieval_mode": "hybrid",
        "semantic_ranker": true,
        "semantic_captions": true,
        "top": 3,
        "temperature": 0,
        "prompt_template": "example_template",
        "prompt_template_prefix": "example_prefix",
        "prompt_template_suffix": "example_suffix",
        "exclude_category": "example_category"
      }
    }

And I got the following result: "answer": "NewBill-0.pdf",
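For anyone who wants to repeat this test from a script, here is a minimal sketch that builds the same kind of request with the Python standard library. The payload shape is copied from the request above; the helper name `build_ask_request` is hypothetical, the placeholder-valued overrides are left out, and the host still contains the YOURDEPLOYMENT placeholder, so the actual send is left commented out.

```python
import json
import urllib.request

# Placeholder host copied from the comment above; YOURDEPLOYMENT must be replaced before sending.
ASK_URL = "https://app-backend-YOURDEPLOYMENT.azurewebsites.net/ask"


def build_ask_request(question: str, approach: str = "rda", top: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like the one shown in this comment."""
    payload = {
        "question": [{"user": question}],  # shape copied from the request above
        "approach": approach,
        "overrides": {
            "retrieval_mode": "hybrid",
            "semantic_ranker": True,
            "semantic_captions": True,
            "top": top,
            "temperature": 0,
        },
    }
    return urllib.request.Request(
        ASK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_request('Show any result from document "NewBill?"')
print(request.get_method(), request.full_url)

# Uncomment once ASK_URL points at a real deployment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["answer"])
```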

While checking the Search service index:

[Image: GPTindex1]

I found the following filters activated for the index in use:

[Image: GPTindex]

As you can see, the only searchable column of the index is the file "content". Not sure how to change it.

Hope this helps

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.