How to restrict the response within pdf content?

Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.

https://azure.microsoft.com/products/search

MIT License

6.15k stars 4.18k forks source link

How to restrict the response within pdf content? #487

Open TarunKC261 opened 1 year ago

TarunKC261 commented 1 year ago

As shown in screenshot, when asked about prime minister it gave correct answer with incorrect citation. I want to restrict these type of questions which are outside the scope of pdf and show the response as "Question out of scope".

Dag-Calafell-MCA commented 1 year ago

You might try modifying the AOAI prompt to instruct it to only answer questions from the retrieved documents.

TarunKC261 commented 1 year ago

I have added below instructions in the prompt but still it is giving results outside the scope of pdf uploaded. """Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below"""

Also added the instructions multiple times """Strictly generate the answer only from the sources listed below. If there isn't enough information below, say you don't know. Do not generate response outside the content below."""

Dag-Calafell-MCA commented 1 year ago

After deploying, I noticed the "retrieval mode" in the "Developer Settings"... you should try "Vectors only" because this should limit it to just the PDFs (vector db). I haven't reviewed the code however.... but I did test "What are the federal laws which govern Worker's Compensation?" and it seemed consistent with my assumption.

pamelafox commented 1 year ago

Hm, I don't think that would affect whether ChatGPT decides to answer outside of the indexed data, that just affects whether it does both a semantic and vector search (see https://techcommunity.microsoft.com/t5/azure-ai-services-blog/announcing-vector-search-in-azure-cognitive-search-public/ba-p/3872868)

I think that ChatGPT just isn't always adhering to the prompt. It may be worth trying GPT-4, if you have access to that, as I have heard it's better at sticking to instructions.

Dag-Calafell-MCA commented 1 year ago

Really good insight there, so Cognitive Search can be my vector database now. I did not know!

itmilos commented 1 year ago

@pamelafox @Dag-Calafell-MCA During Semantic Kernel Office Hours, this was discussed. It appears that currently, only an effective system prompt can prevent it.

I'm considering using an approach similar to what the HackAPrompt team utilized.

ref. https://huggingface.co/spaces/jerpint-org/hackaprompt/blob/main/hackaprompt/utils.py

Also good reference for how to HackAPrompt https://github.com/terjanq/hack-a-prompt/tree/master

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.