databricks-demos / dbdemos

Demos to implement your Databricks Lakehouse

Chatbot keeps answering questions from out of scope data #28

Open murtasy opened 1 year ago

murtasy commented 1 year ago

The Dolly model keeps answering questions that are not part of the input documents. For example, I can ask: Why do Canadians play hockey? Where is Toronto? Where are brothels in Toronto? It answers all of these instead of saying it does not know. Here is the prompt from the notebook.

template = """You are a chatbot having a conversation with a human. Your are asked to answer gardening questions and help cultivating plants. Given the following extracted parts of a long document and a question, answer the user question. If you don't know, say that you do not know.

{context}

{chat_history}

{human_input}

Response: """

I tried changing it a bit, but it still does not work. How can I fix this?

template = """You are a chatbot having a conversation with a human. Your are asked to answer gardening questions and help cultivating plants. Answer from the following documents otherwise say I do not know.

{context}

{chat_history}

{human_input}

Response: """
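
For reference, here is roughly how the template above gets wired up (a minimal sketch using the older langchain API these notebooks rely on; the exact chain and memory classes in the demo may differ):

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# The template string is the prompt defined above.
prompt = PromptTemplate(
    input_variables=["context", "chat_history", "human_input"],
    template=template,
)

# Memory fills in {chat_history}; {context} comes from the retrieved documents.
memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = LLMChain(llm=llm, prompt=prompt, memory=memory)  # llm = the Dolly pipeline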

QuentinAmbard commented 1 year ago

Hi @murtasy, I haven't experimented with that myself. My guess is that this behavior comes from how the model is trained and would likely need some fine-tuning. Alternatively, you could compute the query embedding and assign a "gardening score" to the question. If the question isn't related to gardening, you could return a hardcoded answer without even calling the model. This is likely the most practical way to enforce the behavior for custom chatbots like these. Let me ping Sean and see if he has better insight on this.
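
Something like the sketch below (assuming a sentence-transformers embedding model; the reference phrases, model name, and threshold are all illustrative and would need tuning):

from sentence_transformers import SentenceTransformer, util

# Any embedding model works the same way; this one is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical reference phrases describing the in-scope topic.
GARDENING_REFERENCES = [
    "gardening and cultivating plants",
    "soil, watering, pruning, and plant care",
]
ref_embeddings = model.encode(GARDENING_REFERENCES, convert_to_tensor=True)

def is_gardening_question(question: str, threshold: float = 0.4) -> bool:
    """Score the query against the topic references (the "gardening score")."""
    q_emb = model.encode(question, convert_to_tensor=True)
    return util.cos_sim(q_emb, ref_embeddings).max().item() >= threshold

def answer(question: str, context: str) -> str:
    if not is_gardening_question(question):
        # Hardcoded answer, no model call needed.
        return "Sorry, I can only answer gardening questions."
    # Hand off to the chain (hypothetical name from the sketch above).
    return chain.run(context=context, human_input=question)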

murtasy commented 1 year ago

Hi Quentin, thank you for your response. We tried this scoring-based method. In short, it doesn't work: we can set a threshold on the score before sending the question to the model, but there is always a way to get around it by combining words. The model keeps generating hallucinated answers. Dolly needs some more fine-tuning.
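
For example, assuming the is_gardening_question gate sketched above, padding an off-topic question with on-topic vocabulary pushes its similarity score past the threshold:

# Illustrative only: off-topic questions pass the gate once they
# borrow on-topic vocabulary.
is_gardening_question("Where is Toronto?")
# -> likely False: low similarity to the gardening references

is_gardening_question("While planting tomatoes in my garden, where is Toronto?")
# -> likely True: the gardening words inflate the score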