aws-solutions / qnabot-on-aws

AWS QnABot is a multi-channel, multi-language conversational interface (chatbot) that responds to your customer's questions, answers, and feedback. The solution allows you to deploy a fully functional chatbot across multiple channels including chat, voice, SMS and Amazon Alexa.
https://aws.amazon.com/solutions/implementations/aws-qnabot
Apache License 2.0

Adding a new question to a QnA item can change response to an existing question #44

Closed rstrahan closed 6 years ago

rstrahan commented 6 years ago

Expected Behavior

Adding new questions to an item should not adversely affect any answers that were previously correctly matched.

Actual Behavior

Adding a new question can actually weaken the score of an existing good match, sometimes causing another item to have a stronger score. See example below.

Steps to Reproduce the Problem

Import the two items in the attached file: test.txt

Switch to the 'test' tab and test the question "tell me about snorkeling". Observe that the expected item, 'test.001', has the higher score (though by a slim margin).


Now edit 'test.001' and add a second question: "what should I know about snorkeling".
Rerun the test with the same question. Now the other answer has the higher score.


It is counterintuitive, and undesirable, that adding the second question changes the answer.

Analysis

QnABot uses the Elasticsearch ‘full text search’ capability to create ‘relevance scores’ for each QnA item. Relevance scores are computed by weighting a number of different factors in an effort to get the best match; see "What is relevance" in the Elasticsearch documentation.

There are three factors in the scoring: a) term frequency, b) inverse document frequency, and c) field length norm. I believe it is this third factor that is biting us here.

This is because adding the second question to the first item made the whole ‘question’ field longer, which reduced the relevance score of the match on item 1 (due to the ‘field length norm’ behavior mentioned above). The score was reduced to the point where it was slightly lower than that of the second item.

NOTE: This situation really only arises when the question produces very similar scores on multiple items, i.e. when similarities between the questions prevent a strong unique match.
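The effect of the field length norm can be illustrated with a small sketch. It is not Elasticsearch's actual scoring code: it loosely follows the classic TF/IDF factors named above (square root of term frequency, logarithmic inverse document frequency, one over the square root of the field length), uses a naive whitespace tokenizer with no stemming, and borrows the second item's question from the test data later in this thread. The function names are hypothetical.

```python
import math

def tokenize(text):
    # Naive analyzer (assumption): lowercase and split on whitespace, no stemming.
    return text.lower().split()

def tfidf_score(query, field_text, corpus):
    """Simplified classic TF/IDF: sum of tf * idf * field-length norm per query term."""
    terms = tokenize(field_text)
    norm = 1.0 / math.sqrt(len(terms))  # longer field -> smaller norm
    score = 0.0
    for term in tokenize(query):
        freq = terms.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for doc in corpus if term in tokenize(doc))
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(len(corpus) / (doc_freq + 1))
        score += tf * idf * norm
    return score

query = "tell me about snorkeling"
one_question = "tell me about snorkeling"
# Both questions of the item end up in the same analyzed field.
two_questions = one_question + " what should I know about snorkeling"
other_item = "tell me about snorkel prices"

score_before = tfidf_score(query, one_question, [one_question, other_item])
score_after = tfidf_score(query, two_questions, [two_questions, other_item])
print(f"before: {score_before:.3f}  after: {score_after:.3f}")
```

In this toy model the score of the first item drops after the second question is added, even though two of the query terms now occur more often, because the field grew from 4 to 10 terms.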

Options

A short-term workaround can be to duplicate the question in order to increase the 'term frequency' part of the scoring equation, i.e. adding a 3rd question to 'test.001' that duplicates the initial question "tell me about snorkeling" once again results in this item having the highest score. Although the 3rd question lengthened the field further, the fact that it was a strong match to the question had the net effect of strengthening the overall score. However, while this technique might be useful for avoiding this specific problem, I do worry that it could introduce new problems by weakening the score of other question variants. It could become a game of 'whack-a-mole'!

It would be better to find a fix in the code: 1) (preferred) find a way to construct the doctype mapping or the query to negate the problematic 'field length norm' factor when matching on the question lists. The total number or length of the questions should ideally not affect the scoring of a match. See "disable field-length norm in mapping".
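As a rough illustration of what disabling the norm in a mapping could look like, the sketch below builds a mapping body in which a text field sets `"norms": false`, the per-field mapping parameter Elasticsearch provides for text fields. The type name `qna` follows the curl examples in this thread; the flat field name `q` and its analyzer are assumptions, not the project's actual mapping.

```python
import json

# Hypothetical flat mapping: the field name "q" and its analyzer are assumptions.
mapping = {
    "mappings": {
        "qna": {
            "properties": {
                "q": {
                    "type": "text",
                    "analyzer": "english",
                    "norms": False,  # do not store field-length norms for this field
                }
            }
        }
    }
}

# This JSON string would be the body of the PUT request that creates the index.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

Note that with norms disabled, term frequency still grows as questions are added, so scores would not necessarily stay constant; they would only stop being penalized for field length.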

rstrahan commented 6 years ago

Using a nested datatype for questions seems to be the way to go. With this approach, the terms are matched separately against each question.
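A toy comparison may help show why scoring each question separately removes the dependency on the number of questions. The relevance function here is a deliberately simplified stand-in (matching terms scaled by a length norm), not Elasticsearch's scoring, and it ignores the fact that real inverse document frequencies also shift when nested documents are indexed. All function names are hypothetical.

```python
import math

def question_score(query, question):
    """Toy relevance: number of matching query terms scaled by a length norm."""
    q_terms = question.lower().split()
    matches = sum(1 for term in query.lower().split() if term in q_terms)
    return matches / math.sqrt(len(q_terms))

def flat_score(query, questions):
    # Flat field: all questions are analyzed together as one long field.
    return question_score(query, " ".join(questions))

def nested_max_score(query, questions):
    # Nested field with the "max" score mode: the best single question wins.
    return max(question_score(query, q) for q in questions)

query = "tell me about snorkeling"
one = ["tell me about snorkeling"]
three = one + ["nothing in common", "something else again"]

print("flat:  ", flat_score(query, one), flat_score(query, three))
print("nested:", nested_max_score(query, one), nested_max_score(query, three))
```

In this model the flat score falls when unrelated questions are added, while the nested maximum is unchanged because the best-matching question is scored on its own.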

Mapping for nested questions:

curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index" -d '{
  "mappings": {
    "qna": {
      "properties": {
        "qid": {"type": "keyword"},
        "question": {"type": "nested"},
        "a": {"type": "text", "analyzer": "english"},
        "r": {
          "properties": {
            "attachmentLinkUrl": {"type": "keyword"},
            "buttons": {
              "properties": {
                "text": {"type": "text"},
                "value": {"type": "keyword"}
              }
            },
            "imageUrl": {"type": "keyword"},
            "subTitle": {"type": "text"},
            "title": {"type": "text"}
          }
        }
      }
    }
  }
}'

Test Data:

curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.001" -d '{
         "question": [
            {"q":"tell me about snorkeling"}
         ],
         "a": "Snorkeling is cool!",
         "r": {
            "title": "",
            "imageUrl": ""
         },
         "qid": "test.001"
}'
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.002" -d '{
         "question": [
            {"q":"tell me about snorkel prices"}
         ],
         "a": "Snorkels are not expensive",
         "r": {
            "title": "",
            "imageUrl": ""
         },
         "qid": "test.002"
}'

Add more questions to test.001 to check that we still get the correct answer:

curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.001" -d '{
         "question": [
            {"q":"tell me about snorkeling"},
            {"q":"nothing in common"},
            {"q":"something else again"}
         ],
         "a": "Snorkeling is cool!",
         "r": {
            "title": "",
            "imageUrl": ""
         },
         "qid": "test.001"
}'

Test query:

Notes:

curl -H'Content-Type: application/json' -XPOST "$ESURL/qna-index/qna/_search?search_type=dfs_query_then_fetch" -d '{  
   "query":{  
      "bool":{  
         "should":[  
            {  
               "nested":{  
                  "path":"question",
                  "score_mode":"max",
                  "boost":2,
                  "query":{  
                     "match":{  
                        "question.q":"tell me about snorkeling"
                     }
                  }
               }
            },
            {
                "match":{
                    "a":"tell me about snorkeling"
                }
            },
            {
                 "match":{
                    "t":"topicvalue"
                }
            }     
         ]
      }
   }
}'
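The test query hard-codes the user's question in two places. A minimal sketch of building the same request body from parameters is shown below; `build_query` and its arguments are hypothetical helpers, not part of QnABot, and the field names simply mirror the curl example above.

```python
import json

def build_query(question, topic=None, question_boost=2):
    """Build a bool/should query body mirroring the curl example (hypothetical helper)."""
    should = [
        {
            # Score each nested question separately and keep the best one.
            "nested": {
                "path": "question",
                "score_mode": "max",
                "boost": question_boost,
                "query": {"match": {"question.q": question}},
            }
        },
        # Also let the answer text contribute to the score.
        {"match": {"a": question}},
    ]
    if topic:
        # Optional topic clause, only added when a topic is supplied.
        should.append({"match": {"t": topic}})
    return {"query": {"bool": {"should": should}}}

body = build_query("tell me about snorkeling", topic="topicvalue")
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of the `_search` request shown above.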

This change will modify the JSON structure for documents. The content designer 'Import' function should support the previous JSON structure for backward compatibility and to allow content migration; however, I think we can migrate export and import to the new nested structure going forward.
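One way the import path could accept both structures is sketched below. It assumes, purely for illustration, that the previous format stores questions as a flat list of strings under the key `q`; the thread does not show the old structure, so that key and the `to_nested` helper are hypothetical.

```python
def to_nested(item):
    """Return a copy of item with questions in the nested [{"q": ...}] form.

    Assumption: legacy items keep their questions as a list of strings
    (or a single string) under the key "q".
    """
    migrated = dict(item)
    if "question" in migrated:
        # Already in the new nested structure: nothing to do.
        return migrated
    questions = migrated.pop("q", [])
    if isinstance(questions, str):
        questions = [questions]
    migrated["question"] = [{"q": text} for text in questions]
    return migrated

legacy = {
    "qid": "test.001",
    "q": ["tell me about snorkeling", "what should I know about snorkeling"],
    "a": "Snorkeling is cool!",
}
print(to_nested(legacy))
```

Because items that already have a `question` list pass through unchanged, the same function could be applied to every imported item regardless of which format it was exported in.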

We should probably also take this opportunity to rename the fields in the document JSON, replacing "a", "q", "t", and "r" with more explicit long-form names that better reflect each field's meaning.

JohnCalhoun commented 6 years ago

fixed in v2.0.0