Closed: rstrahan closed this issue 6 years ago
Using a nested datatype for questions seems to be the way to go. With this approach, the terms are matched separately against each question.
Mapping for nested questions:
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index" -d '{
  "mappings": {
    "qna": {
      "properties": {
        "qid": {"type": "keyword"},
        "question": {
          "type": "nested"
        },
        "a": {
          "type": "text",
          "analyzer": "english"
        },
        "r": {
          "properties": {
            "attachmentLinkUrl": {"type": "keyword"},
            "buttons": {
              "properties": {
                "text": {"type": "text"},
                "value": {"type": "keyword"}
              }
            },
            "imageUrl": {"type": "keyword"},
            "subTitle": {"type": "text"},
            "title": {"type": "text"}
          }
        }
      }
    }
  }
}'
Test Data:
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.001" -d '{
  "question": [
    {"q": "tell me about snorkeling"}
  ],
  "a": "Snorkeling is cool!",
  "r": {
    "title": "",
    "imageUrl": ""
  },
  "qid": "test.001"
}'
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.002" -d '{
  "question": [
    {"q": "tell me about snorkel prices"}
  ],
  "a": "Snorkels are not expensive",
  "r": {
    "title": "",
    "imageUrl": ""
  },
  "qid": "test.002"
}'
Add more questions to test.001 to check that we still get correct answer:
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index/qna/test.001" -d '{
  "question": [
    {"q": "tell me about snorkeling"},
    {"q": "nothing in common"},
    {"q": "something else again"}
  ],
  "a": "Snorkeling is cool!",
  "r": {
    "title": "",
    "imageUrl": ""
  },
  "qid": "test.001"
}'
Test query:
Notes:
- ?search_type=dfs_query_then_fetch is used with small numbers of objects to combine idf across shards. (The QnABot handler already does this.)
- score_mode: max is used to return the score of the strongest question match, and to avoid diluting a strong match with other weaker matches.
- boost is used to give double weighting to matches on question, compared to matches on answer (previously implemented with the multi-match 'fields' syntax).

curl -H'Content-Type: application/json' -XPOST "$ESURL/qna-index/qna/_search?search_type=dfs_query_then_fetch" -d '{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "question",
            "score_mode": "max",
            "boost": 2,
            "query": {
              "match": {
                "question.q": "tell me about snorkeling"
              }
            }
          }
        },
        {
          "match": {
            "a": "tell me about snorkeling"
          }
        },
        {
          "match": {
            "t": "topicvalue"
          }
        }
      ]
    }
  }
}'
This change will modify the JSON structure for documents. The Content Designer 'Import' function should support the previous JSON structure, for backward compatibility and to allow content migration; however, I think we can migrate export and import to the new nested structure going forward.
We should probably also take this opportunity to rename the fields in the document JSON, replacing "a", "q", "t", and "r" with more explicit long-form names that better reflect each field's meaning.
Fixed in v2.0.0
Expected Behavior
Adding new questions to an item should not adversely affect any answers that were previously correctly matched.
Actual Behavior
Adding a new question can actually weaken the score of an existing good match, sometimes causing another item to have a stronger score. See example below.
Steps to Reproduce the Problem
Import the two items in the attached file: test.txt
Switch to the 'Test' tab, and test the question "tell me about snorkeling" - observe that the expected item, 'test.001', has the higher score (though by a slim margin)
Now edit 'test.001' and add a second question: "what should I know about snorkeling"
Rerun the test with the same question. Now the other answer has the higher score.
It is counter-intuitive, and undesirable, that adding a second question would change the answer.
Analysis
QnABot uses the Elasticsearch 'full text search' capability to create 'relevance scores' for each QnA item. Relevance scores are computed by weighting a number of different factors in an effort to get the best match - see What is relevance.
There are three factors in the scoring: a) term frequency, b) inverse document frequency, and c) field-length norm. I believe it is this third factor that is biting us here.
Adding the second question to the first item made the whole 'question' field longer, which reduced the relevance score of the match on item 1 (due to the 'field-length norm' behavior mentioned above). The score was reduced to the point where it was slightly lower than that of the other item.
NOTE: This situation really only arises when the question produces very similar scores on multiple items, i.e. when similarities between the questions prevent a strong unique match.
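The field-length effect described above can be sketched numerically. The following is an illustrative Python sketch of the BM25 formula (Elasticsearch's default similarity in recent versions; the exact similarity depends on the ES version, but classic TF/IDF's field-length norm penalizes longer fields the same way). It is not Elasticsearch code, and the token counts are made up for illustration:

```python
# Illustrative sketch of BM25 term scoring, showing how the
# field-length norm weakens a match as the field grows, and how
# duplicating the matching question (higher tf) strengthens it.
# k1=1.2 and b=0.75 are the usual Lucene/Elasticsearch defaults.

def bm25_term_score(tf, field_len, avg_field_len, idf=1.0, k1=1.2, b=0.75):
    """Score contribution of a single query term under BM25."""
    length_norm = 1 - b + b * (field_len / avg_field_len)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

# One matching term in a 4-token question field (average length 4).
one_question = bm25_term_score(tf=1, field_len=4, avg_field_len=4)

# Same single match after two unrelated questions triple the field
# length: the score drops, which is what lets the other item win.
with_extras = bm25_term_score(tf=1, field_len=12, avg_field_len=4)

# Duplicating the matching question raises tf to 2, which more than
# compensates for the even longer field.
duplicated = bm25_term_score(tf=2, field_len=16, avg_field_len=4)

assert with_extras < one_question   # longer field weakens the match
assert duplicated > with_extras     # higher tf outweighs the longer field
```

This also explains why the duplicate-question workaround below works: the term-frequency gain dominates the additional field-length penalty.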
Options
A short-term workaround is to duplicate the question in order to increase the 'term frequency' part of the scoring equation: adding a 3rd question to 'test.001' that duplicates the initial question "tell me about snorkeling" once again results in this item having the highest score. Although the 3rd question lengthened the field further, the fact that it was a strong match for the question had the net effect of strengthening the overall score. However, while this technique might be useful for avoiding this specific problem, I worry that it could introduce new problems by weakening the score of other question variants. It could become a game of 'whack-a-mole'!
Better would be a fix in the code:
1) (preferred) Find a way to construct the doctype mapping or the query so as to negate the problematic 'field-length norm' factor when matching on the question lists. The total number or length of the questions should ideally not affect the scoring of a match. See disable field-length norm in mapping.
2) Alternatively, enhance the Elasticsearch document structure to model each question independently, either by duplicating answers where there are multiple questions, or by using a parent/child or nested mapping to nest questions as separate documents under the parent answer.
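As a sketch of the preferred option 1: text fields support a `norms` mapping parameter that disables the field-length norm entirely. The fragment below is hypothetical - it reuses the qna-index names from this issue and the ES 5.x+ `"norms": false` syntax (older versions used `"norms": {"enabled": false}`), and whether it fully resolves the ranking behavior here is untested:

```
# Hypothetical: create the index with norms disabled on the question
# text field, so field length no longer affects match scores.
curl -H'Content-Type: application/json' -XPUT "$ESURL/qna-index" -d '{
  "mappings": {
    "qna": {
      "properties": {
        "question": {
          "type": "text",
          "norms": false
        }
      }
    }
  }
}'
```

Note that norms cannot be re-enabled on an existing field without reindexing, so this choice would need to be made when the index is created.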