Add machine learning to ElasticSearch results rankings

aws-solutions / qnabot-on-aws

AWS QnABot is a multi-channel, multi-language conversational interface (chatbot) that responds to your customer's questions, answers, and feedback. The solution allows you to deploy a fully functional chatbot across multiple channels including chat, voice, SMS and Amazon Alexa.

https://aws.amazon.com/solutions/implementations/aws-qnabot

Apache License 2.0


Add machine learning to ElasticSearch results rankings #59

Closed: bigrig2212 closed this issue 1 year ago

bigrig2212 commented 6 years ago

Just starting a thread for this enhancement: add machine learning to search results ranking, so that results get better over time. This is especially important for a chatbot, since the first result may be your last chance to engage the user.

This will first need some way to collect training data, i.e. feedback marking rankings as good or bad, as logged here: https://github.com/awslabs/aws-ai-qna-bot/issues/41
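A minimal sketch of turning that feedback into training data, assuming thumbs-up/down votes are logged per (question, returned document) pair. The record keys `question`, `doc_id` and `helpful` are hypothetical, not QnABot's actual log schema. Votes are aggregated into graded relevance judgments and written as SVMlight/RankLib-style lines (grade, query id, comment; features omitted), a format learning-to-rank tools commonly consume.

```python
from collections import defaultdict


def build_judgments(feedback, max_grade=4):
    """Aggregate raw helpful/not-helpful votes into graded judgments.

    feedback: iterable of dicts with hypothetical keys
      'question' (str), 'doc_id' (str), 'helpful' (bool).
    Returns sorted (question, doc_id, grade) tuples, where grade is the
    fraction of helpful votes scaled to 0..max_grade and rounded.
    """
    votes = defaultdict(lambda: [0, 0])  # (question, doc_id) -> [helpful, total]
    for record in feedback:
        key = (record["question"], record["doc_id"])
        votes[key][0] += 1 if record["helpful"] else 0
        votes[key][1] += 1
    judgments = []
    for (question, doc_id), (helpful, total) in sorted(votes.items()):
        grade = round(max_grade * helpful / total)
        judgments.append((question, doc_id, grade))
    return judgments


def to_judgment_lines(judgments):
    """Format judgments as 'grade qid:N # doc_id question' lines.

    Each distinct question gets its own integer query id (in order of first
    appearance), so documents are only compared against each other for the
    same question.
    """
    qids = {}
    lines = []
    for question, doc_id, grade in judgments:
        qid = qids.setdefault(question, len(qids) + 1)
        lines.append(f"{grade} qid:{qid} # {doc_id} {question}")
    return lines
```

With two helpful votes for `faq-1` on "reset password" and one unhelpful vote for `faq-7`, the sketch emits `4 qid:N # faq-1 ...` and `0 qid:N # faq-7 ...` for the same query id, which is the kind of per-question preference a ranking model learns from.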

As for ML methods to improve rankings, I have come across this: https://github.com/o19s/elasticsearch-learning-to-rank
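For context, the o19s plugin applies a trained model at query time through an `sltr` query, usually inside a `rescore` block so that only the top window of ordinary keyword hits is re-scored. A minimal sketch of building such a request body, assuming a model has already been uploaded to the plugin under the hypothetical name `qna_ltr_model` and that its feature templates take a `keywords` parameter; the index fields `questions.q` and `a` are placeholders too, not necessarily QnABot's mapping.

```python
def build_ltr_search(question, model="qna_ltr_model", window_size=50, size=5):
    """Build an Elasticsearch search body that re-ranks keyword hits with an LTR model.

    - The base query is an ordinary multi_match over (hypothetical) fields.
    - The rescore block re-scores only the top `window_size` hits using the
      plugin's `sltr` query and the named model.
    - `params` are passed to the model's feature templates; 'keywords' is an
      assumed parameter name.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": question,
                "fields": ["questions.q", "a"],
            }
        },
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": question},
                        "model": model,
                    }
                }
            },
        },
    }
```

Rescoring only a window keeps the (more expensive) model off the long tail of weak matches, which fits a chatbot that only ever shows the top answer.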

Eager to hear what other methods people have used/are familiar with.

JohnCalhoun commented 6 years ago

I like this thread. We could think even bigger: what if we replaced Elasticsearch completely with some ML technique? I think some recommendation systems would be interesting here.

I am currently reading this paper to see if there are any novel ideas: https://arxiv.org/pdf/1611.08097.pdf

This could also be an interesting use case for, or integration with, Amazon SageMaker: https://aws.amazon.com/sagemaker/

JohnCalhoun commented 6 years ago

Another paper: https://arxiv.org/pdf/1704.00051.pdf

JohnCalhoun commented 6 years ago

I think the first step here would be to integrate Amazon SageMaker with QnABot, so that in a SageMaker configuration the fulfillment Lambda calls the SageMaker endpoint instead of Elasticsearch.
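A minimal sketch of the call such a fulfillment Lambda might make, using boto3's `sagemaker-runtime` client and its `invoke_endpoint` operation. The endpoint name and the JSON shapes (`{"question": ...}` in, `{"answer": ...}` out) are hypothetical; they depend entirely on the model deployed behind the endpoint. The client can be injected so the logic is testable without AWS credentials.

```python
import json


def answer_with_sagemaker(question, endpoint_name, runtime=None):
    """Send the user's question to a SageMaker endpoint instead of Elasticsearch.

    The request/response payload format is an assumption about the deployed
    model, not a QnABot or SageMaker convention.
    `runtime` may be injected for testing; by default a boto3
    sagemaker-runtime client is created.
    """
    if runtime is None:
        import boto3  # imported lazily so the function can be exercised without AWS
        runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"question": question}),
    )
    # The response Body is a stream; read it fully and decode as JSON.
    payload = json.loads(response["Body"].read())
    return payload.get("answer", "")
```

Keeping the SageMaker call behind one small function means the Lambda could switch between the Elasticsearch path and the SageMaker path with a configuration flag rather than a rewrite.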

bigrig2212 commented 6 years ago

I like that paper you found, "Reading Wikipedia to Answer Open-Domain Questions": https://arxiv.org/pdf/1704.00051.pdf

It seems they use a document retrieval system (similar to Elasticsearch) to first narrow the results, and then run the document reader model (the ML part) over those narrowed results to pick out a relevant answer snippet.
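The two-stage retrieve-then-read shape can be illustrated with a toy, dependency-free sketch. Here a term-overlap score stands in for Elasticsearch's retrieval and a sentence-overlap heuristic stands in for the neural document reader; both are placeholders chosen only to make the structure concrete, not the paper's models.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, k=5):
    """Stage 1 (stand-in for Elasticsearch): rank documents by the number of
    distinct question terms they contain; keep the top k with a nonzero score."""
    q_terms = set(tokenize(question))
    scored = []
    for doc_id, text in documents.items():
        score = len(q_terms & set(tokenize(text)))
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def read(question, documents, doc_ids):
    """Stage 2 (stand-in for the ML reader): among the retrieved documents only,
    return the sentence sharing the most terms with the question (None if none)."""
    q_terms = set(tokenize(question))
    best, best_score = None, -1
    for doc_id in doc_ids:
        for sentence in re.split(r"(?<=[.!?])\s+", documents[doc_id]):
            score = len(q_terms & set(tokenize(sentence)))
            if score > best_score:
                best, best_score = sentence, score
    return best


def answer(question, documents, k=5):
    """Narrow with the cheap retriever, then run the reader on the survivors."""
    return read(question, documents, retrieve(question, documents, k))
```

The point of the split is cost: the retriever scans everything cheaply, while the reader, the expensive model in the real system, only ever sees k documents.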

I wouldn't replace Elasticsearch. I think it's a great, clever tie-in to Lex, and it is very accessible. I do like the idea of extending it, though, i.e. feeding the 5 highest-scoring documents from ES into a "document comprehension/reader" model in SageMaker.
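A minimal sketch of that glue: ask Elasticsearch for only the 5 best-scoring documents, then package their text as input for a reader model. The `size` parameter and the `hits.hits[]` response shape (`_id`, `_score`, `_source`) are standard Elasticsearch; the answer-text field `a` and the reader's `{"question", "contexts"}` payload are hypothetical placeholders.

```python
def top_k_search_body(question, k=5):
    """Elasticsearch search body that returns only the k best-scoring hits."""
    return {
        "size": k,
        "query": {"match": {"a": question}},  # 'a' is a hypothetical answer-text field
    }


def reader_payload(question, es_response, text_field="a"):
    """Build the reader model's input from an Elasticsearch search response.

    Without an explicit sort, ES returns hits ordered by _score (descending),
    so 'contexts' preserves the search ranking for the reader.
    """
    hits = es_response.get("hits", {}).get("hits", [])
    return {
        "question": question,
        "contexts": [
            {"id": hit["_id"], "score": hit["_score"], "text": hit["_source"][text_field]}
            for hit in hits
        ],
    }
```

Passing the scores along lets the reader (or a later step) fall back to the top ES hit when the model's own confidence is low.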