dialoguemd / covid-19

Canada COVID-19: What You Need to Know.
https://covid19.dialogue.co/
MIT License

test Google search for document retrieval #248

Open alexissmirnov opened 4 years ago

breuleux commented 4 years ago

I've compiled some results by running the custom search Sacha made on the thousand or so questions we have.

For each question, I recorded the number of results, the snippet given by Google, an estimate of the relevant sections (found by searching for each snippet in the scraped data), and the link. These columns all describe the first result returned by Google (I see little point in looking at the others).
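To make the matching step concrete, here is a rough sketch of how one row could be built. It is not the actual script: the function names, the `sections` mapping (section title to scraped text), and the choice to split snippets on ellipses are all assumptions for illustration. The `response` dict is assumed to follow the shape of a Google Custom Search JSON API reply (`items[].snippet`, `items[].link`, `searchInformation.totalResults`); no network call or API key is involved here.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so small formatting differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def snippet_fragments(snippet, min_len=20):
    """Split a snippet on ellipses and keep only fragments long enough to be distinctive.

    Assumption: snippets are stitched together from several passages separated by "...".
    """
    parts = [normalize(p) for p in re.split(r"\.\.\.|…", snippet)]
    return [p for p in parts if len(p) >= min_len]


def matching_sections(snippet, sections):
    """Return the titles of scraped sections whose text contains any snippet fragment."""
    fragments = snippet_fragments(snippet)
    return [
        title
        for title, body in sections.items()
        if any(fragment in normalize(body) for fragment in fragments)
    ]


def first_result_row(question, response, sections):
    """Build one row (question, result count, snippet, sections, link) from the first result only."""
    total = int(response.get("searchInformation", {}).get("totalResults", 0))
    items = response.get("items", [])
    if not items:
        # No hits: keep the row so the question still shows up in the table.
        return {"question": question, "n_results": total,
                "snippet": None, "sections": [], "link": None}
    top = items[0]
    snippet = top.get("snippet", "")
    return {
        "question": question,
        "n_results": total,
        "snippet": snippet,
        "sections": matching_sections(snippet, sections),
        "link": top.get("link"),
    }
```

With this kind of substring matching, a snippet that stitches together text from two different sections returns both titles, and nothing in the row says which one actually answers the question.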

The results are quite mediocre. The search answers the French question "Quels sont les symptômes" properly, but the snippet it returns also contains "Quels sont les services d'enseignement visés par les fermetures?" and there's no way to automatically prioritize the correct section.

It does not answer "What are the symptoms" in English properly, however. It provides this link, which lists symptoms related to stress, anxiety and depression. That's obviously not what the user expects. Elasticsearch and other solutions will probably have the same problem if this section is included in the search.

The code is messy and contains an API key, but I can try to clean it up and push it if necessary.