Exaphis / HackQ-Trivia

Yet another HQ Trivia bot. Automatically scrapes HQ Trivia questions without OCR and answers them.
MIT License

Question about Search Method 3 #95

Closed by ICUI2I 6 years ago

ICUI2I commented 6 years ago

As usual, thank you so much for the work you did on this bot, it's really great. I fully understand how methods 1 and 2 work, but I want to better understand how search method 3 works and what its output means, so that I can make more educated guesses and use this bot to its full capacity. Could you explain in some detail what the bot is doing/evaluating in order to reach the values below? I looked through your documentation and source code for search method 3, but I still don't fully understand it. Thank you for your extra help.

Keyword scores: {'Annie Lennox': 50, 'Kate Bush': 102, 'Celine Dion': 75}
Noun scores: {'Annie Lennox': 0, 'Kate Bush': 14, 'Celine Dion': 0}

Here is the full log as well:

Question detected.
Question 8 out of 12
Which of these artists has NOT recorded a song based on "Wuthering Heights"?
['Annie Lennox', 'Kate Bush', 'Celine Dion']

Searching
['Annie Lennox', 'Kate Bush', 'Celine Dion']
['artists', 'recorded', 'song', 'based', 'wuthering heights']
['https://en.wikipedia.org/wiki/Wuthering_Heights_(song)', 'https://en.wikipedia.org/wiki/Songs_from_Heathcliff', 'http://www.slate.com/blogs/browbeat/2014/04/01/kate_bush_s_best_songs_for_a_new_listener_from_wuthering_heights_to_hounds.html', 'http://www.katebushencyclopedia.com/wuthering-heights', 'https://www.soundonsound.com/techniques/classic-tracks-kate-bush-wuthering-heights']
Running method 1
{'annie lennox': 0, 'kate bush': 40, 'celine dion': 0}
Running method 2
{'Annie Lennox': {'annie': 0, 'lennox': 0}, 'Kate Bush': {'kate': 59, 'bush': 74}, 'Celine Dion': {'celine': 0, 'dion': 0}}
Annie Lennox

Question nouns: ['wuthering heights']
Running method 3
Search processed
URLs fetched

Annie Lennox: {'wuthering heights': 0}
Kate Bush: {'wuthering heights': 14}
Celine Dion: {'wuthering heights': 0}

Keyword scores: {'Annie Lennox': 50, 'Kate Bush': 102, 'Celine Dion': 75}
Noun scores: {'Annie Lennox': 0, 'Kate Bush': 14, 'Celine Dion': 0}

Annie Lennox
Search took 5.240917205810547 seconds
Socket closed

CrazyReturns commented 6 years ago

The third search method is completely different, as it looks up the answer and tries to match the nouns of the question. In this case, method 3 searched "Neil Patrick Harris" on Google and got 4 matches of the noun "order", which made method 3 say Neil Patrick Harris was the correct answer.
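As a rough illustration of the matching step (not the bot's actual code): assuming the text of one answer's search results is already available as a string, counting how often a question noun appears could look something like the sketch below. The `count_matches` helper and the sample text are made up for this example.

```python
import re

def count_matches(text, terms):
    """Count case-insensitive whole-phrase occurrences of each term in text."""
    counts = {}
    for term in terms:
        # \b keeps "order" from matching inside "border", for example.
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Made-up stand-in for the text of one answer's search results.
sample_results = (
    "Kate Bush wrote Wuthering Heights at 18. "
    "'Wuthering Heights' was her debut single."
)
print(count_matches(sample_results, ["wuthering heights"]))
```

The real bot may tokenize or weight matches differently; this only shows the general idea of a per-noun match count.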

He talked about it here: https://github.com/Exaphis/HackQ-Trivia/issues/94

Exaphis commented 6 years ago

Seems like this question is answered.

ICUI2I commented 6 years ago

Actually @CrazyReturns that does not answer my question. I'm asking for more detail than that because I still don't understand what that means. @Exaphis could you please answer my question as it relates to the output I posted? I would really appreciate it!

Exaphis commented 6 years ago

Method 3 basically searches Google for each answer and looks for the question's terms in the results. Taking your example, it first detects the nouns in the question (using NLTK's part-of-speech tagger, plus anything inside quotation marks). Then it searches Google for each of the 3 answers. The noun scores are how often the question nouns were found in the search results, and the keyword scores are how often the question keywords were found in the search results. Here the question keywords are ['artists', 'recorded', 'song', 'based', 'wuthering heights'] and the question nouns are ['wuthering heights']. If the correct answer cannot be determined from the noun scores, it falls back to the keyword scores.
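To make the fallback concrete, here is a minimal sketch of how the final pick could work using the scores from the log above. Two things in it are assumptions rather than anything confirmed in this thread: that "cannot be found" means the noun scores are tied for the best value, and that a "NOT" question picks the lowest score instead of the highest. `pick_answer` is a hypothetical name, not a function from the bot.

```python
def pick_answer(noun_scores, keyword_scores, negated=False):
    """Pick an answer from noun scores, falling back to keyword scores on a tie.

    Assumption: for a negated ("NOT") question the lowest score wins.
    """
    choose = min if negated else max
    for scores in (noun_scores, keyword_scores):
        best = choose(scores.values())
        winners = [answer for answer, score in scores.items() if score == best]
        if len(winners) == 1:
            return winners[0]
    return None  # neither score set singles out one answer

# Scores copied from the log posted in this issue.
noun_scores = {'Annie Lennox': 0, 'Kate Bush': 14, 'Celine Dion': 0}
keyword_scores = {'Annie Lennox': 50, 'Kate Bush': 102, 'Celine Dion': 75}

print(pick_answer(noun_scores, keyword_scores, negated=True))  # Annie Lennox
```

Under those assumptions, the noun scores tie at 0 between Annie Lennox and Celine Dion, so the keyword scores decide, and Annie Lennox has the lowest one (50), which matches the answer printed in the log.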