Exaphis / HackQ-Trivia

Yet another HQ Trivia bot. Automatically scrapes HQ Trivia questions without OCR and answers them.
MIT License

Random answer chosen regardless of search results when sent somewhere else #94

Closed SharpBit closed 6 years ago

SharpBit commented 6 years ago

What's the difference between the 3 search methods? Why does the bot run search 1, then, if no best answer is found, run search 2 and send that result anyway, then run search 3 and send that too? Basically, how does each method work? If the best answer and search method 3 disagree, which do we choose? Last night this happened:

Whose Twitter handle comes first in alphabetical order?
Ellen DeGeneres (@TheEllenShow)
Neil Patrick Harris (@ActuallyNPH)
LeBron James (@KingJames)

The best answer said LeBron James, but search method 3 said Neil Patrick Harris, and that was the answer (savage question). Yesterday 3 people won $8333 each, so the bot kinda failed. I hope to test 12-question games today.

Exaphis commented 6 years ago

The first and second search methods both search Google for the question keywords and count how often each answer occurs in the search results. Search method 1 counts exact answer matches, and method 2 counts occurrences of the individual answer keywords. For this question, the bot googled "whose twitter handle comes first alphabetical order" and method 2 detected the word "James" 3 times in the search results, causing LeBron James to be selected as the correct answer.
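
To make the difference between the two counting styles concrete, here is a minimal sketch. It is not the bot's actual code: the function names `count_exact` and `count_keywords` and the sample text are made up for illustration, and the real bot works on scraped Google results rather than a hard-coded string.

```python
import re


def count_exact(answer: str, text: str) -> int:
    """Method 1 style: count occurrences of the whole answer phrase."""
    pattern = r"\b" + re.escape(answer.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def count_keywords(answer: str, text: str) -> int:
    """Method 2 style: count occurrences of each individual answer word."""
    words = re.findall(r"\w+", text.lower())
    return sum(words.count(kw) for kw in re.findall(r"\w+", answer.lower()))


# Made-up search-result text, for illustration only.
results = "James Corden and James Franco tweeted about King James."
answers = ["Ellen DeGeneres", "Neil Patrick Harris", "LeBron James"]

exact_counts = {a: count_exact(a, results) for a in answers}
keyword_counts = {a: count_keywords(a, results) for a in answers}
```

With this sample text no answer matches exactly, while the keyword count favors "LeBron James" purely because the common name "James" appears, which is the failure mode described above.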

The third search method is completely different, as it looks up the answer and tries to match the nouns of the question. In this case, method 3 searched "Neil Patrick Harris" on Google and got 4 matches of the noun "order", which made method 3 say Neil Patrick Harris was the correct answer.
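
A rough sketch of the method 3 idea, under stated assumptions: the helper `noun_scores` is hypothetical, the question nouns are passed in directly instead of being extracted from the question, and the page text is placeholder data standing in for whatever the bot fetches when it searches each answer.

```python
import re


def noun_scores(question_nouns, pages_by_answer):
    """For each answer, count question nouns in the text found for that answer."""
    scores = {}
    for answer, page_text in pages_by_answer.items():
        words = re.findall(r"\w+", page_text.lower())
        scores[answer] = sum(words.count(noun.lower()) for noun in question_nouns)
    return scores


# Placeholder text standing in for the pages fetched per answer.
pages = {
    "Ellen DeGeneres": "Comedian and talk show host.",
    "Neil Patrick Harris": "Order tickets now. Listed in order of release.",
    "LeBron James": "Basketball player.",
}
scores = noun_scores(["order"], pages)
```

Here "Neil Patrick Harris" wins only because the generic noun "order" happens to appear on the placeholder page, mirroring why this result is not very trustworthy for the Twitter handle question.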

Looking at what caused the search methods to say what was correct, I wouldn't have trusted either to answer that question.

SharpBit commented 6 years ago

Thanks for the speedy reply. That makes much more sense now. I noticed that when the keyword search and noun search disagree, keyword searches have been correct 3/5 times and noun searches 2/5 times.

I just tested it on two HQ games today (normal and UK). In the normal game at 3 pm EST, it showed a blank answer for 3 questions in a row, then fixed itself when I restarted the bot. Later that game, the websocket connected too late and missed a question. It got 7/8 accuracy that game. In the UK game at 8 pm UTC, it got 9/10 correct. One it got wrong (a comparison question), and on another the websocket connected late again, despite reconnecting once per second (I changed the interval from 5 seconds to 1).

Is it possible for you to make it stay connected to the websocket for the duration of the game rather than refresh every few seconds and risk missing a question? Do you have any plans to improve accuracy on comparison questions?

I will continue to test it on more games and try to figure out whether keyword or noun search is more reliable when the results differ. If you know which is USUALLY correct more often (keyword or noun), please let me know. If you want, I can create another issue about connecting to the websocket for the whole game and about comparison questions.

SharpBit commented 6 years ago

I came across another interesting thing tonight. In tonight's HQ game this was a question:

Which of these video game companies first made playing cards?
1) Nintendo
2) Atari
3) Sega
Keyword searches:
1) 38
2) 162
3) 421
Noun searches:
1) 0
2) 0
3) 1

Why did the bot choose the answer with the fewest matches if the question didn't contain "NOT", "least", or "never"? What made it choose Nintendo instead of Sega (though Nintendo WAS the correct answer)?

Also this question:

From where did the Titanic originally sail?
                        KW            Noun
1) United States       17              0
2) England             18              1
3) Northern Ireland    37             23

didn't have any "reverse" words, but it chose the one with the fewest matches and got it wrong. The answer was Northern Ireland. This really confuses me.
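
For reference, the behavior expected here can be sketched as follows. This is an assumption about how "reverse" words are meant to work, based only on the words listed above; `REVERSE_WORDS` and `pick_answer` are hypothetical names, not identifiers from the bot.

```python
REVERSE_WORDS = {"not", "least", "never"}


def pick_answer(question: str, scores: dict) -> str:
    """Return the lowest-scoring answer for negated questions, else the highest."""
    words = {w.strip("?.,!\"'").lower() for w in question.split()}
    chooser = min if words & REVERSE_WORDS else max
    return chooser(scores, key=scores.get)


# Keyword counts reported for the Titanic question.
kw = {"United States": 17, "England": 18, "Northern Ireland": 37}
picked = pick_answer("From where did the Titanic originally sail?", kw)
```

Under this logic the Titanic question, which has no reverse word, should yield the highest-scoring choice, so picking the lowest one points to something other than the reverse-word rule.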

tdaddy commented 6 years ago

@SharpBit yeah i've seen this too. it seems like sometimes the method will just say the answer is the first choice regardless of the search results.

also, i saw you say it got 9/10 on HQ UK. it does UK?

SharpBit commented 6 years ago

@tdaddy just put a UK bearer token and user ID and it will do UK when it goes live

Edit: it just chose the first choice today on q12 when keyword search says 2 (the answer was 2)

Exaphis commented 6 years ago

@SharpBit, extremely odd result you got there. Mine picked the answer with the most occurrences every time. My output for the first question is

Running method 1
{'nintendo': 597, 'atari': 7, 'sega': 13}
nintendo

[('Which', 'NNP'), ('of', 'IN'), ('these', 'DT'), ('video', 'NNS'), ('game', 'NN'), ('companies', 'NNS'), ('first', 'RB'), ('made', 'VBD'), ('playing', 'VBG'), ('cards', 'NNS'), ('?', '.')]
Question nouns: ['cards']
Running method 3
Search processed
URLs fetched

Nintendo: {'cards': 9}
Atari: {'cards': 0}
Sega: {'cards': 1}

Keyword scores: {'Nintendo': 351, 'Atari': 179, 'Sega': 188}
Noun scores: {'Nintendo': 9, 'Atari': 0, 'Sega': 1}
Nintendo

and the output for the second question is

Running method 1
{'united states': 15, 'england': 10, 'northern ireland': 10}
united states

[('From', 'IN'), ('where', 'WRB'), ('did', 'VBD'), ('the', 'DT'), ('Titanic', 'NNP'), ('originally', 'RB'), ('sail', 'VB'), ('?', '.')]
Question nouns: ['titanic']
Running method 3
Search processed
Server timeout/error to https://discovernorthernireland.com/
United States: {'titanic': 0}
England: {'titanic': 0}
Northern Ireland: {'titanic': 1}

Keyword scores: {'United States': 1, 'England': 7, 'Northern Ireland': 2}
Noun scores: {'United States': 0, 'England': 0, 'Northern Ireland': 1}
Northern Ireland

both selecting the highest scoring answer.

Regarding your websocket issue, that is known and is being fixed.

SharpBit commented 6 years ago

@Exaphis Why did mine show different search results than yours? Also just tell me in this thread when you fix the websocket issue

Exaphis commented 6 years ago

Probably Google changing results due to location/search history.

SharpBit commented 6 years ago

How would location affect which company created the first playing cards? Titanic one might make sense with that explanation since I'm in the US.

SharpBit commented 6 years ago

wait @Exaphis how would location/search history affect anything if it's googling through a bot/program?

Exaphis commented 6 years ago

The bot is just scraping the Google site instead of using the Google API, since the API has a limit of 100 requests/day. A basic way of explaining it would be the bot is searching the question through a different browser. I don't know what specifically Google does, but it probably takes into account your IP address and gives you different results based on where it determines your location to be. From a private window in my browser, I can search one thing, turn my VPN to the UK, create a new private window, and search the same thing again to get two different sets of search results.

SharpBit commented 6 years ago

@Exaphis If I open a vpn will that possibly help? idk how to make the program search through something like incognito or disabled location-based searching so... Also when should i expect the websocket issue to be fixed?

Exaphis commented 6 years ago

Using a VPN shouldn't change the accuracy of your answers, since you can't predict what Google will serve you. The websocket issue should be fixed in the coming week.

SharpBit commented 6 years ago

@Exaphis after another test, it still goes with random answers in another app I'm posting to, but in console it prints the actual correct answer. I didn't change best_answer at all after it gets printed except I made it "None" if the best answer is "" (empty string). What makes it change? Idk how it changed
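
The thread never identifies the cause of this mismatch. One way to rule out the printed and forwarded values drifting apart is to normalize once and reuse that single value for both. The sketch below assumes a hypothetical `send` callable standing in for the other app; `normalize_answer` and `report` are illustrative names, not part of the bot.

```python
from typing import Callable, Optional


def normalize_answer(best_answer: str) -> Optional[str]:
    """Map an empty result to None; leave a real answer untouched."""
    return best_answer if best_answer else None


def report(best_answer: str, send: Callable[[Optional[str]], None]) -> Optional[str]:
    """Print and forward the same normalized value so the two cannot diverge."""
    value = normalize_answer(best_answer)
    print(value)
    send(value)
    return value


sent = []  # stands in for the other app
report("Nintendo", sent.append)
report("", sent.append)
```

If the console and the other app still disagree with a setup like this, the difference is being introduced on the receiving side rather than by the bot's answer selection.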

Exaphis commented 6 years ago

What do you mean it chooses the wrong answer when it's sent somewhere else? Doesn't seem like a problem with my code.

SharpBit commented 6 years ago

@Exaphis is the websocket issue being fixed? It's been 6 days

Exaphis commented 6 years ago

Should be fixed in https://github.com/Exaphis/HackQ-Trivia/commit/73a5e82d06dd8dd9534173b49490ab20173d9e3e, not tested yet.

SharpBit commented 6 years ago

was just about to close this, thanks