SharpBit closed this issue 6 years ago
The first and second search methods both search Google for the question keywords and count how often each answer appears in the search results. Search method 1 counts exact answer matches, and method 2 counts occurrences of answer keywords. For this question, the bot googled "whose twitter handle comes first alphabetical order" and method 2 detected the word "James" 3 times in the search results, causing it to be selected as the correct answer.
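To make the difference between methods 1 and 2 concrete, here is a minimal sketch of the two counting strategies described above. It is not the bot's actual code: the function names, the tokenizing regex, and the sample text are all made up for illustration, and it works on a plain string instead of real Google results.

```python
import re
from collections import Counter

WORD_RE = r"[a-z0-9']+"  # assumed tokenizer, not taken from the bot


def count_exact_matches(results_text, answers):
    """Method 1 style: count whole-phrase occurrences of each answer."""
    text = results_text.lower()
    return {
        answer: len(re.findall(r"\b" + re.escape(answer.lower()) + r"\b", text))
        for answer in answers
    }


def count_keyword_matches(results_text, answers):
    """Method 2 style: count occurrences of each individual answer word."""
    word_counts = Counter(re.findall(WORD_RE, results_text.lower()))
    return {
        answer: sum(word_counts[w] for w in re.findall(WORD_RE, answer.lower()))
        for answer in answers
    }


# Made-up stand-in for scraped search result text.
sample = "James Corden tweeted. LeBron James replied, and James Harden liked it."
answers = ["LeBron James", "Neil Patrick Harris"]

exact = count_exact_matches(sample, answers)
keywords = count_keyword_matches(sample, answers)
print(exact)     # {'LeBron James': 1, 'Neil Patrick Harris': 0}
print(keywords)  # {'LeBron James': 4, 'Neil Patrick Harris': 0}
```

Note how a common first or last name like "James" inflates the keyword count even when the results are about other people, which is the failure described for this question.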
The third search method is completely different, as it looks up the answer and tries to match the nouns of the question. In this case, method 3 searched "Neil Patrick Harris" on Google and got 4 matches of the noun "order", which made method 3 say Neil Patrick Harris was the correct answer.
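As a rough illustration of method 3, the sketch below scores each answer by how often the question's nouns appear in text fetched for that answer. It is only a sketch under assumptions: `noun_scores` is a hypothetical name, the page text is a made-up sample rather than fetched pages, and the nouns are passed in directly instead of being extracted with a part-of-speech tagger.

```python
import re


def noun_scores(pages_by_answer, question_nouns):
    """Count occurrences of each question noun in the text found for each answer."""
    scores = {}
    for answer, page_text in pages_by_answer.items():
        words = re.findall(r"[a-z0-9']+", page_text.lower())
        scores[answer] = {noun: words.count(noun.lower()) for noun in question_nouns}
    return scores


# Made-up stand-ins for the pages returned when searching each answer.
pages = {
    "Nintendo": "Nintendo began by making hanafuda playing cards. The cards sold well.",
    "Atari": "Atari released Pong in 1972.",
}

scores = noun_scores(pages, ["cards"])
totals = {answer: sum(counts.values()) for answer, counts in scores.items()}
print(scores)  # {'Nintendo': {'cards': 2}, 'Atari': {'cards': 0}}
```

A generic noun such as "order" can match on almost any page, so a handful of hits for it says little about whether the answer is right.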
Given why each search method picked its answer, I wouldn't have trusted either of them on that question.
Thanks for the speedy reply. That makes much more sense now. I noticed that when the keyword search and noun search disagree, the keyword search has been correct 3/5 times and the noun search 2/5 times.
I just tested it on two HQ games today (normal and UK). In the normal game at 3 pm EST, it showed a blank answer for 3 questions in a row, then fixed itself when I restarted the bot. Later that game, the websocket connected too late and missed a question. It got 7/8 correct that game. In the UK game at 8 pm UTC, it got 9/10 correct. One of them it got wrong (a comparison question), and on the other one the websocket connected late again, despite reconnecting once per second (I changed the interval from 5 seconds to 1).
Is it possible for you to keep the websocket connected for the duration of the game rather than reconnecting every few seconds and risking a missed question? Do you have any plans to improve accuracy on comparison questions? I will continue to test it on more games and try to figure out whether keyword or noun search is more reliable when the results differ. If you know which is USUALLY correct more often (keyword or noun), please let me know. If you want, I can create another issue about connecting to the websocket for the whole game and about comparison questions.
I came across another interesting thing tonight. This was a question in tonight's HQ game:
Which of these video game companies first made playing cards?
1) Nintendo
2) Atari
3) Sega
Keyword searches:
1) 38
2) 162
3) 421
Noun searches:
1) 0
2) 0
3) 1
Why did the bot choose the answer with the fewest matches if the question didn't contain "NOT", "least", or "never"? What made it choose Nintendo instead of Sega (though Nintendo WAS the correct answer)?
Also this question:
From where did the Titanic originally sail?
                     KW   Noun
1) United States     17   0
2) England           18   1
3) Northern Ireland  37   23
didn't have any "reverse" words, but it chose the one with the fewest matches and got it wrong. The answer was Northern Ireland. This really confuses me.
@SharpBit yeah i've seen this too. it seems like sometimes the method will just say the answer is the first choice regardless of the search results.
also, i saw you say it got 9/10 on HQ UK. it does UK?
@tdaddy just put a UK bearer token and user ID and it will do UK when it goes live
Edit: it just chose the first choice today on q12 when keyword search says 2 (the answer was 2)
@SharpBit, extremely odd result you got there. Mine picked the answer with the most occurrences every time. My output for the first question is
Running method 1
{'nintendo': 597, 'atari': 7, 'sega': 13}
nintendo
[('Which', 'NNP'), ('of', 'IN'), ('these', 'DT'), ('video', 'NNS'), ('game', 'NN'), ('companies', 'NNS'), ('first', 'RB'), ('made', 'VBD'), ('playing', 'VBG'), ('cards', 'NNS'), ('?', '.')]
Question nouns: ['cards']
Running method 3
Search processed
URLs fetched
Nintendo: {'cards': 9}
Atari: {'cards': 0}
Sega: {'cards': 1}
Keyword scores: {'Nintendo': 351, 'Atari': 179, 'Sega': 188}
Noun scores: {'Nintendo': 9, 'Atari': 0, 'Sega': 1}
Nintendo
and the output for the second question is
Running method 1
{'united states': 15, 'england': 10, 'northern ireland': 10}
united states
[('From', 'IN'), ('where', 'WRB'), ('did', 'VBD'), ('the', 'DT'), ('Titanic', 'NNP'), ('originally', 'RB'), ('sail', 'VB'), ('?', '.')]
Question nouns: ['titanic']
Running method 3
Search processed
Server timeout/error to https://discovernorthernireland.com/
United States: {'titanic': 0}
England: {'titanic': 0}
Northern Ireland: {'titanic': 1}
Keyword scores: {'United States': 1, 'England': 7, 'Northern Ireland': 2}
Noun scores: {'United States': 0, 'England': 0, 'Northern Ireland': 1}
Northern Ireland
both selecting the highest scoring answer.
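For reference, the selection rule being discussed (take the highest score, or the lowest when the question contains a "reverse" word) can be sketched as follows. This is an assumption-laden illustration, not the bot's code: `pick_answer` and `REVERSE_WORDS` are hypothetical names, the word list is just the three words mentioned earlier in this thread, and matching them case-insensitively is my own choice.

```python
import re

# Reverse words mentioned in this thread; the bot's real list may differ.
REVERSE_WORDS = {"not", "least", "never"}


def pick_answer(question, scores):
    """Return the top-scoring answer, or the lowest-scoring one for reversed questions."""
    if not scores:
        return None
    question_words = set(re.findall(r"[a-z]+", question.lower()))
    chooser = min if question_words & REVERSE_WORDS else max
    return chooser(scores, key=scores.get)


keyword_scores = {"Nintendo": 351, "Atari": 179, "Sega": 188}

normal = pick_answer(
    "Which of these video game companies first made playing cards?", keyword_scores
)
reverse = pick_answer(
    "Which of these companies did NOT make playing cards?", keyword_scores
)
print(normal)   # Nintendo
print(reverse)  # Atari
```

With a rule like this, a question with no reverse word should never end up on the lowest-scoring answer, so a pick that ignores the scores points to something outside the scoring step.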
Regarding your websocket issue, that is known and is being fixed.
@Exaphis Why did mine show different search results than yours? Also just tell me in this thread when you fix the websocket issue
Probably Google changing results due to location/search history.
How would location affect which company created the first playing cards? Titanic one might make sense with that explanation since I'm in the US.
wait @Exaphis how would location/search history affect anything if it's googling through a bot/program?
The bot is just scraping the Google site instead of using the Google API, since the API has a limit of 100 requests/day. A basic way of explaining it would be the bot is searching the question through a different browser. I don't know what specifically Google does, but it probably takes into account your IP address and gives you different results based on where it determines your location to be. From a private window in my browser, I can search one thing, turn my VPN to the UK, create a new private window, and search the same thing again to get two different sets of search results.
@Exaphis If I open a vpn will that possibly help? idk how to make the program search through something like incognito or disabled location-based searching so... Also when should i expect the websocket issue to be fixed?
Using a VPN shouldn't change the accuracy of your answers, since you can't predict what Google will serve you. The websocket issue should be fixed in the coming week.
@Exaphis after another test, it still goes with random answers in another app I'm posting to, but in the console it prints the actual correct answer. I didn't change best_answer at all after it gets printed, except that I set it to "None" if the best answer is "" (empty string). What makes it change? Idk how it changed.
What do you mean it chooses the wrong answer when it's sent somewhere else? Doesn't seem like a problem with my code.
@Exaphis is the websocket issue being fixed? It's been 6 days
Should be fixed in https://github.com/Exaphis/HackQ-Trivia/commit/73a5e82d06dd8dd9534173b49490ab20173d9e3e, not tested yet.
was just about to close this, thanks
What's the difference between all 3 search methods? How come it runs search 1, then if no best answer is found it runs search 2 and sends the result anyway, then runs search 3 and sends that too? Basically I'm asking how each method works. If the best answer and search method 3 give different answers, which do we choose? Last night this happened:
The best answer said LeBron James but search method 3 said Neil Patrick Harris, and that was the answer (savage question). Yesterday 3 people won $8333 each, so the bot kinda failed. I hope to test 12 question games today.