backslash-enn / TriviaCheese

Trivia App Question Answer-er with Estimated Answer Confidence

About confidence #1

Open gabolarraguivel opened 5 years ago

gabolarraguivel commented 5 years ago

Hello! I am currently trying to develop an HQ trivia bot for educational purposes and I came across this repo; I think you did excellent work. Would it be possible for you to explain how the confidence value is calculated? Thank you!

backslash-enn commented 4 years ago

Hey there! I know I'm responding an eternity later - I saw your message as soon as you sent it, and it's been on the back burner for way too long. To answer your question, the answer confidence is far from concrete or rigorous. My process simply consisted of scraping 600 HQ questions with corresponding answers from the web (there is a Twitter page with a nice archive), running my bot against them, and checking the results. I first compared different search methods to see which got me the highest percentage of right answers, tweaked the search method, and repeated until I got something decent. From there, I went through each question I got wrong by hand and categorized them by the reason why.
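
For anyone following along, here is a minimal sketch of what that evaluation loop could look like. This is not the actual TriviaCheese code; `bot_answer` and the dataset shape are illustrative assumptions.

```python
def evaluate(bot_answer, dataset):
    """Run the bot over every scraped (question, choices, answer) triple,
    returning overall accuracy plus the misses for manual categorization.

    bot_answer: hypothetical function (question, choices) -> chosen answer.
    dataset: list of (question, choices, answer) tuples scraped from an
    HQ question archive.
    """
    wrong = []
    correct = 0
    for question, choices, answer in dataset:
        guess = bot_answer(question, choices)
        if guess == answer:
            correct += 1
        else:
            # Keep the misses so they can be inspected and bucketed by
            # failure reason after the run.
            wrong.append((question, guess, answer))
    return correct / len(dataset), wrong
```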

From there I took the reasons I couldn't really do much about and switched gears into making the algorithm recognize those problems. The more of those problems it recognized in a question, and the more destructive I knew each problem to be, the lower the confidence. I then took a ton of potential answer confidence formulas and pitted them against each other empirically. Ideally, questions in higher answer confidence brackets (c < 30, 30 <= c < 50, 50 <= c < 70, etc., for example) should have a higher fraction of the questions in that bracket answered correctly. So I graphed a ton of potential answer confidence formulas and visually examined the graphs, taking the ones that were pretty close to increasing functions (even with a great formula it would never be exact since my sample size wasn't THAT big) and trying to combine them in different ways and tweak thresholds to see if I could get even better results. Eventually I got something I was happy with.
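
In code, the bracket check described above could look roughly like the sketch below. The function name and the bracket edges are assumptions for illustration; the idea is just to bin questions by predicted confidence and verify that accuracy goes up from bracket to bracket.

```python
def bracket_accuracy(results, edges=(30, 50, 70, 90)):
    """results: list of (confidence, was_correct) pairs from a test run.

    Groups questions into confidence brackets (c < 30, 30 <= c < 50, ...)
    and returns the accuracy within each bracket (None for empty brackets).
    A good confidence formula should give roughly increasing accuracy from
    the lowest bracket to the highest.
    """
    brackets = [[] for _ in range(len(edges) + 1)]
    for confidence, was_correct in results:
        # Count how many bracket edges the confidence clears to pick its bin.
        index = sum(confidence >= edge for edge in edges)
        brackets[index].append(was_correct)
    return [
        (sum(bucket) / len(bucket) if bucket else None)
        for bucket in brackets
    ]
```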

As you probably noticed by now, it was a very inexact science. If you were expecting a PhD thesis on natural language processing, this ain't it. If you want a more detailed (albeit scatterbrained) view of my process, I left a lot of misc stuff in the "Assets/Past Research Files.zip" file. Check it out.