MichaelAquilina closed 10 years ago
It's the second order ranking that's screwing this up; it's too strong:
Running word concepts....
Word Concepts Runtime = 0.120518922806
[Wildlife regulations in Florida (9411284): 0.368049, Morelia spilota (1688446): 0.365470, Python reticulatus (739131): 0.306702, Burmese pythons in Florida (1532348): 0.302284, Python regius (866538): 0.297903, Python molurus (739148): 0.282415, Colt Python (789495): 0.276584, Mega Python vs. Gatoroid (1008818): 0.276584, Eric Idle (193640): 0.276583, Morelia tracyae (9415410): 0.276578, The Dirty Fork (3472601): 0.276569, Terry Gilliam (5851): 0.276563, Python Night – 30 Years of Monty Python (1885001): 0.276563, Neil Innes (305668): 0.276527, Another Monty Python Record (619776): 0.276515, Nudge Nudge (619752): 0.276515, Bashe (1366889): 0.276490, Erhu (399619): 0.276455, John Cleese (89492): 0.276406, Morelia spilota spilota (4556598): 0.276388]
Second Order Runtime = 0.354458093643
[Python (programming language) (175067): 0.988947, Monty Python (162356): 0.803613, John Cleese (89492): 0.715986, Michael Palin (193634): 0.713746, Eric Idle (193640): 0.682033, Terry Jones (193695): 0.666316, Graham Chapman (305760): 0.647765, Terry Gilliam (5851): 0.641725, Monty Python and the Holy Grail (229687): 0.588486, Morelia spilota (1688446): 0.563674, Wildlife regulations in Florida (9411284): 0.552073, Monty Python Live at the Hollywood Bowl (451622): 0.539486, Pythonidae (12592): 0.527129, Neil Innes (305668): 0.526849, Carol Cleveland (451629): 0.516239, Do Not Adjust Your Set (336336): 0.511349, Python reticulatus (739131): 0.503444, Burmese pythons in Florida (1532348): 0.477005, The Ministry of Silly Walks (164739): 0.468017, John Du Prez (595184): 0.465477]
This only occurs on the tfidf-lookup-test branch.
A combination of cosine similarity and second order ranking should fix this...
http://en.wikipedia.org/wiki/Python_(genus)
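One way the combination mentioned above could look, as a rough sketch only: scale each score set so its top hit is 1.0, then take a weighted average so neither ranking can dominate by itself. The function name `combine_scores`, the `alpha` weight, and the dict-of-scores input shape are all assumptions made for illustration; none of them come from the project's code, and treating the word concepts scores as the cosine input is also an assumption.

```python
def combine_scores(cosine, second_order, alpha=0.5):
    """Blend two {page_id: score} dicts into one ranked list.

    Hypothetical helper, not the project's API. alpha is the weight
    given to the cosine scores; (1 - alpha) goes to second order.
    """
    def normalise(scores):
        # Scale so the best score in each set is 1.0, which keeps one
        # ranking from swamping the other just because of its range.
        top = max(scores.values(), default=0.0)
        if top <= 0:
            return {page: 0.0 for page in scores}
        return {page: value / top for page, value in scores.items()}

    cos_n = normalise(cosine)
    so_n = normalise(second_order)

    combined = {}
    for page in set(cos_n) | set(so_n):
        # A page missing from one ranking contributes 0 from that side.
        combined[page] = (alpha * cos_n.get(page, 0.0)
                          + (1 - alpha) * so_n.get(page, 0.0))

    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A few page ids and scores copied from the log output above.
    word_concepts = {9411284: 0.368049, 1688446: 0.365470, 89492: 0.276406}
    second_order = {175067: 0.988947, 162356: 0.803613, 89492: 0.715986,
                    1688446: 0.563674, 9411284: 0.552073}
    for page_id, score in combine_scores(word_concepts, second_order, alpha=0.7):
        print(page_id, round(score, 4))
```

Raising `alpha` toward 1.0 weakens the second order contribution, which is the knob to turn if second order ranking is too strong; the right value would have to be found by experiment.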