datatonic / duke

Automatically exported from code.google.com/p/duke

Why not just extend Lucene scoring? #24

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Just curious. You seem to be doing the Bayes calculation after getting results 
from Lucene. Why not implement your own scoring instead? Wouldn't that work? 
For example: https://issues.apache.org/jira/browse/LUCENE-2091

Original issue reported on code.google.com by ashwin.j...@gmail.com on 18 Jun 2011 at 1:00

GoogleCodeExporter commented 8 years ago
Hmmmm. Interesting. The reason I haven't done this is that I wasn't aware that 
it could be done. I'll investigate this a bit to see if it could be useful for 
performance. Thank you for the tip!

(Apologies for the late answer; I've been on holiday.)

Original comment by lar...@gmail.com on 10 Jul 2011 at 11:38

GoogleCodeExporter commented 8 years ago
We'll wait for Lucene 4.0, which adds pluggable scoring models as an official 
feature, and look at this then. 
http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4/

Original comment by lar...@gmail.com on 23 Mar 2012 at 8:33

GoogleCodeExporter commented 8 years ago
I thought some more about this, and came to the conclusion that it's not going 
to work.

When we've got a record we want to deduplicate, we use Lucene to quickly locate 
a set of potential candidates, on which we can then run the real Duke 
comparison. The reason we do it this way is that the real Duke comparison is 
very expensive, so we can only afford to do it for a very small subset of 
records.

So while doing our own scoring within Lucene would work, it would be 
prohibitively slow.

Lucene scoring seems to work really well for the "quickly find a set of 
candidates" step, so really we don't have much reason to change.
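
To make the two-stage idea above concrete, here is a minimal sketch in Python. 
It is not Duke's or Lucene's code: `CandidateIndex` is a toy inverted index 
standing in for Lucene, `expensive_compare` is a hypothetical stand-in for the 
real Duke comparison, and the `high`, `low`, `threshold`, and `limit` values are 
made-up assumptions chosen only for illustration.

```python
from collections import defaultdict


def tokens(record):
    """Lowercased word tokens from all field values of a record."""
    return {w for value in record.values() for w in value.lower().split()}


class CandidateIndex:
    """Toy inverted index standing in for Lucene (hypothetical, not Duke's API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of records containing it
        self.records = {}

    def add(self, rec_id, record):
        self.records[rec_id] = record
        for tok in tokens(record):
            self.postings[tok].add(rec_id)

    def candidates(self, record, limit=10):
        """Cheap stage: rank indexed records by number of shared tokens."""
        overlap = defaultdict(int)
        for tok in tokens(record):
            for rec_id in self.postings.get(tok, ()):
                overlap[rec_id] += 1
        ranked = sorted(overlap, key=lambda r: (-overlap[r], r))
        return ranked[:limit]


def combine_bayes(p1, p2):
    """Combine two match probabilities with the naive Bayes update."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def expensive_compare(a, b, high=0.9, low=0.3):
    """Costly stage (assumed): per-field evidence folded in, starting from 0.5."""
    prob = 0.5
    for field in a.keys() & b.keys():
        same = a[field].strip().lower() == b[field].strip().lower()
        prob = combine_bayes(prob, high if same else low)
    return prob


def find_duplicates(index, record, threshold=0.8, limit=10):
    """Run the costly comparison only on the few candidates the index returns."""
    matches = []
    for rec_id in index.candidates(record, limit):
        prob = expensive_compare(record, index.records[rec_id])
        if prob >= threshold:
            matches.append((rec_id, prob))
    return matches
```

The point of the split is that `expensive_compare` runs at most `limit` times 
per record, no matter how many records are in the index. Moving it into the 
index's own scoring would mean running it on every record the cheap query 
touches.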

If you think I've misunderstood, please let me know. I'll leave the issue open 
for a week or so to give you time to respond.

Original comment by lar...@gmail.com on 20 Mar 2013 at 12:53

GoogleCodeExporter commented 8 years ago

Original comment by lar...@gmail.com on 25 Jul 2013 at 9:55