Closed by GoogleCodeExporter 8 years ago
Hmmmm. Interesting. The reason I haven't done this is that I wasn't aware that
it could be done. I'll investigate this a bit to see if it could be useful for
performance. Thank you for the tip!
(Apologies for the late answer; I've been on holiday.)
Original comment by lar...@gmail.com
on 10 Jul 2011 at 11:38
We'll wait for Lucene 4.0, which adds pluggable scoring models as an official
feature, and look at this again then.
http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4/
Original comment by lar...@gmail.com
on 23 Mar 2012 at 8:33
I thought some more about this, and came to the conclusion that it's not going
to work.
When we have a record we want to deduplicate, we use Lucene to quickly locate
a set of potential candidates, on which we can then run the real Duke comparison.
The reason we do it this way is that the real Duke comparison is very expensive,
so we can only afford to do it for a very small subset of records.
So while doing our own scoring within Lucene would work, it would be
prohibitively slow.
Lucene scoring seems to work really well for the "quickly find a set of
candidates" step, so we really don't have much reason to change.
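The two-stage approach described above can be sketched roughly as follows. This
is only an illustration, not Duke's or Lucene's actual code: a toy in-memory
token index stands in for Lucene, Python's difflib stands in for the expensive
Duke comparison, and every function name, threshold, and sample record is
hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase whitespace tokens, used as cheap index keys."""
    return set(text.lower().split())


def build_index(records):
    """Toy inverted index (stand-in for Lucene): token -> record ids."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for tok in tokens(text):
            index[tok].add(rec_id)
    return index


def find_candidates(index, text, limit=2):
    """Cheap stage: rank records by shared-token count, keep the top few."""
    hits = defaultdict(int)
    for tok in tokens(text):
        for rec_id in index.get(tok, ()):
            hits[rec_id] += 1
    ranked = sorted(hits, key=lambda r: (-hits[r], r))
    return ranked[:limit]


def expensive_compare(a, b):
    """Stand-in for the costly comparison (assumption: a similarity ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(records, index, text, threshold=0.75, limit=2):
    """Run the expensive comparison only on the cheap stage's candidates."""
    matches = []
    for rec_id in find_candidates(index, text, limit):
        if expensive_compare(text, records[rec_id]) >= threshold:
            matches.append(rec_id)
    return matches


# Hypothetical sample data.
records = {
    1: "Acme Corporation Oslo",
    2: "Acme Corp Oslo",
    3: "Globex Industries Bergen",
    4: "Initech Software Oslo",
}
index = build_index(records)
candidates = find_candidates(index, "ACME Corporation Oslo")
duplicates = find_duplicates(records, index, "ACME Corporation Oslo")
```

The point of the split is that expensive_compare runs at most `limit` times per
incoming record, no matter how many records are indexed. Moving the expensive
comparison into the scoring step would mean running it for every record the
index visits.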
If you think I've misunderstood, please let me know. I'll leave the issue open
for a week or so to give you time to respond.
Original comment by lar...@gmail.com
on 20 Mar 2013 at 12:53
Original comment by lar...@gmail.com
on 25 Jul 2013 at 9:55
Original issue reported on code.google.com by
ashwin.j...@gmail.com
on 18 Jun 2011 at 1:00