datatonic / duke

Automatically exported from code.google.com/p/duke

Support for deduplicating data directly from Lucene/Solr/ElasticSearch #131

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
A simple data source might be enough, but this needs a little more thought.

Original issue reported on code.google.com by lar...@gmail.com on 23 Aug 2013 at 10:02

GoogleCodeExporter commented 8 years ago
It would be a great feature: the deduplication functionality could be integrated
with Apache Solr and work as yet another REST API.

Original comment by gjpv...@gmail.com on 25 Sep 2013 at 12:37

GoogleCodeExporter commented 8 years ago
For ElasticSearch there is actually a module for this: 
https://github.com/YannBrrd/elasticsearch-entity-resolution

It might be an idea to make something similar for Solr. Or one could approach 
it some other way.

Original comment by lar...@gmail.com on 25 Sep 2013 at 6:22

GoogleCodeExporter commented 8 years ago
Any update on Solr data deduplication?

Original comment by chandras...@vinculumgroup.com on 1 May 2015 at 10:55

GoogleCodeExporter commented 8 years ago
I'm afraid nobody has done this for Solr yet, sorry. All contributions are welcome.
Note that the project has moved to http://github.com/larsga/Duke.

Original comment by lar...@gmail.com on 1 May 2015 at 10:57