dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License
404 stars, 216 forks

[QUESTION] iterating with ElasticSearch? #123

Closed: cragia closed this 2 years ago

cragia commented 3 years ago

Hi all! I've stumbled upon dedupe.io and I find the library amazing; it's exactly what I was looking for. One of the examples shows how to use MySQL to save memory instead of processing millions of records directly in memory, and that's what I'm aiming for. I can see that all the processing there is driven by cursors from the MySQL package.

My problem is that I don't have a SQL database in my stack: all my profiles are currently stored in Elasticsearch, and I don't want to keep a MySQL instance in sync with it. So I was wondering whether everything can be done with Elasticsearch alone, using its scroll functionality instead of cursors. Is that viable? Would it be as efficient?

Thank you,
Giacomo
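The idea in the question, streaming records out of Elasticsearch with the scroll API instead of a MySQL cursor, might look roughly like the sketch below. It assumes the official `elasticsearch-py` client, whose `elasticsearch.helpers.scan` wraps the scroll API. The helper names `hits_to_records` and `scroll_profiles` and the field list are hypothetical. The output is a lazy generator of `(record_id, record)` pairs, the same shape the MySQL example builds from its cursor, so only the current page of hits is held in memory.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence, Tuple

Record = Dict[str, Any]


def hits_to_records(
    hits: Iterable[Dict[str, Any]], fields: Sequence[str]
) -> Iterator[Tuple[str, Record]]:
    """Lazily turn Elasticsearch hits into (record_id, record) pairs.

    Hypothetical helper: it consumes one hit at a time, so memory use
    is bounded by whatever the upstream iterator buffers (one scroll page).
    """
    for hit in hits:
        source = hit.get("_source", {})
        # Absent fields become None instead of raising KeyError.
        yield hit["_id"], {field: source.get(field) for field in fields}


def scroll_profiles(
    client, index: str, fields: Sequence[str], page_size: int = 1000
) -> Iterator[Tuple[str, Record]]:
    """Stream every document in `index` through the scroll API.

    Hypothetical helper, assuming the official elasticsearch-py client.
    helpers.scan issues the initial search, follows the scroll cursor
    page by page, and clears the scroll context when it finishes.
    """
    # Imported lazily so the pure conversion above works without the client.
    from elasticsearch.helpers import scan

    hits = scan(
        client,
        index=index,
        query={"query": {"match_all": {}}, "_source": list(fields)},
        size=page_size,  # documents fetched per scroll page
        scroll="5m",     # how long ES keeps the scroll context between pages
    )
    return hits_to_records(hits, fields)
```

This only replaces the streaming read. Whether the whole pipeline is as efficient as the MySQL version depends on the later steps, where the MySQL example also leans on the database. A scroll by itself does not cover those.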

fgregg commented 2 years ago

Good question, but not really relevant to this repo. If you figure it out, it would be nice to see a PR with an example.