Hi all!
I've stumbled upon dedupe.io, and I find the library pretty amazing. It's exactly what I was looking for!
I can see that one of the examples shows how to use MySQL so that millions of records don't have to be processed entirely in memory, and that's what I'm aiming for.
All of the processing there is done with cursors from the MySQL package, which is where my problem starts: I don't have a SQL database in my stack, and all my profiles are currently stored in Elasticsearch.
I don't want to keep a MySQL instance in sync with ES, so I was wondering whether everything can be done with ES and its scroll functionality instead of cursors. Is that viable? Would it run as efficiently?
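To make the question concrete, this is roughly the pattern I have in mind for the reading side. It is only a minimal sketch: `iter_profiles` and the index name `profiles` are hypothetical names of mine, `es` is assumed to be an elasticsearch-py style client exposing `search`, `scroll` and `clear_scroll`, and the `body=` argument is the older calling style (newer client versions prefer `query=`). It only covers streaming records out of ES page by page; it does not touch the blocking and joining steps that the MySQL example does in SQL.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_profiles(es: Any, index: str, page_size: int = 1000,
                  keep_alive: str = "2m") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (document id, fields) pairs one scroll page at a time."""
    # Open the scroll context; only `page_size` hits are held in memory at once.
    # Assumption: `body=` is accepted by the client version in use.
    page = es.search(index=index, scroll=keep_alive, size=page_size,
                     body={"query": {"match_all": {}}})
    scroll_id = page["_scroll_id"]
    try:
        # An empty page means the scroll is exhausted.
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"], hit["_source"]
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Release the server-side scroll context even if iteration stops early.
        es.clear_scroll(scroll_id=scroll_id)
```

With a real client I would call it as `for record_id, fields in iter_profiles(es, "profiles"): ...`. As far as I can tell, `elasticsearch.helpers.scan` wraps the same scroll loop, so that may be the simpler option.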
Thank you,
Giacomo