datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0

Is there any functionality to deduplicate source or target records based on variable fields (or could one be created)? #1940

Closed ruben-jardim closed 1 year ago

ruben-jardim commented 1 year ago

The closest thing I found was the Grouper functionality, but it only allows me to select one field as the group key. Unfortunately, some cases require multiple fields as group keys, or even simple conditions to be applied (e.g. deduplicate based on ID numbers but keep only the records with the latest Creation Date).
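To make the requested behavior concrete, here is a minimal sketch in plain Python. It is not DataCleaner code and does not use any DataCleaner API. The function name `deduplicate_latest` and the field names `id_number`, `country`, and `creation_date` are assumptions made up for illustration. It groups records by a combination of several key fields and keeps only the record with the latest creation date in each group.

```python
from datetime import date


def deduplicate_latest(records, key_fields, date_field):
    """Keep one record per key combination: the one with the latest date_field.

    Hypothetical helper for illustration only; not part of DataCleaner.
    """
    latest = {}
    for record in records:
        # Build a composite group key from all requested key fields.
        key = tuple(record[field] for field in key_fields)
        current = latest.get(key)
        # Keep the first record seen for a key, or a strictly newer one.
        if current is None or record[date_field] > current[date_field]:
            latest[key] = record
    return list(latest.values())


# Made-up sample data; the field names are assumptions.
records = [
    {"id_number": "A1", "country": "PT", "creation_date": date(2021, 1, 5)},
    {"id_number": "A1", "country": "PT", "creation_date": date(2022, 3, 9)},
    {"id_number": "A1", "country": "ES", "creation_date": date(2020, 7, 1)},
    {"id_number": "B2", "country": "PT", "creation_date": date(2019, 2, 2)},
]

unique = deduplicate_latest(records, ["id_number", "country"], "creation_date")
for record in unique:
    print(record)
```

With this sample data, the two records sharing `("A1", "PT")` collapse into the one created on 2022-03-09, while the other two records are kept as they are. When two records in a group have the same date, this sketch keeps the first one it encounters.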

kaspersorensen commented 1 year ago

This sounds like the merge component which is part of DataCleaner Professional (or at least it was when I worked at Human Inference).