jschultze closed 7 months ago
@jschultze Has source 4 had dedup = true at some point? What the docs fail to explain properly is that if you turn dedup on or off, you need to run renormalize on the source to update the dedup keys. Sources that have dedup disabled shouldn't have dedup keys, so their records should not be found in deduplication. Regardless, the check here makes sense, but I just wanted to get to the bottom of the issue.
(Wiki updated with a note to run renormalize)
@EreMaijala Thanks for the explanation! Yes, I think that source 4 had the dedup flag set to true at first and I have not run the renormalize command, so that is probably the reason.
I will execute the renormalization to clean the database.
Oops, there's a style problem. Can you fix that too?
The whitespace is removed.
We experienced the following behaviour with deduplication:
Sources configured:
When running deduplication (with or without explicitly stating the sources to be deduplicated with --source), records from sources 1 to 3 were not only deduplicated within this group, but also against source 4. We were expecting only the records from sources that are configured for deduplication to be deduplicated.
RecordManager seems to get candidates for deduplication from the whole database. The additional code checks whether the source of a deduplication candidate is configured for deduplication.
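For readers who want to see the idea of the check in isolation, here is a minimal sketch. It is written in Python for illustration only and is not the code from this pull request: the settings layout, the source names, and the functions `is_dedup_enabled` and `filter_candidates` are all hypothetical. It assumes each source has a boolean `dedup` flag and that every candidate record carries the ID of its source.

```python
from typing import Iterable

# Hypothetical per-source settings; only the `dedup` flag matters here.
SOURCE_SETTINGS = {
    "source1": {"dedup": True},
    "source2": {"dedup": True},
    "source3": {"dedup": True},
    "source4": {"dedup": False},
}


def is_dedup_enabled(source_id: str, settings: dict) -> bool:
    """Return True only if the source is known and has dedup switched on."""
    return bool(settings.get(source_id, {}).get("dedup", False))


def filter_candidates(candidates: Iterable[dict], settings: dict) -> list:
    """Drop candidates whose source is not configured for deduplication."""
    return [c for c in candidates if is_dedup_enabled(c["source_id"], settings)]


# Hypothetical candidates, as they might come back from a database-wide lookup.
candidates = [
    {"id": "source2.100", "source_id": "source2"},
    {"id": "source4.7", "source_id": "source4"},
]
kept = filter_candidates(candidates, SOURCE_SETTINGS)
```

With these assumed settings, the candidate from source 4 is discarded even though stale dedup keys might still make it show up in the lookup. Unknown sources are treated as not deduplicated, which is the conservative choice in this sketch.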