The following query shows that our ripe_importer creates entries in the contact_automatic
table that have the same email address and differ only in the id. How should these be dealt with?
SELECT c, count(c), email FROM (
    SELECT count(*) AS c, co.email AS email
    FROM contact_automatic AS co
    JOIN role_automatic AS r
      ON co.id = r.contact_id
    GROUP BY co.email
) AS foo
GROUP BY c, email ORDER BY c DESC;
Here is the distribution (without giving out the email addresses):
The problem arises because we do not save all the information from the ripe.db.role.gz entry in the database. The question is: should we search for and link to an existing entry when importing?
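To make the "search and link" option concrete, here is a minimal sketch of what a lookup before insert could look like. It is not the ripe_importer's actual code: it uses an in-memory SQLite database as a stand-in for the real one, the two tables are reduced to the columns that appear in the query above (the real schema has more), and the helper names `get_or_create_contact`, `link_role` and `demo` are made up for illustration.

```python
import sqlite3


def get_or_create_contact(conn, email):
    """Return the id of an existing contact with this email, or insert one.

    Assumption: matching on the email address alone is good enough.
    """
    row = conn.execute(
        "SELECT id FROM contact_automatic WHERE email = ? ORDER BY id LIMIT 1",
        (email,),
    ).fetchone()
    if row is not None:
        return row[0]  # reuse the existing entry instead of duplicating it
    cur = conn.execute(
        "INSERT INTO contact_automatic (email) VALUES (?)", (email,)
    )
    return cur.lastrowid


def link_role(conn, contact_id):
    """Insert a role row pointing at the given contact."""
    conn.execute(
        "INSERT INTO role_automatic (contact_id) VALUES (?)", (contact_id,)
    )


def demo():
    """Import three role entries, two of which share an email address."""
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema: only the columns used in the query above.
    conn.execute(
        "CREATE TABLE contact_automatic (id INTEGER PRIMARY KEY, email TEXT)"
    )
    conn.execute(
        "CREATE TABLE role_automatic ("
        "id INTEGER PRIMARY KEY, "
        "contact_id INTEGER REFERENCES contact_automatic(id))"
    )
    for email in ["abuse@example.net", "abuse@example.net", "noc@example.org"]:
        link_role(conn, get_or_create_contact(conn, email))
    contacts = conn.execute("SELECT count(*) FROM contact_automatic").fetchone()[0]
    roles = conn.execute("SELECT count(*) FROM role_automatic").fetchone()[0]
    return contacts, roles
```

With this approach the three imported roles end up linked to two contacts instead of three. The open point remains whether the email address alone is a safe key: if two ripe.db.role.gz entries share an address but differ in fields we do not store, linking them would silently merge them.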
The queries were run on a database (2017-01-23) imported for DE as outlined in https://github.com/Intevation/intelmq/blob/473fed97ca323ba91126edd0fc208711613ffac4/intelmq/bots/experts/certbund_contact/README-ripe-import.md