Intevation / intelmq-certbund-contact

IntelMQ expert bots to lookup contact information in a database (part of the intelmq-cb-mailgen solution).
GNU Affero General Public License v3.0

certbund-contact: ripe importer creates several contact entries for the same email address #3

Open bernhardreiter opened 7 years ago

bernhardreiter commented 7 years ago

The following query shows that our ripe_importer creates entries in the contact_automatic table that share the same email address and differ only in their id. How should these be dealt with?

select c, count(c), email from (
    select count(*) as c, co.email as email from contact_automatic as co 
      JOIN role_automatic AS r 
        ON co.id = r.contact_id
      GROUP BY co.email
  ) AS foo 
  GROUP BY c, email ORDER BY c DESC;

Here is the distribution (with the email addresses left out):

 c  | count
----+-------
 36 |     1
 16 |     1
 15 |     1
 14 |     1
 13 |     1
 12 |     1
 10 |     1
  9 |     1
  8 |     1
  7 |     1
  5 |     1
  4 |    11
  3 |     7
  2 |    37
  1 |  1574

The queries were run on a database imported for DE on 2017-01-23, as outlined in https://github.com/Intevation/intelmq/blob/473fed97ca323ba91126edd0fc208711613ffac4/intelmq/bots/experts/certbund_contact/README-ripe-import.md
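
The distribution above has no email column, so it presumably comes from a variant of the query that groups by c alone. The following is a minimal sketch of such a variant, not the query that was actually run. It uses an in-memory SQLite database as a stand-in for the real one, a cut-down schema containing only the columns the query touches, and made-up addresses; all of these are assumptions for illustration.

```python
import sqlite3

# Assumption: cut-down stand-in schema, only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact_automatic (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE role_automatic (id INTEGER PRIMARY KEY, contact_id INTEGER);
""")

# Made-up sample data: one address appears in three contact rows,
# two other addresses appear in one row each.
sample_emails = [
    "a@example.com", "a@example.com", "a@example.com",
    "b@example.com", "c@example.com",
]
for email in sample_emails:
    cur = conn.execute(
        "INSERT INTO contact_automatic (email) VALUES (?)", (email,)
    )
    conn.execute(
        "INSERT INTO role_automatic (contact_id) VALUES (?)", (cur.lastrowid,)
    )

# Inner query: joined rows per email address.
# Outer query: how many addresses share each per-address count.
DISTRIBUTION_SQL = """
    SELECT c, count(*) AS n
      FROM (SELECT count(*) AS c
              FROM contact_automatic AS co
              JOIN role_automatic AS r ON co.id = r.contact_id
             GROUP BY co.email) AS per_email
     GROUP BY c
     ORDER BY c DESC
"""
distribution = conn.execute(DISTRIBUTION_SQL).fetchall()
print(distribution)
```

Because the email column is dropped from the outer SELECT and GROUP BY, each output row reports how many addresses occur c times, which matches the shape of the table shown.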

bernhardreiter commented 7 years ago

The problem arises because we do not save all information from each ripe.db.role.gz entry in the database. The question is: should we search for and link to an existing entry when importing?
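
One way the "search and link" option could look is a get-or-create lookup keyed on the email address. The sketch below is only an illustration under assumptions: `get_or_create_contact` is a hypothetical helper, not part of the importer; the schema is a cut-down stand-in; SQLite replaces the real database; and the role handles and addresses are made up.

```python
import sqlite3


def get_or_create_contact(conn, email):
    """Return the id of an existing contact row for `email`, or insert one.

    Hypothetical helper: it matches on the email address only.
    """
    row = conn.execute(
        "SELECT id FROM contact_automatic WHERE email = ? ORDER BY id LIMIT 1",
        (email,),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO contact_automatic (email) VALUES (?)", (email,)
    )
    return cur.lastrowid


# Assumption: cut-down stand-in schema for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact_automatic (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE role_automatic (id INTEGER PRIMARY KEY, contact_id INTEGER);
""")

# Made-up role entries: two of the three share an email address.
role_entries = [
    ("ROLE1-EXAMPLE", "abuse@example.com"),
    ("ROLE2-EXAMPLE", "abuse@example.com"),
    ("ROLE3-EXAMPLE", "noc@example.net"),
]
for _handle, email in role_entries:
    contact_id = get_or_create_contact(conn, email)
    conn.execute(
        "INSERT INTO role_automatic (contact_id) VALUES (?)", (contact_id,)
    )

contact_count = conn.execute(
    "SELECT count(*) FROM contact_automatic"
).fetchone()[0]
role_count = conn.execute(
    "SELECT count(*) FROM role_automatic"
).fetchone()[0]
print(contact_count, role_count)
```

With this approach every role still gets its own row, but roles with the same address point at a single contact. The trade-off is that any details by which the source entries differ, and which are not stored, can no longer be told apart after linking.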