datenanfragen / data

The data behind the Datenanfragen.de project. This contains a directory of contact information and privacy-related data on companies under the scope of the EU GDPR, a directory of supervisory authorities for privacy concerns, a collection of templates for GDPR requests and a list of suggested companies to send access requests to.
https://www.datarequests.org/company
Creative Commons Zero v1.0 Universal

Record overlap: `rewe-shop`, `rewe-group-com` #2150

Open mal-tee opened 1 year ago

mal-tee commented 1 year ago

Both records have "Rewe Markt GmbH" in their `runs` array. That seems like a mistake we should resolve?

WebworkrNet commented 1 year ago

Thank you for opening this issue (based on my email).

mal-tee commented 1 year ago

Should we turn this into a test? @baltpeter

baltpeter commented 1 year ago

I haven't looked into that particular case yet. Are we sure that that is a mistake?

But, either way, we can't generally forbid two records having identical runs entries. There are already valid records where that is the case, e.g. the Amazon records for different companies:

https://github.com/datenanfragen/data/blob/master/companies/amazon-de.json https://github.com/datenanfragen/data/blob/master/companies/amazon-es.json

mal-tee commented 1 year ago

> I haven't looked into that particular case yet. Are we sure that that is a mistake?

Haven't looked either. :sweat_smile:

> But, either way, we can't generally forbid two records having identical runs entries. There are already valid records where that is the case, e.g. the Amazon records for different companies:
>
> master/companies/amazon-de.json master/companies/amazon-es.json

Yeah, we should only do that test if there is no overlap in the countries. :thinking:

baltpeter commented 1 year ago

> Yeah, we should only do that test if there is no overlap in the countries. :thinking:

If there is overlap in the countries, you mean, right?

But even then, I'm not sure whether there can never be a case where that is valid…
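
For illustration, the country condition under discussion could be written as a small predicate. This is only a sketch: the helper name `countries_overlap` is hypothetical, and it assumes that `relevant-countries` is a list of country codes in which `"all"` matches every country, and that a record without the field behaves like `"all"`.

```python
def countries_overlap(a: dict, b: dict) -> bool:
    """Return True if two company records could be relevant in the same country."""
    # Assumption: a missing field is treated like ["all"].
    countries_a = set(a.get("relevant-countries", ["all"]))
    countries_b = set(b.get("relevant-countries", ["all"]))
    # "all" acts as a wildcard that overlaps with every other record.
    if "all" in countries_a or "all" in countries_b:
        return True
    # Otherwise the records overlap only if they share at least one country code.
    return bool(countries_a & countries_b)
```

Under these assumptions, two records limited to `de` and `es` respectively would not be compared at all, while a record marked `all` would be compared with every other record that shares a `runs` entry.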

mal-tee commented 1 year ago

> If there is overlap in the countries, you mean, right?

Yes, oops.

I wrote a little script to implement this:

```python
from collections import defaultdict
import json
import os

# Map every company name and every `runs` entry to the slugs of the records using it.
hashmap = defaultdict(list)

for file in os.listdir("companies/"):
    with open("companies/" + file, "r") as f:
        company = json.load(f)
        slug = company["slug"]
        hashmap[company["name"]].append(slug)
        if "runs" in company:
            for run in company["runs"]:
                hashmap[run].append(slug)

# Names or `runs` entries that were collected more than once, regardless of country.
simple_overlap = {k: v for k, v in hashmap.items() if len(v) > 1}
print("simple", len(simple_overlap.keys()))
for name, slugs in simple_overlap.items():
    used_rvs = defaultdict(list)  # country code -> slugs relevant in that country
    alls = set()  # contains `name` if one of its records is relevant for all countries
    for slug in slugs:
        with open("companies/" + slug + ".json", "r") as f:
            company = json.load(f)
            if "relevant-countries" in company:
                if company["relevant-countries"] == ["all"]:
                    alls.add(name)
                else:
                    for rv in company["relevant-countries"]:
                        used_rvs[rv].append(slug)
    # Keep countries listed by more than two records, or every country if one record is `all`.
    filtered_overlap = {k: v for k, v in used_rvs.items() if len(v) > 2 or name in alls}
    if filtered_overlap:
        print(name, filtered_overlap, alls)
```
```
simple 38
REWE Markt GmbH {'de': ['rewe-shop']} {'REWE Markt GmbH'}
Ideawise Limited {'de': ['gay-de', 'fetisch-de', 'poppen-de', 'kaufmich-com']} set()
Seven.One Entertainment Group GmbH {'de': ['sat1gold', 'prosieben', 'kabeleinsdoku', 'kabeleins']} set()
cpx online active AG {'de': ['optivel'], 'ch': ['optivel'], 'fr': ['optivel'], 'at': ['optivel']} {'cpx online active AG'}
Ingenico Payment Services GmbH {'de': ['ingenico-de']} {'Ingenico Payment Services GmbH'}
Ingenico Healthcare GmbH {'de': ['ingenico-de']} {'Ingenico Healthcare GmbH'}
```

1. The initial case for this issue. Seems legit, since the websites are different.
2. Websites are different.
3. Same.
4. ...

Yeah, we'd also have to check if the websites are different. And probably every other key as well.
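
A website comparison of that kind might look roughly like the following. Treat it as a sketch under assumptions: the helper names `web_host` and `same_website` are made up for illustration, and it assumes each record stores its website URL under a `web` key.

```python
from urllib.parse import urlparse


def web_host(record: dict) -> str:
    """Return the lower-cased hostname of a record's website, without a leading "www."."""
    # Assumption: the website URL is stored under the "web" key.
    host = urlparse(record.get("web", "")).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_website(a: dict, b: dict) -> bool:
    """Return True only if both records have a website and the hostnames match."""
    host_a, host_b = web_host(a), web_host(b)
    return bool(host_a) and host_a == host_b
```

Comparing only hostnames is a deliberate simplification: two records on the same domain but under different paths would still count as the same website, so a stricter check might be needed.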


However, we can close this issue: the REWE group collision is okay, since the websites are different.

WebworkrNet commented 1 year ago

I see my original concern as unresolved. The database currently shows 2 officials for REWE Markt GmbH:

As I understand it, this cannot be correct, because it leaves the responsible entity ambiguous. Which sources indicate that REWE Zentralfinanz eG is also responsible for REWE Markt GmbH? I have not been able to verify this so far.