datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

[Germany] invalid e-mail addresses in the data #95

Closed augusto-herrmann closed 4 years ago

augusto-herrmann commented 6 years ago

Goodtables detects some problems in the data for Germany in data/de.csv:

$ goodtables --schema public-body-schema.json data/de.csv
DATASET
=======
{'error-count': 5,
 'preset': 'nested',
 'table-count': 1,
 'time': 0.371,
 'valid': False}

TABLE [1]
=========
{'encoding': 'utf-8',
 'error-count': 5,
 'format': 'csv',
 'headers': ['id',
             'name',
             'abbreviation',
             'other_names',
             'description',
             'classification',
             'parent_id',
             'founding_date',
             'dissolution_date',
             'image',
             'url',
             'jurisdiction_code',
             'email',
             'address',
             'contact',
             'tags',
             'source_url'],
 'row-count': 1005,
 'schema': 'table-schema',
 'scheme': 'file',
 'source': 'data/de.csv',
 'time': 0.369,
 'valid': False}
---------
[352,13] [type-or-format-error] The value "info@dw-world" in row 352 and column 13 is not type "string" and format "email"
[367,13] [type-or-format-error] The value "info@landkreistag" in row 367 and column 13 is not type "string" and format "email"
[603,13] [type-or-format-error] The value "trabold@ids-mannheim" in row 603 and column 13 is not type "string" and format "email"
[776,11] [type-or-format-error] The value "url" in row 776 and column 11 is not type "string" and format "uri"
[776,13] [type-or-format-error] The value "email" in row 776 and column 13 is not type "string" and format "email"

For lines 352, 367 and 603, the e-mail addresses appear to be missing a ".de" suffix. In all three cases, the url field contains a domain that would match the domain of the e-mail address if a ".de" TLD were added.
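
The check described above (does the host in the url field match the e-mail domain once a TLD is appended?) could be automated roughly as follows. This is only a sketch: the function name `suggest_email_fix` is hypothetical, and the URL in the usage note is an assumed example, since the actual url values from data/de.csv are not quoted in this issue.

```python
from urllib.parse import urlparse


def suggest_email_fix(email, url):
    """Suggest a corrected e-mail address, or return None if no safe fix is found.

    A fix is suggested only when the e-mail domain has no dot at all and the
    host of `url` equals that domain plus a suffix (for example ".de").
    """
    local, sep, domain = email.partition("@")
    if not sep or not domain or "." in domain:
        # Not an address at all, or the domain already contains a dot.
        return None

    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]

    # Accept the host only if it extends the e-mail domain with a suffix.
    if host.startswith(domain + "."):
        return f"{local}@{host}"
    return None
```

Calling `suggest_email_fix("info@dw-world", "http://www.dw-world.de/")` with that assumed URL would propose `info@dw-world.de`. Any suggestion should still be reviewed by hand before the data is changed.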

Line 776 seems to be just a submission error of some kind. I suggest simply clearing the data and leaving these fields blank.

@rufuspollock, you contributed this file. Are you ok with the proposed fixes?

augusto-herrmann commented 6 years ago

Actually, line 776 looks like this:

de/name,name,,,description,classification,,,,,url,DE,email,contact,address,keywords,

It seems to be just a header row that was mistakenly left in the middle of the file, perhaps as a result of concatenating two different CSV files. I propose simply deleting the line.
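
A stray header row like this one could be detected before validation with a small script. The sketch below is an assumption about how one might do it, not part of the repository: `find_stray_header_rows` and its `min_matches` threshold are hypothetical names. It flags any data row in which several cells repeat the header name of their own column.

```python
import csv
import io


def find_stray_header_rows(csv_text, min_matches=3):
    """Return the row numbers (header = row 1) of rows that look like headers.

    A row is flagged when at least `min_matches` of its cells are identical
    to the header name of the column they sit in.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    stray = []
    for row_no, row in enumerate(reader, start=2):
        matches = sum(
            1 for name, cell in zip(header, row) if cell.strip() == name
        )
        if matches >= min_matches:
            stray.append(row_no)
    return stray
```

For the row quoted above, five cells (name, description, classification, url and email) match their column headers, so it would be flagged with the default threshold. The threshold is a judgment call: a lower value risks flagging legitimate rows whose values happen to equal a column name.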

augusto-herrmann commented 4 years ago

Issue fixed!