datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

[Nepal] remove duplicate rows (issue #85) #90

Closed augusto-herrmann closed 6 years ago

augusto-herrmann commented 6 years ago

This removes the duplicated rows detected by goodtables.

The file now validates:

$ goodtables data/np.csv 
DATASET
=======
{'error-count': 0,
 'preset': 'nested',
 'table-count': 1,
 'time': 0.013,
 'valid': True}

TABLE [1]
=========
{'encoding': 'utf-8',
 'error-count': 0,
 'format': 'csv',
 'headers': ['id',
             'name',
             'abbreviation',
             'other_names',
             'description',
             'classification',
             'parent_id',
             'founding_date',
             'dissolution_date',
             'image',
             'url',
             'jurisdiction_code',
             'email',
             'address',
             'contact',
             'tags',
             'source_url'],
 'row-count': 141,
 'scheme': 'file',
 'source': 'data/np.csv',
 'time': 0.011,
 'valid': True}

I don't see any reason to keep duplicated rows in the dataset. If the rows represent different entities in any way, a column should make that difference explicit, rather than keeping two records that are exactly the same.
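For anyone who wants to check for exact duplicates without goodtables, here is a minimal sketch using only the Python standard library. It is not the method used in this pull request. The function names `find_duplicate_rows` and `drop_duplicate_rows` and the sample rows are hypothetical, made up for illustration, and are not taken from `data/np.csv`.

```python
import csv
import io


def find_duplicate_rows(lines):
    """Map each repeated row to the record numbers where it appears.

    Record 1 is the header, so data records start at 2.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header record
    seen = {}
    for record_number, row in enumerate(reader, start=2):
        seen.setdefault(tuple(row), []).append(record_number)
    return {row: numbers for row, numbers in seen.items() if len(numbers) > 1}


def drop_duplicate_rows(lines):
    """Return all records (header included), keeping only the first copy of each."""
    seen = set()
    unique = []
    for row in csv.reader(lines):
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data, NOT the real contents of data/np.csv.
sample = io.StringIO(
    "id,name\n"
    "np/example-a,Example Body A\n"
    "np/example-b,Example Body B\n"
    "np/example-a,Example Body A\n"
)

duplicates = find_duplicate_rows(sample)
for row, numbers in duplicates.items():
    print(numbers, row)

sample.seek(0)  # rewind before reading the same buffer again
deduplicated = drop_duplicate_rows(sample)
```

To run it against a real file, pass an open file object instead of the `StringIO` buffer, for example `open("data/np.csv", newline="", encoding="utf-8")`. Only rows that match in every column are treated as duplicates, which is the case described above.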