augusto-herrmann closed this 6 years ago.
Removing duplicated rows detected by goodtables.
Now the file validates.
    $ goodtables data/np.csv
    DATASET
    =======
    {'error-count': 0,
     'preset': 'nested',
     'table-count': 1,
     'time': 0.013,
     'valid': True}

    TABLE [1]
    =========
    {'encoding': 'utf-8',
     'error-count': 0,
     'format': 'csv',
     'headers': ['id', 'name', 'abbreviation', 'other_names', 'description',
                 'classification', 'parent_id', 'founding_date',
                 'dissolution_date', 'image', 'url', 'jurisdiction_code',
                 'email', 'address', 'contact', 'tags', 'source_url'],
     'row-count': 141,
     'scheme': 'file',
     'source': 'data/np.csv',
     'time': 0.011,
     'valid': True}
I don't see any reason to keep duplicated rows in the dataset. If the duplicated rows represent different entities in any way, there should be a column that makes the difference explicit, rather than two records that are exactly the same.
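For anyone wanting to reproduce the cleanup, here is a minimal sketch that uses only the standard library. It assumes a duplicate means a data row whose cells are all identical to an earlier row (the "exactly the same" case above), keeps the first occurrence, and leaves the header untouched. The helpers `drop_exact_duplicates` and `dedupe_csv_text` are hypothetical names for illustration. They are not part of goodtables, and this is not necessarily how the rows were removed in this PR.

```python
import csv
import io


def drop_exact_duplicates(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence.

    Rows are compared cell by cell; rows that differ in any column are kept.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so key on a tuple of cells
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def dedupe_csv_text(text):
    """Parse CSV text, drop duplicated data rows, and return the new CSV text.

    The first row is treated as the header and is always written back as-is.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to deduplicate
        return ""
    data = drop_exact_duplicates(list(reader))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data)
    return out.getvalue()


if __name__ == "__main__":
    sample = 'id,name\n1,"A, B"\n2,C\n1,"A, B"\n'
    print(dedupe_csv_text(sample), end="")
```

To apply it to a real file such as `data/np.csv`, read it with `open(path, newline="", encoding="utf-8")`, write the result back, and rerun `goodtables` to confirm the report is valid.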