autotyp / autotyp-data

AUTOTYP data export
Creative Commons Attribution 4.0 International

Duplicate records #33

Closed xrotwang closed 2 years ago

xrotwang commented 2 years ago

I found 67 records that have duplicates, i.e. multiple identical records in the same dataset. Most are in Categories, in datasets that do not otherwise have multiple records for the same language, so I guess these should just be removed. 20 are in WordDomains. Whether identical duplicates are meaningful there, where other languages have multiple different records, I don't know.
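For anyone who wants to reproduce this kind of check, here is a minimal sketch of how fully identical records could be counted. It assumes the dataset is available as CSV; the column names `LID` and `Value` and the inline sample are hypothetical stand-ins, not the actual export layout or the script used for the count above.

```python
import csv
import io
from collections import Counter


def find_duplicate_records(rows):
    """Return {record: count} for every record that occurs more than once."""
    # A record only counts as a duplicate if every field is identical.
    counts = Counter(tuple(row) for row in rows)
    return {record: n for record, n in counts.items() if n > 1}


# Hypothetical sample standing in for one exported dataset file.
sample = io.StringIO(
    "LID,Value\n"
    "1,a\n"
    "2,b\n"
    "1,a\n"
    "3,c\n"
)
reader = csv.reader(sample)
header = next(reader)  # skip the header row so it is not counted as data
duplicates = find_duplicate_records(reader)

for record, n in duplicates.items():
    print(dict(zip(header, record)), "occurs", n, "times")
```

Languages with several different records are not flagged by this, since only rows that match in every field are grouped together.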

tzakharko commented 2 years ago

Duplicate database entries have been removed from Alienability, Gender and NumeralClassifiers in 8ef34905f40d63dc3898c8652444b25722d403fd. Duplicates in WordDomain are a known defect; we are working on an overhaul of that dataset.