cldf / pycldf

python package to read and write CLDF datasets
https://cldf.clld.org
Apache License 2.0

Should duplicate column names pass? #113

Closed Anaphory closed 4 years ago

Anaphory commented 4 years ago

I just had a database conversion fail because my metadata listed a column name twice in the same table. cldf validate did not pick it up. Is there ever a case where this is sensible? It does not make much sense to me. Should we bake it into the validator to complain about different columns (and also about identical columns) with the same name?

Testing the behaviour a step further, I gave the dataset two http://cldf.clld.org/v1.0/terms.rdf#form properties, with different column names. That was also accepted by the validator, and it definitely shouldn't.

Apart from pycldf, is there any other place in the specs where we would need to reflect that this is not permitted?

xrotwang commented 4 years ago

The spec does state:

"Thus, each property can be used only once per table, which makes processing simpler."

(see https://github.com/cldf/cldf#cldf-data-files )

But yes, having CLDF validation catch these issues would make sense.
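
As a rough illustration of the two checks discussed here (repeated column names, and a property used more than once per table), here is a minimal sketch over a plain dict shaped like a CSVW table description. This is not pycldf's actual validation code. The helper name find_duplicates and the sample table are hypothetical.

```python
from collections import Counter

FORM = "http://cldf.clld.org/v1.0/terms.rdf#form"


def find_duplicates(table):
    """Return (duplicate column names, duplicate propertyUrls) for one table.

    `table` is assumed to be a plain dict shaped like a CSVW table
    description; this is a hypothetical helper, not part of pycldf.
    """
    columns = table.get("tableSchema", {}).get("columns", [])
    names = Counter(col["name"] for col in columns if "name" in col)
    # Columns without a propertyUrl (non-standard columns) are skipped here.
    props = Counter(col["propertyUrl"] for col in columns if col.get("propertyUrl"))
    dup_names = sorted(name for name, count in names.items() if count > 1)
    dup_props = sorted(prop for prop, count in props.items() if count > 1)
    return dup_names, dup_props


# Made-up table description exhibiting both problems from this issue.
table = {
    "url": "forms.csv",
    "tableSchema": {
        "columns": [
            {"name": "Form", "propertyUrl": FORM},
            {"name": "Alt_Form", "propertyUrl": FORM},  # same property, different name
            {"name": "Note"},
            {"name": "Note"},  # same name, no propertyUrl
        ]
    },
}

dup_names, dup_props = find_duplicates(table)
```

With this sample, dup_names holds the repeated name "Note" and dup_props holds the form property URL, so a validator built along these lines could report each as a separate problem.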

xrotwang commented 4 years ago

To clarify: The case where database creation failed would have been when the same-name columns did not have different propertyUrls, right? Otherwise, I'd expect schema creation to work - but insertion to still be buggy, because presumably only one value for the column name will make it into the row dict.
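
To illustrate the row dict concern, here is a small sketch using only the standard library. It assumes rows are built as a mapping from column name to value, which is what csv.DictReader does. Whether pycldf's insertion code follows exactly this path is an assumption.

```python
import csv
import io

# Made-up CSV content with the column name "Note" listed twice.
data = io.StringIO("ID,Note,Note\n1,first,second\n")

# DictReader maps header names to cell values, so two columns with the
# same name compete for a single key and only one value survives.
rows = list(csv.DictReader(data))
row = rows[0]
```

Here row has only two keys, ID and Note, and the value under Note is the one from the last column with that name; the earlier value is silently dropped.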

Anaphory commented 4 years ago

My columns were non-standard, so they had no propertyUrl at all.

xrotwang commented 4 years ago

Ok, should be all good now, i.e. cldf validate will complain in both scenarios.