Closed by Anaphory.
I think that would open up a big can of worms, e.g. in cases where different row dicts have different unspecified columns. I'm also not a big fan of csv's behaviour here: it can lead to CSV rows with different numbers of columns, which makes quite a few CSV-consuming tools choke.
Huh? I'm not suggesting making this looser. I'm suggesting making it stricter. Currently, when the metadata contain a Source column and the row to be written is {'source': ['forkel2018cross']}, an empty cell is silently written and the superfluous dict key is ignored. By contrast, csv.DictWriter would raise
ValueError: dict contains fields not in fieldnames: 'source'
I think CLDF should do the same, but maybe you have a good reason not to. (I have abused this silent dropping of superfluous fields on rare occasions, so it's not entirely without merit; that's why I ask.)
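For comparison, here is a small sketch of the standard-library behaviour being referred to. csv.DictWriter takes an extrasaction parameter: the default "raise" rejects keys not in fieldnames, while "ignore" drops them and fills missing fields with restval (default ""), which is roughly the lenient behaviour described for the csvw writer. The column names and row below are illustrative only.

```python
import csv
import io

fieldnames = ["ID", "Source"]
# Note the lowercase key "source", which is not among the fieldnames.
row = {"ID": "1", "source": ["forkel2018cross"]}

# Default behaviour (extrasaction="raise"): unknown keys are an error.
error_message = None
try:
    csv.DictWriter(io.StringIO(), fieldnames=fieldnames).writerow(row)
except ValueError as e:
    error_message = str(e)

# Lenient behaviour (extrasaction="ignore"): the unknown key is dropped and
# the missing "Source" field is filled with restval, i.e. an empty cell.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
writer.writeheader()
writer.writerow(row)
output = buf.getvalue()

print(error_message)  # dict contains fields not in fieldnames: 'source'
print(repr(output))   # 'ID,Source\r\n1,\r\n'
```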
@Anaphory Ah, OK, I think I understand now. I'm not sure I'd want to break backwards compatibility for this, though. But something like a strict=True flag for Table.write would be OK. To make this usable from libraries like pycldf, these would need to be changed too, of course.
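A minimal sketch of how such an opt-in flag could behave while keeping the current lenient default. The function write_rows and its signature are hypothetical illustrations, not the csvw Table.write API, and the sketch ignores csvw's datatype conversion. It validates all rows before writing anything, so a strict failure does not leave a half-written file.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence, TextIO


def write_rows(f: TextIO, column_names: Sequence[str],
               rows: Iterable[Mapping[str, object]], strict: bool = False) -> None:
    """Hypothetical sketch of a `strict` option (not the real csvw API).

    strict=False keeps the lenient behaviour: keys that are not column
    names are silently dropped. strict=True raises ValueError instead.
    """
    rows = list(rows)
    if strict:
        known = set(column_names)
        # Check every row up front so nothing is written if any row is bad.
        for i, row in enumerate(rows):
            unknown = sorted(set(row) - known)
            if unknown:
                raise ValueError(
                    f"row {i} contains keys not in the table schema: {unknown}")
    writer = csv.DictWriter(f, fieldnames=list(column_names), extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buf = io.StringIO()
write_rows(buf, ["ID", "Source"], [{"ID": "1", "source": "x"}])  # lenient default
print(repr(buf.getvalue()))  # 'ID,Source\r\n1,\r\n'
```

With strict=True, the same call raises ValueError naming 'source', mirroring csv.DictWriter's default.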
When writing a table, the writer (https://github.com/cldf/csvw/blob/45584ad63ff3002a9b3a8073607c1847c5cbac58/src/csvw/metadata.py#L649-L652) ignores all columns that are not in the table schema. I have used this a few times, in particular in quick-and-dirty scripts to manipulate CLDF data, but I have also sometimes been confused by the output (e.g. due to typos in a column name), and it is also a deviation from the behaviour of Python's core csv module. Is this a conscious choice, or is it something we should consider changing down the line, preferably before too many other places start relying on it?
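To illustrate the typo problem: a key like "Sorce" is silently dropped today, leaving an empty Source cell. The sketch below shows one way a stricter check could make such mistakes easier to spot by suggesting close matches via difflib.get_close_matches. The helper unknown_keys is hypothetical and not part of csvw; the column names are made up for the example.

```python
import difflib
from typing import Dict, List, Mapping, Sequence


def unknown_keys(row: Mapping[str, object],
                 column_names: Sequence[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each row key that is not a column name
    to up to three similar column names (likely typos)."""
    known = set(column_names)
    return {
        key: difflib.get_close_matches(key, list(column_names), n=3, cutoff=0.6)
        for key in row
        if key not in known
    }


columns = ["ID", "Source", "Language_ID"]
print(unknown_keys({"ID": "1", "Sorce": "x"}, columns))   # {'Sorce': ['Source']}
print(unknown_keys({"ID": "1", "Source": "x"}, columns))  # {}
```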