cldf / csvw

CSV on the web
Apache License 2.0
37 stars 6 forks source link

Support loading metadata declaring a single table at root level #22

Closed alexshpilkin closed 5 years ago

alexshpilkin commented 5 years ago

The CSVW primer specifically mentions working with metadata that only refers to a single table, by specifying tableSchema and url properties at the root level of the metadata file. Right now it is only possible to load such metadata as a TableGroup, which then has an empty tables list. It would be nice to be able to obtain a Table object from this.

alexshpilkin commented 5 years ago

If you can outline a rough implementation plan, I’d be open to doing a PR.

xrotwang commented 5 years ago

Yes, that what was one of the current simplifications / limitations of the package. (Another one being the only half-baked support for inherited properties or the missing support for shared table schemas.)

I'm not against adding support for your use case, though, but how exactly would you want this to work? Currently, TableGroup has all the methods for reading and writing metadata - so the simplest way of adding support for your use case might be to just populate the tables property with one table when tableSchema and url are encountered at the root level? This would mean that writing the metadata will result in something different than what was read, though.

The alternative would be adding all the I/O methods to Table as well.

Any preference here?

alexshpilkin commented 5 years ago

Well, I think we want roundtripping—I don’t see anything (?) in the normalization section of the spec to suggest that creating a tables property where there was none is OK. So I’d say that Table should support a subset of TableGroup’s methods including from_file, and TableGroup.from_file should in turn refuse to load something that is not, in fact, a table group. (This significantly changes what belongs in TableLike.) But I’m not sure how this would interact with the whole foreign keys business—should Tables with and without foreign keys behave differently, or should we ignore it?

xrotwang commented 5 years ago

From a quick glance, I'd say from_file, to_file and base should go into TableLike and then we're almost good to go. We currently support only local foreign keys (i.e. ones specified with reference.resource) - so we could simply ignore foreign keys (i.e. just read and write but not enforce) in the single table case.

xrotwang commented 5 years ago

I'll add a couple of test cases and then see whether my hunch checks out. Let me know if you have any particular cases you'd like to see in tests.

xrotwang commented 5 years ago

@alexshpilkin please review https://github.com/cldf/csvw/pull/23 - as far as I can tell, it should do what you want.

xrotwang commented 5 years ago

What's slightly annoying about this solution: Table now has a method to_file to write the metadata and a method write to write the data file, whereas TableGroup has a method write to write all tables and the metadata. Unfortunately, there's no way around this which wouldn't break backwards compatibility.

alexshpilkin commented 5 years ago

@xrotwang That was quick. Thanks, I think it does! Though TableGroup can still load the metadata (mostly uselessly) as well.

By the way: is it intended that iterdicts() ignores the names and titles and uses the CSV header instead?

xrotwang commented 5 years ago

Ah, true, I'll make TableGroup.from_file raise an error when no tables key is found in the metadata.

xrotwang commented 5 years ago

Regarding iterdicts: No, that behaviour would be a bug. It's supposed to yield dicts where the keys are name properties from the metadata.

alexshpilkin commented 5 years ago

@xrotwang See #24 re: names. This bug can be closed once #23 is merged, I think.

xrotwang commented 5 years ago

@alexshpilkin let me know if you need a released version of the package with this functionality. If so, I'd cut release 1.6 now, otherwise I'd let some more functionality aggregate.