Closed khusmann closed 1 week ago
Would be useful, but it is a bit more complex than appending datapackage.json with file.path(). datapackage.yml and datapackage.yaml files are also valid, so we need to check if the provided file path has one of these. If not, we'll have to assume a datapackage.json file, potentially missing a datapackage.yml file. Maybe we could check for those files as well before reporting an error.
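A minimal sketch of the check described above, in base R only. The function name resolve_descriptor_path() is hypothetical and not part of frictionless-r; it only illustrates "keep an explicit descriptor path, otherwise look for json, yml and yaml before reporting an error".

```r
# Hypothetical helper, not part of frictionless-r.
resolve_descriptor_path <- function(path) {
  descriptor_names <- c("datapackage.json", "datapackage.yml", "datapackage.yaml")

  # A path that already names a descriptor file is returned unchanged.
  if (basename(path) %in% descriptor_names) {
    return(path)
  }

  # Otherwise treat the path as a directory and build one candidate per name.
  candidates <- file.path(path, descriptor_names)
  found <- candidates[file.exists(candidates)]

  # Report an error only after all three names have been tried.
  if (length(found) == 0) {
    stop("No datapackage.json, datapackage.yml or datapackage.yaml in: ", path)
  }

  # datapackage.json comes first in the list, so it wins when several exist.
  found[1]
}
```

This only covers local paths; a remote path would need a request per candidate name.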
I think that is why I initially chose the verbose approach, especially since tab completion when writing the path immediately provides feedback on whether a file is present.
I'm curious to see how other frictionless software tackles this.
Ah, makes sense! I was wondering about that yaml stuff -- I saw yaml export functions in frictionless-py but so far have not seen a datapackage.yaml in the wild, so I was running with the assumption that datapackage.json was the de facto standard.
The collections in the datahub were initially confusing to me to get working with frictionless-r because there was no direct link to the datapackage.json in their file listing (here, for example). The default behavior of their data-cli tool points to the root URL of the package though, and I found adding /datapackage.json did the trick.
I think it's nice (for new users especially) to be able to treat the datapackage as a sort of opaque blob they can load resources from (like tabs in an Excel file), without needing to think about the internal structure -- it also facilitates distributing packages as self-contained zip files.
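For reference, the "adding /datapackage.json" workaround amounts to something like the following. descriptor_url() is a hypothetical helper and the URL is a placeholder, not a real package.

```r
# Hypothetical helper: build the descriptor URL from a package's root URL.
descriptor_url <- function(root_url) {
  # Drop any trailing slashes so the result has exactly one separator.
  paste0(sub("/+$", "", root_url), "/datapackage.json")
}

# Placeholder root URL, for illustration only.
descriptor_url("https://example.com/package/")
```

The result can then be passed to read_package() as the file argument.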
@khusmann Thanks for investigating. datapackage.json is the only valid format according to the specs, but yml/yaml is supported by frictionless-py, and it was requested and implemented as a feature for frictionless-r. I think for guessing a file, it's fine to follow frictionless-py (and the specs) and only look for a datapackage.json. I'll try to get your PR included in the next version.
Update:
datapackage.json (that name and that extension) is still the standard in v2 for published packages (internal systems can use different names and formats, which is why frictionless-r also supports reading from a provided datapackage.yaml). I would therefore not implement functionality that starts looking for yaml if json cannot be found. frictionless-py doesn't either.
[...] file.info()$isdir), but what are the expectations for remote directories (like example.com/package)? As @khusmann points out, that URL could be configured to serve the file, or the user might expect the function to look at example.com/package/datapackage.json. That is 1) two calls and 2) making a call the user didn't request.
[...] read_package(file). I'm not against supporting a path to a zip file, which is 1) a file and 2) aligns more with the concept of a "sort of opaque blob they can load resources from". See #193.
Closing this and associated #158.
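To make the local versus remote distinction above concrete, here is a rough sketch in base R. descriptor_for() is a hypothetical name, not a frictionless-r function, and it shows only what guessing could look like under the constraints stated above (json only, no extra remote calls), not anything that was implemented.

```r
# Hypothetical helper illustrating the local/remote distinction.
descriptor_for <- function(file) {
  is_url <- grepl("^(https?|ftp)://", file)
  if (is_url) {
    # Remote paths pass through untouched: guessing would mean a second call
    # that the user did not request.
    return(file)
  }
  if (isTRUE(file.info(file)$isdir)) {
    # Local directory: guess datapackage.json only, with no fallback to yaml.
    return(file.path(file, "datapackage.json"))
  }
  # Anything else (an explicit file path, or a path that does not exist)
  # is left as provided.
  file
}
```

isTRUE() is used because file.info()$isdir is NA for paths that do not exist.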
A lot of frictionless implementations automatically load datapackage.json when reading a directory. It'd be nice to pass just the directory instead of always the full path to datapackage.json.