frictionlessdata / frictionless-r

R package to read and write Frictionless Data Packages
https://docs.ropensci.org/frictionless/

Automatically load "datapackage.json" when supplied a path to a folder #157

Closed: khusmann closed this issue 1 week ago

khusmann commented 11 months ago

A lot of frictionless implementations automatically load datapackage.json when reading a directory. It'd be nice to be able to write

read_package("/path/to/packagedir")

instead of always

read_package("/path/to/packagedir/datapackage.json")
peterdesmet commented 11 months ago

Would be useful, but it is a bit more complex than that:

datapackage.yml and datapackage.yaml files are also valid, so we would first need to check whether the provided path already points to one of these files. If not, we would have to assume a datapackage.json file, potentially missing a datapackage.yml file that is present. Maybe we could check for those files as well before reporting an error.
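
To illustrate the extra checking described above, here is a minimal R sketch. The helper `find_descriptors()` is hypothetical and not part of frictionless-r; it only lists which of the three candidate descriptor files exist in a local directory. When more than one is present, a caller would still have to decide which one to read, which is part of why guessing is not trivial.

```r
# Hypothetical helper (not part of frictionless-r): list which descriptor
# candidates exist in a local directory, in order of preference.
find_descriptors <- function(dir) {
  candidates <- c("datapackage.json", "datapackage.yml", "datapackage.yaml")
  paths <- file.path(dir, candidates)
  # Keep only the candidates that are present on disk
  paths[file.exists(paths)]
}
```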

I think that is why I initially chose the verbose approach, especially since tab completion when writing the path immediately provides feedback on whether a file is present.

peterdesmet commented 11 months ago

I'm curious to see how other frictionless software tackles this.

khusmann commented 11 months ago

Ah, makes sense! I was wondering about that yaml stuff. I saw yaml export functions in frictionless-py but so far have not seen a datapackage.yaml in the wild, so I was working on the assumption that datapackage.json was the de facto standard.

At first, I found the collections in the datahub confusing to get working with frictionless-r, because their file listing had no direct link to datapackage.json (here, for example). Their data-cli tool, however, points to the root URL of the package by default, and I found that appending /datapackage.json to that URL did the trick.
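
The workaround described above can be sketched in R as follows. `descriptor_url()` is a hypothetical helper, not part of frictionless-r; it only builds the URL string and makes no request itself, and https://example.com/package is a placeholder rather than a real package.

```r
# Hypothetical helper (not part of frictionless-r): build the descriptor URL
# from a package's root URL by appending "/datapackage.json".
descriptor_url <- function(root_url) {
  # Drop any trailing slashes so we don't produce "...//datapackage.json"
  paste0(sub("/+$", "", root_url), "/datapackage.json")
}

# Usage (requires network access and the frictionless package):
# package <- frictionless::read_package(descriptor_url("https://example.com/package"))
```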

I think it's nice (for new users especially) to be able to treat the datapackage as a sort of opaque blob they can load resources from (like tabs in an Excel file), without needing to think about the internal structure. It would also make it easier to distribute packages as self-contained zip files.

peterdesmet commented 10 months ago

@khusmann Thanks for investigating. datapackage.json is the only valid descriptor name and format according to the specs, but yml/yaml is supported by frictionless-py, and it was requested and implemented as a feature for frictionless-r. When guessing a file, I think it's fine to follow frictionless-py (and the specs) and only look for a datapackage.json. I'll try to get your PR included in the next version.

peterdesmet commented 1 week ago

Update:

  1. datapackage.json (that name and that extension) is still the standard in v2 for published packages (internal systems can use different names and formats, which is why frictionless-r also supports reading from an explicitly provided datapackage.yaml). I would therefore not implement functionality that starts looking for yaml if json cannot be found. frictionless-py doesn't either.
  2. I prefer not to support a path to a directory. It's fairly easy to make it work for local directories (with file.info()$isdir), but what are the expectations for remote directories (like example.com/package)? As @khusmann points out, that URL could be configured to serve the file, or the user might expect the function to look at example.com/package/datapackage.json. Covering both would mean 1) two calls and 2) making a call the user didn't request.
  3. I would therefore limit the functionality to providing a file, in line with the function argument read_package(file). I'm not against supporting a path to a zip file, which 1) is a file and 2) aligns better with the concept of a "sort of opaque blob they can load resources from". See #193.
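
Given this decision, a user who wants directory support for local paths can still add it on their side. A minimal sketch, assuming a hypothetical helper `resolve_descriptor_path()` (not part of frictionless-r) that uses file.info()$isdir as mentioned in point 2 and deliberately passes URLs and other non-directory paths through unchanged, so no extra request is ever made:

```r
# Hypothetical user-side helper (not part of frictionless-r): if a local path
# is a directory, point it at the datapackage.json inside; otherwise return
# the path unchanged (files, URLs and zip paths are passed through as-is).
resolve_descriptor_path <- function(path) {
  # file.info() returns isdir = NA for paths that don't exist (e.g. URLs),
  # and isTRUE(NA) is FALSE, so those fall through to the else branch
  is_dir <- isTRUE(file.info(path)$isdir)
  if (is_dir) file.path(path, "datapackage.json") else path
}
```

Usage would then be `frictionless::read_package(resolve_descriptor_path("/path/to/packagedir"))`. Only datapackage.json is tried, matching point 1.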

Closing this and the associated #158.