cboettig / contentid

:package: R package for working with Content Identifiers
http://cboettig.github.io/contentid

core functions: query() #14

Closed: cboettig closed this issue 4 years ago.

cboettig commented 4 years ago
query("hash://sha256/9412325831dab22aeebdd674b6eb53ba6b7bdd04bb99a4dbb21ddff646287e37")
query("http://cdiac.ornl.gov/ftp/trends/co2/vostok.icecore.co2")
  1. The function can take either a URL or a content identifier as input. Is this really wise, or would separate functions be better? For example, the hash-archive.org web API calls those endpoints history and source, respectively. Maybe two separate verbs for the two input types would be better than a single query().
  2. Is the returned data.frame structure okay? Entries from all registries are listed, with the fields identifier, source, and date. There is currently no additional column indicating which registry an entry came from.
  3. Like register(), it currently takes a registries= argument to support multiple registries (see #12).
  4. As with register(), the subroutines query_local() and query_remote() are currently exported, but I propose removing them, since they are redundant and their names are unclear.
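A minimal base-R sketch of the choice raised in items 1 and 2, assuming a toy in-memory registry. The names toy_registry, query_history(), query_sources() and query_sketch() are hypothetical and not part of contentid, and all identifiers and URLs are placeholders. The registry column illustrates the extra field discussed in item 2.

```r
# Hypothetical in-memory registry; every value here is a placeholder.
toy_registry <- data.frame(
  identifier = c("hash://sha256/aaa", "hash://sha256/aaa", "hash://sha256/bbb"),
  source     = c("https://example.org/a.csv", "https://mirror.example.net/a.csv",
                 "https://example.org/b.csv"),
  date       = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  registry   = c("local", "remote", "local"),  # which registry each entry came from
  stringsAsFactors = FALSE
)

# Option A: two verbs, mirroring the history / source split on hash-archive.org.
# All registration events recorded for a given source URL.
query_history <- function(url, registry = toy_registry) {
  registry[registry$source == url, , drop = FALSE]
}
# All known source locations for a given content identifier.
query_sources <- function(id, registry = toy_registry) {
  registry[registry$identifier == id, , drop = FALSE]
}

# Option B: one query() that dispatches on the form of its input.
query_sketch <- function(x, registry = toy_registry) {
  if (grepl("^hash://", x)) {
    query_sources(x, registry)
  } else {
    query_history(x, registry)
  }
}

query_sketch("hash://sha256/aaa")          # two rows, one per known source
query_sketch("https://example.org/b.csv")  # one row
```

One consideration for the two-verb design: with a single function, any string that does not look like a hash:// identifier is silently treated as a URL, whereas separate verbs make the caller's intent explicit.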

I think the return structure here is a key issue as well. It definitely makes sense to have a way to see all possible source locations for a given content identifier, rather than just one. But from a user-interface perspective, it is often desirable to have a function that returns exactly one location for each content identifier passed in, so the result can be used directly for downloads. I thought that difference (returning a table vs. returning a single URL) was the key motivation for adding retrieve(), but judging from #9 we're not on the same page there.
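To make the table-vs-single-URL distinction concrete, here is a sketch of collapsing a query()-style table to one source per identifier. resolve_one() is a hypothetical helper, not contentid's retrieve(), and "pick the most recent entry" is just one assumed policy; it does not check whether the URL is still reachable or still serves matching content.

```r
# Hypothetical helper: one source URL out per content identifier in,
# preferring the most recently registered entry (NA if none is known).
resolve_one <- function(ids, sources) {
  vapply(ids, function(id) {
    hits <- sources[sources$identifier == id, , drop = FALSE]
    if (nrow(hits) == 0) return(NA_character_)
    hits$source[which.max(as.numeric(hits$date))]
  }, character(1), USE.NAMES = FALSE)
}

# Placeholder table with the identifier / source / date fields from item 2.
sources <- data.frame(
  identifier = c("hash://sha256/aaa", "hash://sha256/aaa", "hash://sha256/bbb"),
  source     = c("https://example.org/a.csv", "https://mirror.example.net/a.csv",
                 "https://example.org/b.csv"),
  date       = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  stringsAsFactors = FALSE
)

# Returns the mirror URL for aaa, the example.org URL for bbb, and NA for ccc.
resolve_one(c("hash://sha256/aaa", "hash://sha256/bbb", "hash://sha256/ccc"), sources)
```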

cboettig commented 4 years ago

@jhpoelen I'm still wondering whether two different verbs would be better here: one for querying by content URI and a separate one for querying by source location.

Also, as discussed in #15, the return format may need both a UUID for the registration event (what SWH calls a 'visit') and, possibly, the content hash of the provenance recorded for the registry. These fields could be hidden (dropped from the returned data.frame by default) for simplicity.
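One way the "hidden by default" idea could look, as a sketch only: the column names uuid and provenance, the verbose argument, and tidy_query_result() are all hypothetical, and the values are placeholders.

```r
# Hypothetical: columns kept in the full result but hidden unless verbose = TRUE.
hidden_cols <- c("uuid", "provenance")

tidy_query_result <- function(df, verbose = FALSE) {
  if (verbose) return(df)
  df[, setdiff(names(df), hidden_cols), drop = FALSE]
}

full <- data.frame(
  identifier = "hash://sha256/aaa",
  source     = "https://example.org/a.csv",
  date       = as.Date("2020-01-01"),
  uuid       = "00000000-0000-0000-0000-000000000000",  # placeholder registration-event id
  provenance = "hash://sha256/ppp",                     # placeholder provenance hash
  stringsAsFactors = FALSE
)

names(tidy_query_result(full))                  # identifier, source, date
names(tidy_query_result(full, verbose = TRUE))  # all five columns
```

Keeping the extra fields in the underlying result means nothing is lost; only the default printed view stays small.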