cboettig / contentid

📦 R package for working with Content Identifiers

http://cboettig.github.io/contentid


First pass at zenodo registry queries #67

Closed: cboettig closed this 3 years ago

cboettig commented 3 years ago

Also begins refactoring the pluggable interface.

jhpoelen commented 3 years ago

@cboettig wow! I am impressed by the ease with which you turned Zenodo into a content-based repository via the contentid package. Now all we need is a network of hash indexers...

jhpoelen commented 3 years ago

...in which the hash indexes help provide easy, fast access to existing content-based repositories and to the locations of the data they serve.

cboettig commented 3 years ago

Thanks @jhpoelen for solving the key part of the riddle here! I couldn't figure out that _file.checksum syntax. At least the format should be easily extensible to sha256, even though Zenodo has only computed md5 hashes so far. Now if only we can convince Zenodo to compute those sha256 sums...
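A minimal Python sketch (not contentid's actual R code) of what such a lookup involves: hash some bytes, format the digest as `md5:<hex>` (assumed here to be the form Zenodo's file metadata uses for checksums), and wrap it in a search query. The endpoint `https://zenodo.org/api/records`, its `q` parameter, and the field name (copied as written in the comment above) are assumptions that may not match the current Zenodo API; `checksum` and `zenodo_checksum_query` are hypothetical helpers. Passing `algo="sha256"` shows why the format extends naturally, although Zenodo would only match such a query if it actually stored sha256 sums.

```python
import hashlib
from urllib.parse import urlencode

# Assumed search endpoint; not verified against the current Zenodo API.
ZENODO_RECORDS_API = "https://zenodo.org/api/records"


def checksum(data: bytes, algo: str = "md5") -> str:
    """Return the digest of `data` as 'algo:hexdigest' (e.g. 'md5:...')."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def zenodo_checksum_query(data: bytes, algo: str = "md5",
                          field: str = "_file.checksum") -> str:
    """Build (but do not send) a search URL matching records by file checksum.

    `field` defaults to the name quoted in the thread; treat it as an assumption.
    """
    q = f'{field}:"{checksum(data, algo)}"'
    return ZENODO_RECORDS_API + "?" + urlencode({"q": q})


if __name__ == "__main__":
    print(zenodo_checksum_query(b"hello"))              # md5-based query
    print(zenodo_checksum_query(b"hello", "sha256"))    # same shape, sha256
```

Only the prefix before the colon changes between algorithms, which is what makes a checksum-keyed registry easy to extend once the repository computes the new hash.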

jhpoelen commented 3 years ago

@cboettig I already offered to make an apple pie and take pictures of pelicans at Lake Merritt. What more can you do to help persuade @slint et al. to calculate millions (billions?) of sha256 hashes?

cboettig commented 3 years ago

Can't really compete with apple pie.

I'm just curious whether the constraint is computational cost or software development/maintenance. Apparently md5 is not much faster (and can even be slower!?) on newer CPUs, which often have hardware acceleration for SHA hashes. On the other hand, I can imagine no one wants to fix what ain't broken: changing a codebase, testing those changes, and updating documentation can each be reason enough not to want to change your checksum algorithm. I'm not sure how helpful we could be on that front, but maybe I could manage a pull request? (Or, more ambitiously, maybe we could pursue some financial resources to catalyze that?)
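One way to check the md5-vs-SHA speed claim on a particular machine is a rough benchmark. This sketch uses Python's `hashlib` (which in most CPython builds delegates to OpenSSL) to time both algorithms over the same random buffer. The 16 MiB buffer and best-of-5 timing are arbitrary choices, `time_digest` is a hypothetical helper, and results depend heavily on the CPU (for example, whether it has SHA extensions) and on how the hash library was built, so no numbers are claimed here.

```python
import hashlib
import os
import timeit


def time_digest(algo: str, data: bytes, repeats: int = 5) -> float:
    """Return the best-of-`repeats` time in seconds to hash `data` once."""
    timer = timeit.Timer(lambda: hashlib.new(algo, data).digest())
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    payload = os.urandom(16 * 1024 * 1024)  # 16 MiB of random bytes
    for algo in ("md5", "sha256"):
        seconds = time_digest(algo, payload)
        # Throughput in MB/s; compare the two lines on your own hardware.
        print(f"{algo:>7}: {len(payload) / seconds / 1e6:.0f} MB/s")
```

Taking the minimum of several runs reduces noise from caching and scheduling, but a single-threaded micro-benchmark like this says nothing about I/O, which often dominates when hashing archived files at repository scale.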