Here are some notes that, through some more discussion, will hopefully turn into an overview of coming changes.
Special thanks to @oeway and @k-dominik for discussions so far.
Shortcomings of current system
Unclear source of truth: descriptions on Zenodo are patched in this GH repo
Users may suffer from rate limits set outside our control (by Zenodo or GitHub)
A lengthy loop of (1) proposing new descriptions (currently uploaded to Zenodo), (2) testing them on our side, and (3) updating the proposal (back to (1)), which leaves unusable versions behind.
Currently ruled out, potential ways to address shortcomings
don't patch
con: published Zenodo records are often in need of patching
cache to an S3 storage under our control
con: with multiple sources of truth and descriptions contributed by partners directly via GH, keeping a valid cache is challenging and itself relies on access to GH/Zenodo
Use Zenodo sandbox for description proposals
con: sandbox records may disappear if the proposal proceeds too slowly, and the storage is not under our control
The currently most promising way to address shortcomings
S3 first approach:
Proposals get a bioimageio internal id right away; once they are accepted, we publish them on Zenodo and add the concept DOI and version DOI[^1] (maybe we make the version field mandatory from now on, so semantic versions can be mapped to DOIs?).
Description updates get a bioimageio internal id right away (maybe 'update-' + their id?); once the update is accepted, we publish it on Zenodo and get a new version DOI.
The S3-first approach ensures that any rate limits are under our control.
The S3-first approach allows immediate evaluation of user uploads.
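The lifecycle above (internal id on proposal, concept DOI and per-version DOIs on acceptance) could be modeled roughly like this. This is a minimal sketch with hypothetical names (`ResourceRecord`, `new_proposal`, `publish`); it only illustrates the state transitions and the semantic-version-to-DOI mapping, not any actual bioimageio or Zenodo API:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResourceRecord:
    """Hypothetical record tracking a resource through the S3-first lifecycle."""
    internal_id: str
    status: str = "proposal"  # proposal -> published
    concept_doi: Optional[str] = None  # assigned once, on first Zenodo publication
    version_dois: dict = field(default_factory=dict)  # semantic version -> version DOI

def new_proposal() -> ResourceRecord:
    # proposals get a bioimageio internal id right away, before any Zenodo interaction
    return ResourceRecord(internal_id=f"bioimageio-{uuid.uuid4().hex[:8]}")

def publish(record: ResourceRecord, version: str, concept_doi: str, version_doi: str) -> None:
    # once accepted, publish on Zenodo and store both DOIs;
    # a mandatory `version` field lets us map semantic versions to version DOIs
    record.concept_doi = record.concept_doi or concept_doi
    record.version_dois[version] = version_doi
    record.status = "published"
```

Updates would reuse `publish` on the same record with a new semantic version, yielding a new entry in `version_dois` under the unchanged concept DOI.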
Cons of "S3 first"
not free
renders Zenodo's download statistics for bioimageio descriptions meaningless (we need our own solution; @oeway proposed a light-weight proxy service that can keep track of accesses; alternatively, maybe there are even some "built-in" mechanisms for this? something in the direction of https://docs.aws.amazon.com/AmazonS3/latest/userguide/aws-usage-report.html)
Still unclear (to me) about "S3 first"
How to replace/update the current resource description review process, including the generated PR that serves as a space for the contributor and bioimageio maintainers to chat.
Maybe we can use https://gitter.im/ ? Apparently there is a matrix.org based API, so we could create a channel for each resource description.
Looking into Gitter brings me to our AI4Life Matrix ...
Details in need of further discussion/thought
Use of S3 object redirects to realize the concept of a resource id pointing to its latest version (the alternative is of course simple duplication)
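The redirect idea can be sketched in plain Python before committing to the S3 mechanics (on real S3, object-level redirects are set via the `x-amz-website-redirect-location` header and are only followed by the static website endpoint, which is a constraint to keep in mind). The dicts and the `<id>/latest` key layout below are illustrative assumptions, not an existing bioimageio convention:

```python
# stand-ins for S3: `redirects` plays the role of per-object redirect metadata,
# `objects` the role of stored content
redirects: dict = {}  # object key -> target key
objects: dict = {}    # object key -> content

def put_version(resource_id: str, version: str, content: bytes) -> None:
    """Store a new version and repoint '<id>/latest' at it (no duplication)."""
    key = f"{resource_id}/{version}"
    objects[key] = content
    # the concept key redirects to the newest version instead of copying the bytes
    redirects[f"{resource_id}/latest"] = key

def get(key: str) -> bytes:
    """Fetch an object, following at most one redirect, as S3 would."""
    key = redirects.get(key, key)
    return objects[key]
```

Publishing a new version then only rewrites one small redirect entry, whereas the duplication alternative would re-upload every file of the resource under the `latest` prefix.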
[^1]: Note that one can reserve a DOI and then, e.g., include it in files of that record; see the Zenodo FAQ entry "Can I know the DOI of my record before publishing, so that I can include it in the paper or dataset?"