CODECHECK infrastructure

nuest commented 4 years ago

The assistant is a first step in streamlining code checks. A further step would be online infrastructure that codecheckers can use. Let's collect ideas here about what this infrastructure could do, what the benefits are, what limitations exist, etc.

nuest commented 4 years ago

(+) remote processing > no need to run locally or download files
(-) makes things more complicated or even impossible for protected data and huge datasets
(+) streamlining metadata
- ORCID integration: ORCID also has "reviews" now, so we could become a "trusted organisation" and deposit the codechecks directly into the ORCID records of codecheckers, see https://support.orcid.org/hc/en-us/articles/360006971333-Peer-Review (try out with ORCID sandbox member API)
- Zenodo metadata points to article
A Codecheck BinderHub could limit supported repos to https://github.com/codecheckers org

nuest commented 4 years ago

Ideas for Integration with a BinderHub

A BinderHub could allow codecheckers to do everything online:

clone the repo into the codecheck org
edit it on CodeCheckHub (whitelist via ORCID login), which is allowed to push to the repo
trigger a publication of the bundle on Zenodo (on behalf of the user or with a code check user?)
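
To make the publication step more concrete, here is a rough sketch of how a deposition request for Zenodo could be assembled, including a link from the bundle to the article. It only builds the request and does not send it. The sandbox URL and the metadata field names follow my understanding of Zenodo's deposit REST API; the relation `isSupplementTo`, the placeholder description, and the function name `build_deposit_request` are assumptions for illustration. Whose token is passed in (the codechecker's or a shared CODECHECK account's) is exactly the open question above.

```python
import json
import urllib.request

# Assumption: experiments run against Zenodo's sandbox instance, not production.
ZENODO_API = "https://sandbox.zenodo.org/api/deposit/depositions"


def build_deposit_request(title, checker_name, checker_orcid, article_doi, token):
    """Build (but do not send) a request that would create a Zenodo deposition.

    Hypothetical helper: the metadata layout is a sketch, not a vetted schema.
    """
    body = {
        "metadata": {
            "upload_type": "other",
            "title": title,
            "description": "CODECHECK bundle (placeholder description).",
            "creators": [{"name": checker_name, "orcid": checker_orcid}],
            # Points the deposited bundle at the checked article.
            "related_identifiers": [
                {"identifier": article_doi, "relation": "isSupplementTo"}
            ],
        }
    }
    return urllib.request.Request(
        ZENODO_API,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the request (e.g. with `urllib.request.urlopen`) and uploading the files would be separate steps.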

Running the assistant and automated processes:

Maybe we can call a docker exec via a new API endpoint in BinderHub? The exec would run a binary (to be implemented) that reads a codecheck.yml (to find the files to check), does the check, and then saves the result (with a "signature") back to the codecheck.yml (adding to any existing checks). We would manipulate the user container/pod from the outside... and since we would not tell the user about the pod, they also cannot manipulate it. But it would be exactly the same environment (i.e. a JupyterHub session/container) that they get when they start the binder for interactive use. We would actually run the analysis in a pod rather than hack it into the repo2docker (r2d) build process. Since BinderHub started the JupyterHub, we (should) know the pod and container, so we can (hopefully) also save the container to a file.
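
A minimal sketch of what the core of such a binary could do, assuming the codecheck.yml has already been parsed into a dict (e.g. with a YAML library) and lists the files under `manifest` with a `file` key each. The `checks` key, the record layout, and the function name `run_check` are made up for illustration. The "signature" here is only a hash over the per-file digests; a real signature would need a key that the user cannot access.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def run_check(config, root):
    """Hash every file listed in the manifest and append a record to `config`.

    `config` is the parsed content of codecheck.yml; `root` is the directory
    that the manifest paths are relative to.
    """
    root = Path(root)
    files = {}
    for entry in config.get("manifest", []):
        path = root / entry["file"]
        # A missing output file is recorded as None instead of raising.
        files[entry["file"]] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        )

    # Placeholder "signature": a digest over the sorted per-file digests.
    payload = "\n".join(f"{name}:{digest}" for name, digest in sorted(files.items()))
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "files": files,
        "signature": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
    # Add to potentially existing checks rather than overwriting them.
    config.setdefault("checks", []).append(record)
    return record
```

Writing the updated dict back to codecheck.yml is left out here on purpose, since it depends on the YAML library we pick.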

[ ] papermill can execute notebooks via a CLI (and the analogous thing is possible with rmarkdown in R)
[ ] We need a small example (e.g. a Python script that generates random numbers based on a seed and a preliminary "check").
[ ] Discuss approach and questions with BinderHub team (member)
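
For the small example mentioned in the list above, something along these lines might be enough to start with. It is only a sketch: the function names and the idea of comparing a SHA-256 digest of the generated numbers against a recorded reference are assumptions, not an agreed check format.

```python
import hashlib
import random


def generate(seed, n=5):
    """The 'analysis': n pseudo-random numbers from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def digest(values):
    """Stable fingerprint of the output, so results can be compared cheaply."""
    text = ",".join(repr(v) for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check(seed, expected_digest, n=5):
    """Preliminary 'check': re-run the analysis and compare fingerprints."""
    return digest(generate(seed, n)) == expected_digest


if __name__ == "__main__":
    reference = digest(generate(42))  # what the author would record up front
    print("check passed:", check(42, reference))
```

The same seed should reproduce the same digest on the same Python version; whether it also does so across versions and platforms is something the example could help us find out.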

Questions

How does a user "explore" the result of a check?

Do we need JupyterHub? https://github.com/pangeo-data/pangeo-stacks/pull/10 adds a verify script which now is called as part of build.py, but that's more to verify the image works OK, not to actually run the analysis... BUT docker run -i -t ${IMAGE_NAME} "binder/verify" seems like something that we also want to run.

Can we assume a check happens automatically after a binder is started, or is it triggered manually (potentially allowing manipulation of content)? > It should start right after the successful build and launch of the pod/container (?) by JupyterHub?!

Do we need a special UI? > Instead of getting the notebook, you just get a UI that runs the code configured in codecheck.yml within the container, then triggers the check, and puts the "Docker image + check result + badge" somewhere safe.

Alternative: enhance with Jupyter extension, see ideas/discussion at https://github.com/jupyterhub/binderhub/issues/579 and https://github.com/jupyterhub/binderhub/issues/674

Can run commands in a pod's container with kubectl exec
- nbtoolbelt can help running notebooks from CLI: https://gitlab.tue.nl/jupyter-projects/nbtoolbelt/, or nbless, or nbscript
Might JupyterHub be the better component to build CODECHECK on??
Can we run a CODECHECKHub instead of JupyterHub, that just starts a pod, runs the image, executes the analysis, executes codecheck (via kubectl exec) - Would BinderHub maintainers welcome this?
- CODECHECKHub would be like a read-only JupyterHub? Or just a minimal UI that says "running" and then "check result"?
- We likely need a BinderHub-level component that can save the container to a file after the check.
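
As a sketch of the kubectl exec idea above: a hub-side component could assemble and run the command roughly like this. The pod, namespace, and container names and the `codecheck` executable in the usage note are hypothetical, and running it requires `kubectl` to be installed and configured for the cluster.

```python
import subprocess


def kubectl_exec_command(pod, command, namespace="default", container=None):
    """Build the argument list for running `command` inside a pod."""
    args = ["kubectl", "exec", pod, "--namespace", namespace]
    if container:
        args += ["--container", container]
    # Everything after "--" is passed to the container unchanged.
    return args + ["--", *command]


def run_in_pod(pod, command, **kwargs):
    """Run the command in the pod and return the completed process."""
    return subprocess.run(
        kubectl_exec_command(pod, command, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_in_pod("jupyter-abc", ["codecheck", "codecheck.yml"], namespace="binder")` would run the (not yet existing) check binary in the user's pod from the outside.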

Does BinderHub keep a list of redirects it makes for users to JupyterHub (so it could execute stuff in user containers) ?

No.

How can we export the image to a file and make it available for download? Would that work via a new BinderHub API endpoint?
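
On a host with a Docker daemon, exporting an image to a tarball could be as simple as wrapping `docker save`, as in the sketch below. This assumes the component doing the export can reach the daemon that holds the image, which may not hold on every Kubernetes node; the function names are made up. How to offer the resulting file for download is still open.

```python
import subprocess
from pathlib import Path


def docker_save_command(image, target):
    """Build the argument list that writes `image` to the tar file `target`."""
    return ["docker", "save", "--output", str(target), image]


def export_image(image, target):
    """Run `docker save` and return the path of the written archive."""
    subprocess.run(docker_save_command(image, target), check=True)
    return Path(target)
```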

Is it feasible to run both binder and a binder-fork in the same k8s cluster, using the same Jupyter Hub?

JupyterHub API: https://jupyterhub.readthedocs.io/en/stable/api/

Turing-Way book chapter about BinderHub: https://github.com/alan-turing-institute/the-turing-way/pull/557/files?short_path=bfcf303#diff-bfcf303fc9ba83d09c678a89644c2565

We should try to extend the figure from https://binderhub.readthedocs.io/en/latest/overview.html to make CODECHECK's process clear.

nuest commented 4 years ago

We could also have a bot to streamline organisation of the review process. JOSS's whedon or buffy could be a basis: https://github.com/openjournals/buffy

codecheckers / discussion

CODECHECK infrastructure #2
