fonsp / pluto-on-binder

The Unlicense

How to build docker image for faster rendering on binder online? #21

Open yewalenikhil65 opened 1 year ago

yewalenikhil65 commented 1 year ago

It seems that GitHub repositories with Pluto notebooks take a long time to launch on Binder, because Binder builds a Docker image every time we click the Binder link.

Would it be possible to build a Docker image of the Julia environment we use with the Pluto notebook, and push it to the GitHub repository so that it loads faster online?

If yes, can you help me create a Docker image that works well with Binder and Pluto?

pankgeorg commented 1 year ago

Yes!

We've tried a lot of different approaches, but most of them weren't worth the effort.

Some context: Binder has a tool that converts a repository into a Docker image with JupyterLab and then runs it. It's called repo2docker. It basically looks for a Dockerfile, requirements.txt, or Project.toml/Manifest.toml and, depending on which of these it finds, it installs everything and uses the resulting Docker image.
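To make the "depending on which of these it finds" part concrete, here is a rough Python sketch of that kind of detection. It is not repo2docker's actual code: the function name `guess_build_strategy` and the returned labels are made up for illustration, and the real tool supports more file types and can combine several of them. The two behaviours it illustrates are that a `binder/` or `.binder/` folder takes precedence over the repository root, and that a Dockerfile overrides the other configuration files.

```python
from pathlib import PurePosixPath


def guess_build_strategy(repo_files):
    """Return a rough, hypothetical label for how a repo would be built.

    Simplified illustration only; repo2docker itself has more rules.
    """
    paths = [PurePosixPath(f) for f in repo_files]

    # Config files are taken from binder/ or .binder/ if present,
    # otherwise from the repository root.
    names = set()
    for config_dir in ("binder", ".binder", "."):
        names = {p.name for p in paths if str(p.parent) == config_dir}
        if names:
            break

    # A Dockerfile wins over everything else.
    if "Dockerfile" in names:
        return "dockerfile"
    if "Project.toml" in names or "JuliaProject.toml" in names:
        return "julia"
    if "environment.yml" in names:
        return "conda"
    if "requirements.txt" in names:
        return "pip"
    return "default"
```

For a typical Pluto repository with `Project.toml`, `Manifest.toml` and a notebook file, this sketch returns `"julia"`; adding a Dockerfile next to them switches it to `"dockerfile"`, which is the case the rest of this comment is about.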

Binder also caches the Docker images it builds, so if you get a cache hit, none of this matters. But on to our adventure!

The most Binder-friendly approach might seem to be a repo with a single Dockerfile that does FROM mycontainerregistry:myimage, and that's it. That sounds cool, but in practice your image would be around 2GB, and that takes some time to build too.

This PR does that: https://github.com/fonsp/pluto-on-jupyterlab/pull/1/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557

It starts with a jupyter-ready image (what binder can run) and adds julia and Pluto on top. Note that it's quite old! Julia is at 1.9 now!!

The workflow in that PR uploads the Docker image it builds to ghcr.io (docs about that)

So you end up with something that looks like this: https://github.com/JuliaPluto/docker-stacks/pkgs/container/pluto-jupyter

Then, in the above example, you make a GitHub repo with a single Dockerfile (no other file is needed) that does

FROM ghcr.io/juliapluto/pluto-jupyter:latest

(Disclaimer! Don't use this specific image! It's old and unmaintained! Make a new one!)

And that's it! zero build time, all download time.

That didn't work great in practice. You can experiment with variants of this, and it's worth rerunning the experiment, since any part of the pipeline that was slow back then may be fast now!

Hope that helps, let me know if something is not clear and/or you have more questions!

fonsp commented 1 year ago

If you find yourself trying out many binder experiments by going to mybinder.org/gh/.... in a new tab and timing it with your phone's stopwatch, and then realising you lost the logs... We have some tips to improve this workflow! Email me if you're interested!
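The tips mentioned above aren't spelled out in this thread, so what follows is only one possible way to avoid the stopwatch, not the workflow the comment refers to. It is a minimal Python sketch that timestamps build events and keeps the messages. It assumes the events arrive as server-sent-event lines of the form `data: {"phase": ..., "message": ...}`; that format, and the helper names `parse_event_line` and `record_phases`, are assumptions for illustration.

```python
import json
import time


def parse_event_line(line):
    """Parse one 'data: {...}' line; return a dict, or None if it isn't one."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    try:
        event = json.loads(line[len("data:"):])
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


def record_phases(lines, clock=time.monotonic):
    """Return [(seconds_since_start, phase, message), ...] for parsed events."""
    start = clock()
    log = []
    for line in lines:
        event = parse_event_line(line)
        if event is None:
            continue  # skip keep-alives, blank lines, and anything unparsable
        message = (event.get("message") or "").strip()
        log.append((round(clock() - start, 1), event.get("phase"), message))
    return log
```

`record_phases` accepts any iterable of text lines, so it can be fed a saved log file or a decoded HTTP response stream. Writing the returned list to a file after each experiment keeps both the timings and the logs around for comparison.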