Closed: Keno closed this issue 5 years ago.
In a discussion on Slack I came up with the following:
```julia
import LibGit2
using Pkg

env = joinpath(@__DIR__, "DemoEnvironment")
isdir(env) || LibGit2.clone("https://gist.github.com/2e4ebf0df689f4409d4341d366c89f15.git", env)
repo = LibGit2.GitRepo(env)
LibGit2.checkout!(repo, "708b17c88a89a88f08f4f1070e04b2a32974b1b7", force = true)

Pkg.activate(env)
pkg"instantiate"
pkg"precompile"
```
While quite verbose, it encapsulates what I want from Manifest integration in Jupyter.
Kristoffer had the idea that we could maybe have something like `Pkg.activate("https://gist...", "sha")`, which basically activates an anonymous environment and uses the Merkle hash of the gist repo + sha to cache and identify the environment.
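A minimal sketch of what that could look like, assuming a hypothetical `activate_remote` helper (this is not actual Pkg API; `env_cache_key` and the cache location are my own inventions):

```julia
# Sketch of the idea: activate an anonymous environment identified by
# (gist URL, commit SHA). All names here are hypothetical, not Pkg API.
import LibGit2
using Pkg, SHA

# Content-address the environment: the cache key is a hash of URL + SHA,
# so the same (url, sha) pair always maps to the same local directory.
env_cache_key(url::AbstractString, sha::AbstractString) =
    bytes2hex(sha256(string(url, '#', sha)))

function activate_remote(url::AbstractString, sha::AbstractString;
                         cache = joinpath(DEPOT_PATH[1], "environments", "remote"))
    env = joinpath(cache, env_cache_key(url, sha))
    if !isdir(env)                      # first use: fetch and pin to the SHA
        LibGit2.clone(url, env)
        LibGit2.checkout!(LibGit2.GitRepo(env), sha, force = true)
    end
    Pkg.activate(env)
    Pkg.instantiate()
    return env
end
```

Because the cache key is derived from both the URL and the SHA, repeated activations of the same pinned environment are cheap, and two different SHAs of the same gist never collide.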
I have wanted an environment-publish command before that takes my current environment and uploads it, so that I can share it with others for debugging purposes.
I'm glad this led to something that seems generally useful and also not Jupyter-specific.
Let me shamelessly point out that activating a remote repository is what I suggested in the very first post https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-414033742
FYI, we've tagged a release of QuantEcon/InstantiateFromURL.jl, which implements the idea from @vchuravy above.
To be clear, this first implementation is for a light repo with Project and Manifest files, which provides a solution for tightly controlled lecture notes, etc. The gist approach, which would be better for less formal setups, could be added as well if anyone is interested.
I think https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-425306944 is missing a `Pkg.build()` step in order for things to be guaranteed to work starting from a clean slate. It would be nice not to have to do that every time you run the notebook, though.
Instantiate builds the packages that got downloaded, so I don't think that is required.
I seem to recall cases when that didn't happen, but maybe that was just because the build had failed during an earlier `instantiate` call.
@tkoolen FYI, the way we avoid rebuilding every time is to either (a) precompile the resources, for git refs that point to moving targets like `master`, or (b) version the resources using git tags, so something like `activate_github("arnavs/InstantiationTest", tag = "v0.1.0")` will never be updated.
Perhaps there's a very simple solution to this problem: treat the desired embedded environment metadata as code in the first executable cell. The question then becomes how to make it unobtrusive in the standard Jupyter UI. It appears the UI doesn't do line wrapping, so there might be a simple answer to that as well: base64-encode the TOML files into a single line each.
The nice thing about this is that it's a solution for scripts which need to "come with their environment" just as much as Jupyter notebooks. Then we'd just need a package `ProjectEnvironments` (or something) with a very simple and forward/backward-compatible API which people could add manually, and which acts as the springboard into the well-defined environment for the notebook.
Would this work or have I missed something?
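The base64 idea above needs only stdlib functionality; here is a minimal sketch, where `encode_env`/`decode_env` are hypothetical names for what a package like `ProjectEnvironments` might expose:

```julia
# Sketch of the "one base64 line per TOML file" idea.
using Base64

# Collapse a TOML file's text into a single line that the Jupyter UI
# won't need to wrap, and recover it losslessly later.
encode_env(toml_text::AbstractString) = base64encode(toml_text)
decode_env(line::AbstractString) = String(base64decode(line))

project = """
name = "NotebookEnv"

[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
"""

line = encode_env(project)        # single line, safe to embed in a cell
@assert decode_env(line) == project
@assert !occursin('\n', line)
```

The round trip is exact, so the embedded line is just an opaque blob to the notebook UI while remaining a full Project/Manifest on decode.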
I tried implementing this; there are a few gotchas, but it looks like it will work. Gotchas include:
- `CodeEnvironments` (my working name for the package) needs to be installed from some default environment. Its API would need to be very forward and backward compatible.
- Generally there seems to be some impedance mismatch with `Pkg`, which is probably not a surprise given that I don't know a lot about `Pkg` ;-) It does, however, offer a way to have per-notebook embedded manifests and project files.
Another datapoint: I'd like to be able to send people links to colab notebooks with in-built environments, but the unit they use is a file :)
I think my proposed solution/workaround would be ok for that. Would you be interested in it becoming a registered package? I'd need to think a bit more about the workflow and API, and probably involve Pkg people to know whether it's going to work out, or is fundamentally broken in some way. But I'm not sure whether to do that extra work yet.
@Keno @c42f If you have been using a solution like this, there is another use case to consider: enabling notebook users to update the Project and Manifest when necessary. This has proven to be very important for our set of lecture notes; otherwise people effectively start copying notebooks around and editing those copies for assignments.
After doing this for the last 8 months, my gut says that metadata in a notebook could become hellish to maintain and lead to all sorts of user issues wondering why they have the wrong versions of packages. I used to be of the opinion that hidden metadata was the right way to go, but have reversed my stand completely. On the other hand, I will never reverse my stand that notebooks have to execute self-contained from a single file, and that copying TOML files around is a terrible idea.
For what it is worth, the approach we implemented from people's (i.e. @vchuravy's) suggestion in https://github.com/QuantEcon/InstantiateFromURL.jl/ has been very successful. Basically, it checks a `.project` file to see if the version of that package has been downloaded. If not, it downloads, activates, and instantiates. Otherwise it just activates. The instantiation has been a very helpful step for ensuring people are using the right versions of the packages, and it makes installation a joke. Take a look at https://github.com/QuantEcon/quantecon-notebooks-jl/blob/master/kalman.ipynb as an example, but basically all that is needed is

```julia
using InstantiateFromURL
activate_github("QuantEcon/QuantEconLecturePackages", tag = "v0.9.6");
```

at the top of the page. The Project and Manifest are versioned in https://github.com/QuantEcon/QuantEconLecturePackages
Now, for those who don't need a mini repository, @vchuravy had the initial idea that this sort of package could have a simple utility to set up a gist instead: https://github.com/QuantEcon/InstantiateFromURL.jl/issues/18. We didn't need it ourselves and couldn't put in the development time, but I think it is exactly the sort of thing that is needed for more lightweight package management.
... all of that is to say: before starting on any new solution, please see if the workflow in this package is solid and feel free to submit PRs for new features. If enough people vet this solution, a variation on it might make sense in Pkg.jl or at least a more formally maintained package.
@jlperla That sounds like a great workflow for your use case. My reservation is that it's not self contained and requires supporting infrastructure which can't easily be updated by the end users. This is probably a good feature in your case where you're running a class with homogeneous package requirements.
On the other hand, I'm helping a group of somewhat nontechnical PhD students with heterogeneous data management and analysis tasks. My thought is that I should be able to give them Jupyter notebooks (and normal scripts!) which have embedded self-contained environments. I'd also like them to be able to update package requirements as their needs change. But at the same time, I want those environments well defined and embedded within the notebooks, so that package requirements are somewhat resistant to user error (e.g. emailing a script and forgetting to add the Project and Manifest files).
> (e.g. emailing a script and forgetting to add the Project and Manifest files).
Emailing Project and Manifest files around simply does not work; I am completely with you. And people may put the wrong ones with the wrong files.
> My thought is that I should be able to give them jupyter notebooks (and normal scripts!) which have embedded self-contained environments. I'd also like them to be able to update package requirements as their needs change.
I understand the goal of a "self-contained environment", but I would decouple that from a self-contained file. Here are some usage scenarios:
> But at the same time, have them well defined and embedded within the notebooks so that package requirements are somewhat resistant to user error
These are the tip of the iceberg.... As I said, I used to think that this stuff belonged in the notebook but changed my tune completely after seeing usage scenarios.
> I'd also like them to be able to update package requirements as their needs change.
Having these things centrally managed is extremely helpful. But I understand that having a full repo for the Project/Manifest TOML files is a little heavy for most uses.
This is exactly why @vchuravy had originally suggested using a gist with some tools (which I will try to summarize below). For us, having a consistent set of versions to bump was very nice but things don't need to have a full and controlled repository.
Basically, I think he had in mind https://github.com/QuantEcon/InstantiateFromURL.jl/issues/18 as a formalization of https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-425306944
The idea would be to create a gist on the user's GitHub account for a given project:

```julia
using InstantiateFromURL
hash = publish_gist(".") # by default, gets the local `Project.toml` and `Manifest.toml` from the local directory
# Could optionally pass in the github username, or use the github config to see it.
# e.g. hash = 2e4ebf0df689f4409d4341d366c89f15
```

Then, to activate it elsewhere:

```julia
using InstantiateFromURL
activate_gist("2e4ebf0df689f4409d4341d366c89f15") # optionally have a tag?
```

and `publish_gist(".", hash)` to commit and push changes... or something along those lines.
I've been using notebooks + TOML in gists for a while, and while it works, there are some hassles:
1) Setting it up is a bit of a pain: you have to create the gist, then clone it back to the directory. Could be addressed by a script (though you would require a GitHub API key), but would be nicer if it could be done via Jupyter itself. Once set up though, pushing updates is easy.
2) all my gists end up being called "simonbyrne/Manifest.toml" (I assume because this is the file that appears first when sorted by ASCII?). GitHub doesn't seem to provide a mechanism to rename them (you can change the comment that appears below, but not the name).
Not sure if this helps, but the InstantiateFromURL package grabs repo tarballs (which don’t require an API key), and we store them (names salted with SHA hash) in a hidden directory from where the script is run.
Could be different on the gist side, though.
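The "names salted with SHA hash" scheme described above can be sketched as follows; `resource_dir` and the `.projects` root are my own hypothetical names, not InstantiateFromURL's actual API:

```julia
# Sketch of the storage scheme described above: tarball contents go into a
# hidden directory whose name is salted with a SHA hash of repo + tag.
using SHA

function resource_dir(repo::AbstractString, tag::AbstractString;
                      root = joinpath(pwd(), ".projects"))
    # Salting with the hash means different (repo, tag) pairs never collide,
    # while re-activating the same pair finds the already-downloaded copy.
    salt = bytes2hex(sha256(string(repo, '@', tag)))[1:12]
    joinpath(root, string(replace(repo, '/' => '-'), '-', tag, '-', salt))
end
```

The directory name stays human-readable (repo and tag are visible) while the salt disambiguates anything the readable part can't.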
> Setting it up is a bit of a pain: you have to create the gist, then clone it back to the directory. Could be addressed by a script (though you would require a GitHub API key), but would be nicer if it could be done via Jupyter itself. Once set up though, pushing updates is easy.
I agree, and those sorts of scripts built into a package seem to be what Valentin was getting at. I think it is a perfect case for a light package (which could ultimately become a feature of Pkg3 itself). I am hesitant to say that we should have it in "jupyter" or IJulia since this is a more general problem than just jupyter notebooks.
If anyone wants to work on gist features, @arnavs and I would be happy to merge them into InstantiateFromURL.jl as a testbed.
> These are the tip of the iceberg.... As I said, I used to think that this stuff belonged in the notebook but changed my tune completely after seeing usage scenarios.
These are all good points but come with strong assumptions that:
Consider instead that you are helping a group of nontechnical colleagues (students and lab staff) with their individual projects, each of which has different package requirements. This situation is a very different use case, and I don't see how `InstantiateFromURL` can help with it.
> This situation is a very different use case and I don't see how InstantiateFromURL can help with it.
Hence the suggestion from some people to have gist-based workflows with simple publishing tools. We didn't build it because of lack of time, and because we didn't know the requirements since we didn't need it ourselves.
My points are primarily about the difficulty of having relatively non-technical people manage Project and Manifest files within the Jupyter files themselves, and all the things that can go wrong.
The other thing to consider is that the students can use a base set of packages and then install additional ones via build commands at the top of their own notebooks.
But I could be wrong... Maybe there is some sort of technology that could make managing embedded package information within a notebook seamless and manageable. But it is hard to imagine without deep integration of both IJulia and Pkg3 (which there seems to be little appetite for).
@c42f FYI, `IJulia.load_string` seems to be a better option than `clipboard` when you are using it in Jupyter notebooks.
> Mutable environments are somewhat problematic; you want the jupyter user to be able to add easily to the environment, but this conflicts with a desire to make them immutable and content addressed for the purposes of activating them from jupyter code.
I thought about how to address it. Here is an idea: put the following code, with a hypothetical function `use_packages` in a hypothetical package `IJuliaPkg`, at the top of the notebook:
```julia
using IJuliaPkg
use_packages(
    [
        "Plots",
        "DifferentialEquations",
    ],
)
```
which adds the packages in a plain environment, encodes `Project.toml` and `Manifest.toml` in base64 or uploads them to a gist (hereafter I call the Julia object for it `$ENCODED_PROJECT`), and then replaces the current cell with
```julia
using IJuliaPkg
use_packages(
    [
        "Plots",
        "DifferentialEquations",
    ],
    project = $ENCODED_PROJECT,
)
```
using `IJulia.load_string(..., true)`. It should be easy to make `use_packages` idempotent; i.e., do nothing other than `instantiate` + `activate` when the set of packages to be installed is identical to the one recorded in the `Project.toml` in `$ENCODED_PROJECT`. I think this lets you change the requirements of the notebook as you go. That is to say, if you want to import `PyPlot`, go to the top of the notebook and edit it to
```julia
using IJuliaPkg
use_packages(
    [
        "Plots",
        "DifferentialEquations",
        "PyPlot",
    ],
    project = $ENCODED_PROJECT,
)
```
and then hit shift+enter, which updates `$ENCODED_PROJECT`.
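The idempotency check at the heart of this could look roughly like the following; `needs_update` is a hypothetical helper, and only the stdlib `Base64` and `TOML` modules are used:

```julia
# Sketch of the idempotency check for the hypothetical `use_packages`:
# only re-resolve when the requested package set differs from the one
# recorded in the embedded (base64-encoded) Project.toml.
using Base64, TOML

function needs_update(pkgs::Vector{String}, encoded_project::AbstractString)
    project = TOML.parse(String(base64decode(encoded_project)))
    recorded = sort(collect(keys(get(project, "deps", Dict{String,Any}()))))
    return sort(pkgs) != recorded
end

encoded = base64encode("""
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
""")

@assert !needs_update(["Plots", "DifferentialEquations"], encoded)
@assert needs_update(["Plots", "DifferentialEquations", "PyPlot"], encoded)
```

When `needs_update` returns `false`, `use_packages` would just `activate` + `instantiate`; otherwise it would re-resolve, re-encode, and rewrite the cell.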
@tkf thanks, that's an excellent point. I had just assumed overwriting a code cell from the kernel was impossible! With this in mind I think it's possible to have a self contained solution.
Closed by #820.
Now that 0.7 is getting closer, it may make sense to start thinking about how notebooks interact with the new package manager. I had discussed with @StefanKarpinski and @KristofferC that it would be great if notebooks could embed a Manifest, so that if you send somebody a notebook they could automatically load everything with the correct versions. Doing something like this would require figuring out where to store the information, how to hook it up to Pkg3, and probably some UI work as well.