Closed Keno closed 5 years ago
maybe with the contents API
I believe that what's needed is an "environment protocol": i.e. instead of needing to actually have a project file and/or manifest file present, or a package directory, or a load path array, one just needs to implement the environment protocol. Then the IJulia package can implement the protocol for notebooks that have environment information stored in them and voila, each notebook has its own environment. However, I think that work is a 1.x kind of thing: we now generally understand what the protocol needs to look like; the next step is to factor out the protocol part in such a way that the three kinds of environments that we already support are implementations of this protocol; after that we allow a notebook to implement the environment protocol as well.
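As a rough sketch of what such a protocol could look like, here is some hypothetical Julia code. None of these names (AbstractEnvironment, identify, locate, NotebookEnvironment) exist in Base or Pkg; they only illustrate how the existing environment kinds and a notebook-backed environment could share one small interface:

```julia
# Hypothetical sketch only: none of these names exist in Base or Pkg.
# The idea is that project files, package directories, and load-path
# entries all answer the same two questions, so a notebook could too.
abstract type AbstractEnvironment end

# What is the UUID of the top-level package `name`, if any?
function identify end

# Where is the entry-point source file for the package with `uuid`?
function locate end

# A notebook-backed implementation could keep its project/manifest
# data in the notebook's JSON metadata instead of in *.toml files.
struct NotebookEnvironment <: AbstractEnvironment
    metadata::Dict{String,Any}  # parsed from the .ipynb file
end

identify(env::NotebookEnvironment, name::AbstractString) =
    get(get(env.metadata, "deps", Dict{String,Any}()), name, nothing)
```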
The main thing to consider at this point is how to allow for extension in the future. Where is the hook? Do we have a Base.PACKAGE_ENVIRONMENT variable which, if set, overrides the LOAD_PATH lookup? Or do we have some special name which can be put into the LOAD_PATH that causes loading to talk to the notebook instead?
The contents API seems like it may be a good way to stash the manifest information, but we don't really need something that emulates a file system—using a JSON store would actually be easier.
Automatically embedding the manifest etc. is great as a long-run solution. But is it possible to have a short-term patch that requires a manual call to load something in the notebook itself? That is, something along the lines of
Pkg.setmanifest("Manifest.toml") #i.e., local to the notebook
using MyLib #i.e., the kernel is using the `Manifest.toml` now
Or maybe this is already possible with some of the Pkg3 commands in Jupyter?
🤷♂️ maybe?
Can't you just activate the dir of the notebook? Then that notebook will use a separate environment that will be stored next to the notebook.
To make sure I understand this: you think I may just be able to put a Manifest.toml in the notebook directory, and then I should just need to run:
Pkg.activate(".")
using MyLib
If that is correct, I can try to have someone test it when IJulia is sufficiently stable with 0.7
You need a Project file as well. But yes, if you do
Pkg.activate(".")
and then go wild with adding packages, those will be recorded in Project.toml and Manifest.toml alongside the notebook, and if you send these files to someone else, they can do
Pkg.activate(".")
Pkg.instantiate()
to install all the packages at the versions you used.
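The round trip described above, sketched end to end (the package name is just an example):

```julia
# Author's side: run in the directory containing the notebook.
using Pkg
Pkg.activate(".")    # creates/uses Project.toml next to the notebook
Pkg.add("Example")   # recorded in Project.toml and Manifest.toml

# Recipient's side: after unpacking the notebook plus the two .toml files.
Pkg.activate(".")
Pkg.instantiate()    # installs exactly the versions in Manifest.toml
using Example        # now resolves against that environment
```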
If opening a notebook can instantiate an arbitrary Manifest.toml (which may contain arbitrary repo-urls), isn't that a security hole? Isn't it also incompatible with the security model of the Jupyter notebook (= trust it if you execute it)?
How about adding a simple function that uploads Project.toml and Manifest.toml to a gist and then calls IJulia.load_string to inject something like
Pkg.activate("https://gist.github.com/.../...")
into a notebook cell? Of course, Pkg.activate then has to support downloading *.toml files when a URL is specified. Pkg.activate can also check if those packages are from the known registries and prompt the user if not.
Alternatively, I guess you can use cell attachments to bundle the *.toml files into the notebook file, but that would require the kernel and the server to be on the same machine. For example, it won't work if you launch a Julia kernel on an HPC cluster via a Jupyter notebook server running on your laptop.
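A hedged sketch of the gist idea above: Pkg.activate does not accept URLs, so this hypothetical helper (activate_url is a made-up name, not a Pkg API) downloads the two files into a temporary directory first. Note that the Downloads stdlib only exists from Julia 1.6 on; earlier versions used Base.download.

```julia
using Pkg
using Downloads  # stdlib since Julia 1.6; earlier versions used Base.download

# Hypothetical helper, not a real Pkg API: fetch Project.toml and
# Manifest.toml from a raw-file base URL, then activate and instantiate.
function activate_url(base_url::AbstractString)
    dir = mktempdir()
    for file in ("Project.toml", "Manifest.toml")
        Downloads.download(string(base_url, "/", file), joinpath(dir, file))
    end
    Pkg.activate(dir)
    Pkg.instantiate()
end
```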
I don't know Jupyter all that well, but isn't security controlled by how it is contained? You can load local files, run shell commands, etc. if it lets you?
Certainly being able to instantiate a local manifest is not the long-run solution, and will not work for all scenarios, but I don't think it is a security hole.
I now have this in my notebooks
using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()
pkg"precompile"
Pkg.activate(".") doesn't work well, since you can start your Jupyter notebook from any working directory.
Storing the Manifest + Project inside the notebook and having a button that does that would go a long way. There shouldn't be any security problems with that; it is just a convenience layer?
Their security model is:
- Untrusted HTML is always sanitized
- Untrusted Javascript is never executed
- HTML and Javascript in Markdown cells are never trusted
- Outputs generated by the user are trusted
- Any other HTML or Javascript (in Markdown cells, output generated by others) is never trusted
- The central question of trust is “Did the current user do this?”
--- https://jupyter-notebook.readthedocs.io/en/stable/security.html#our-security-model
So I don't think you can register any UI elements like a button to instantiate a project from the notebooks. Though I guess that's possible via a front-end extension.
I just thought using IJulia.load_string is a very simple and generic solution, since it does not require writing any front-end extension. It is also useful outside Jupyter/IJulia.
So I don't think you can register any UI elements like a button to instantiate a project from the notebooks. Though I guess that's possible via a front-end extension.
This is a pretty fundamental feature, so integrating it nicely into the frontend for every julia notebook seems like the right way to do it.
If you are willing to write a front-end extension I think that's great! I have no intention of stopping it.
The notebook does include a certain amount of notebook-wide metadata, detailing the language and kernel. e.g.
"metadata": {
  "kernelspec": {
    "display_name": "Julia 1.0.0",
    "language": "julia",
    "name": "julia-1.0"
  },
  "language_info": {
    "file_extension": ".jl",
    "mimetype": "application/julia",
    "name": "julia",
    "version": "1.0.0"
  }
}
It may be possible to insert and read the manifest information from there.
As far as a security model goes, one solution could be a confirmation dialog before installing any new package versions via activate.
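As a sketch of what writing into that notebook-wide metadata could look like, assuming the third-party JSON.jl package and a made-up "julia_environment" metadata key (nothing here is an existing IJulia or nbformat convention):

```julia
using JSON  # third-party package (JSON.jl)

# Hypothetical: copy the environment's .toml files into notebook-level
# metadata, under a made-up "julia_environment" key.
function embed_environment!(ipynb_path::AbstractString, project_dir::AbstractString)
    nb = JSON.parsefile(ipynb_path)
    env = Dict(file => read(joinpath(project_dir, file), String)
               for file in ("Project.toml", "Manifest.toml")
               if isfile(joinpath(project_dir, file)))
    nb["metadata"]["julia_environment"] = env
    open(io -> JSON.print(io, nb, 1), ipynb_path, "w")
end
```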
Well, I asked on the Jupyter gitter: it seems like this is not possible via the current protocol, so if we wanted something along those lines we would need to do it via a jupyter extension.
What if we added a function to IJulia which did something along the lines of what @vchuravy suggested, e.g.
using Pkg
function useproject(path=pwd())
Pkg.activate(path)
Pkg.instantiate()
pkg"precompile"
end
Then, at the top of the notebook you could just do
IJulia.useproject()
It does not work when the IJulia kernel and the Jupyter server run on different machines. https://bitbucket.org/tdaff/remote_ikernel/src/default/ https://github.com/ipython/ipython/wiki/Cookbook:-Connecting-to-a-remote-kernel-via-ssh
At JupyterCon I spoke with a few Jupyter folks and their take was that trying to put this kind of metadata into notebooks was not the right direction to go—they've tried this with images and other things in the past and have come to feel that the "unit of distribution" should be a git repo, not a single notebook file. So it seems like the way to go here might be to have IJulia automatically activate the project in the git repo that it's in. After all, you are running the code in the notebook, so presumably you trust it. (As compared to just starting a Julia process in a directory, which may or may not mean that you trust the content of the directory enough to execute it.)
IJulia doesn't know what notebook file (if any) it is executing — that information is not provided to the kernel.
At JupyterCon I spoke with a few Jupyter folks and their take was that trying to put this kind of metadata into notebooks was not the right direction to go—they've tried this with images and other things in the past and have come to feel that the "unit of distribution" should be a git repo, not a single notebook file. So it seems like the way to go here might be to have IJulia automatically activate the project in the git repo that it's in. After all, you are running the code in the notebook, so presumably you trust it. (As compared to just starting a Julia process in a directory, which may or may not mean that you trust the content of the directory enough to execute it.)
If we go this way, I'd still like a way to package everything into a single file that you can email to somebody or share on JuliaBox (and also have separate environments for every notebook on JuliaBox). If I just want to share some code with somebody, I don't think we can expect the workflow to be "Go clone this git repo".
I don't think we can expect the workflow to be "Go clone this git repo".
I agree. Jupyter notebooks need to be usable self-contained in some sense. Even the Jupyter interface is often centered around the "Upload" notebook workflow.
What about the ability to activate from a URL? You could give it the project file and/or manifest, and it would enable copying notebooks around. And if someone wanted to run the notebook in whatever global project they had in their current Jupyter, they wouldn't need to use those cells?
I'm just reporting what the Jupyter people (@Carreau if I recall correctly) told me, which is that they are moving away from trying to make notebooks self-contained because it has not worked out as hoped. The simplest solution would seem to be serving a zip or tar file containing a set of notebooks, resources used by the notebooks, and in our case, project and manifest files.
Yes, we tend to try to think of (1 unit == 1 repository). The notebook as a unit does not make much sense, especially since you can now connect many notebooks to the same kernel.
We haven't really figured out how to make all of this completely work, but generally trying to shove more into a notebook does not work.
As said before, a repository does not always work, but I don't think we can get a "one size fits all". There is always this tension between being able to manipulate things on the filesystem, and having everything be opaque and managed by Jupyter.
You could of course have an extension for Jupyter that shows "bundles" as an actual tree of files, but then you can't cd into it.
Maybe something along the lines of a FUSE driver that exposes a single file at some path, and the repo structure at another?
@fperez would be interested in this discussion BTW, and I think we had pictures of a whiteboard with all the different axes of what people want from notebook files.
My experience is that embedding data in notebooks is a lost cause. e.g. the attachments feature is basically useless, since:
- you can't access the attachments from the kernel (#625), so it's no use for embedding data.
- attachments are lost by nbconvert (jupyter/nbconvert#699), so you can't even use it for embedding images if you want to, say, convert the notebook to a presentation.
Is there a reason not to enable URL-based Project and Manifest files? In a GitHub-based implementation, you could point it to the raw file, or a local URL. And notebooks copied around would then work.
Does that break the Jupyter security model? Since the user would actively choose to run the script and trust the notebook, it doesn't seem like it?
I often enough ship notebooks to students who work in isolated (supercomputing) environments. So needing internet access to replicate a notebook would be an annoyance.
If we say that the unit is the git repository that is fine, but I really would like to enable a workflow where somebody can just grab a notebook itself.
To me Project/Manifest are very unlike pictures and other attachments. A notebook doesn't break just because a picture is missing, but it won't work without the correct manifest. So embedding that in the Metadata would be preferable.
I recently gave a workshop and it boiled down to having an environment per notebook in a different subfolder.
Why is git relevant to this discussion? Isn't the actual 'unit' just a directory?
Yes, I think you're right @tkoolen, but most of the time that directory is the root of a git repo and git repos do end up being the unit of reproducibility (although git trees actually work too).
@tkf what does "it" refer to here?
It does not work when IJulia kernel and Jupyter server run in different machines.
Yes, I think you're right @tkoolen, but most of the time that directory is the root of a git repo and git repos do end up being the unit of reproducibility (although git trees actually work too).
Yes, and usually it provides "collaboration" capability like synchronisation or something else. It's one way of abusing language, saying that the unit of reproducibility/sharing is bigger than a notebook (and may not contain a notebook).
@simonbyrne The IJulia.useproject() you suggested assumes that the kernel is running on the machine where the notebook file is. This is not true in general, since you can connect to a remote kernel via ssh (say). But maybe you can argue that this is too exotic a use case to be worth supporting.
BTW, my suggestion https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-414033742 does not have these shortcomings you pointed out for the attachment approach:
- you can't access the attachments from the kernel (#625), so it's no use for embedding data.
- attachments are lost by nbconvert (jupyter/nbconvert#699), so you can't even use it for embedding images if you want to, say, convert the notebook to a presentation.
Ah, thanks. That makes sense.
To be wholly self-contained (i.e. avoid things like downloading manifests from gists), I think the only real solution is to do what @StefanKarpinski suggested and have an "environment API", so that you could specify something equivalent to the Manifest.toml in the first cell (perhaps in a less verbose format), along with a "snapshot" function that would generate the necessary input.
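A minimal sketch of such a "snapshot" function, assuming the simplest possible format (the two files concatenated verbatim; the function name is illustrative, not an existing API):

```julia
using Pkg

# Print the active environment's Project.toml and Manifest.toml so the
# output can be pasted into (or stored with) a notebook.
function snapshot(io::IO = stdout)
    project_dir = dirname(Base.active_project())
    for file in ("Project.toml", "Manifest.toml")
        path = joinpath(project_dir, file)
        isfile(path) || continue
        println(io, "# --- ", file, " ---")
        print(io, read(path, String))
    end
end
```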
Manifest.toml has to hold rather large metadata (dependencies of dependencies), so my naive guess is that it's hard to squash it into a small notebook code cell.
self contained (i.e. avoid things like downloading manifests from gists)
Why do you want to avoid network access when you need it to install packages anyway? Also, you can include the gist sha in the URL. Since git is immutable, it then becomes fully self-contained in the sense that the dependency tree is fully determined by a single notebook file.
The manifest is not only needed for installing packages but also to determine what you can load. Without it, you are blind. So putting that in a url seems like a bad idea.
Alright, so do people think it's actually desirable to have the .toml files embedded in the notebook at this point? I'd actually argue that no, it's not desirable even if it is technically possible, because it'd be a very unconventional/magic way of working with Pkg.
Some thoughts:

- Even if we require users to create the .toml files themselves, and to explicitly add import Pkg; Pkg.activate(@__DIR__); Pkg.instantiate() at the start of the notebook, it would at least be very clear what's going on.
- Should File > Download as > Julia (.jl) result in a .jl file that, when run, always produces the same results as the notebook, no matter the directory to which it is downloaded? If so, how can that be reconciled with the previous point?
- One way to start a notebook is using IJulia; notebook(dir="/some/path") from Julia. Perhaps there could be an extra project kwarg for that to mirror the --project Julia argument, with a default value of either dir, pwd(), or Base.active_project(), passed along to the kernels. But another way is to run jupyter notebook from the command line, and jupyter is of course agnostic as to the kernel, so how do you make those two consistent?
- (I personally have IJulia in my default environment, so maybe it needn't be required to list it explicitly in the Manifest.toml for the notebook directory (thinking about NBInclude.jl here), but I could definitely be persuaded otherwise.)

The manifest is not only needed for installing packages but also to determine what you can load. Without it, you are blind.
@KristofferC That's why I suggested https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-414033742:
Pkg.activate can also check if those packages are from the known registries and prompt user if not.
Also, the notebook already has using/import statements in it. It's very transparent what is going to be loaded (given that it uses registries you trust).
But another way is to run jupyter notebook from the command line, and jupyter is of course agnostic as to the kernel, so how do you make those two consistent?
@tkoolen I think the default cwd of a jupyter kernel launched by jupyter notebook/lab is the directory of the notebook file.
What's wrong with sending people a zip file of a directory?
@StefanKarpinski It does not work with some exotic use cases: https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-424907156. But I guess not supporting them makes sense, since then a simple Pkg.activate("."); Pkg.instantiate() just works.
Sorry if this is an asinine and already rejected suggestion, but what about skipping Jupyter-based modifications and relying entirely on the package manager?
That is, users have the option to:

- ] activate . and/or ] activate .; instantiate at the top of the notebook
- ] add MyNotebookProject; activate MyNotebookProject; instantiate or whatever. They would not do using MyNotebookProject because there is no code or reexporting. Or, if not registered, then ] add https://github.com/myproject/MyNotebookProject.jl.

Is this an abuse of the package manager? Can the registries (and uncurated registries) handle a proliferation of lots of small convenience packages, or would it break things? Of course, this wouldn't really be one project per notebook. I realize this is inconvenient with the current METADATA-based package registration, but I imagine that infrastructure could change.
@StefanKarpinski, re:
What's wrong with sending people a zip file of a directory?
I think that's the way to go.
@tkf, re:
I think the default cwd of a jupyter kernel launched by jupyter notebook/lab is the directory of the notebook file.
That's true. But regardless of what pwd() is, an important question is still what Base.active_project() should be if you:

1. run jupyter notebook from /some/path and open a Julia notebook (possibly in a subdirectory or a remote location);
2. run using IJulia; notebook(dir="/some/path") from Julia started in an arbitrary directory and with an arbitrary Base.active_project(), and open the same notebook.

I think that calling Base.active_project() from the first cell of the notebook should at least return the same directory in both cases, but if we really believe in 'unit == directory', maybe it should be equal to joinpath(@__DIR__, "Project.toml") for the running notebook instead of the current ~/.julia/environments/v1.0/Project.toml. Either that or every notebook needs an IJulia.useproject() as the first cell as in https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-424199315, which has the advantage that it's clear what's going on, but would still be unfortunate boilerplate to have in (almost) every notebook.
Using joinpath(@__DIR__, "Project.toml") as the default Base.active_project() for a notebook also addresses https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-424216750, I think. But if this proposed default value is not what you want, you can always use Pkg.activate in a notebook cell to change it as desired, for exotic use cases.
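That proposed default could be sketched roughly like this on the kernel side. Note that notebook_dir is an assumed input: as pointed out elsewhere in this thread, the kernel is not currently told the notebook's path, so the frontend would have to supply it.

```julia
using Pkg

# Hypothetical kernel-startup hook: activate the project next to the
# notebook if one exists, otherwise leave the default environment alone.
function maybe_activate(notebook_dir::AbstractString)
    if isfile(joinpath(notebook_dir, "Project.toml"))
        Pkg.activate(notebook_dir)
    end
    return nothing
end
```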
Create a lightweight package without any code, and probably just a Manifest and Project files. The notebook using the package is then completely self-contained.
@jlperla Yeah that's essentially equivalent to my suggestion https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-414033742. Though I don't think you need to turn it into a package. With the current infrastructure, you can already do:
run(`git clone $URL workspace`)
cd("workspace") # has Project.toml [and Manifest.toml] in it
using Pkg
Pkg.activate(".")
Pkg.instantiate()
@tkoolen I don't think automatically activating an environment works unless it is automatically instantiated. However, automatic instantiation deviates from Jupyter's security model (= you trust the notebook if you ever run it) https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-414096386. You could put using Pkg; Pkg.instantiate() in the first cell, but then a single IJulia.useproject() would do the same and be more explicit.
Using joinpath(@__DIR__, "Project.toml") as the default Base.active_project() for a notebook also addresses https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-424216750
No, I don't think so, because in the scenario I described, the notebook file and the kernel are on different machines (e.g., jupyter lab on your laptop and the IJulia kernel on some cloud compute node). pwd() of the remote IJulia kernel won't reflect the location of the local notebook path (whose directory may not even exist on the remote machine).
Other than the "exotic" remote_ikernel usage, I wonder how the approach with *.ipynb and *.toml files in a directory plays with realtime collaboration in Jupyter, something like the (now deprecated) jupyterlab-google-drive. It looks like the notebooks do not exist as a local JSON file anymore in this case, and you can't have *.toml files beside them: https://github.com/jupyterlab/jupyterlab-google-drive/issues/39.
It does not work with some exotic usecase: #673 (comment).
I don't mean loading from a zip file, I mean just sending someone a zip file and then they unzip it. The only real requirement here seems to be being able to send people a single file. I don't see why having a single file on the local filesystem where it's running is required.
just sending someone a zip file and then they unzip it
@StefanKarpinski I'm just saying that this is too simplistic an approach to cover other usages of Jupyter notebook/lab. The kernel may be running on a different machine (e.g., remote_ikernel https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-424907156) and the file system may be virtualized (e.g., jupyterlab-google-drive https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-424979926).
It does not seem like Jupyter currently has the features to support this kind of thing. I don't think it should really fall on us to try to work around such limitations. The appropriate path forward seems to be conveying to the Jupyter folks what we would need in order to do what we want to do.
It does not seem like Jupyter currently has the features to support this kind of thing.
Jupyter has the set_next_input protocol (invokable via IJulia.load_string) to support implementing what I suggested in https://github.com/JuliaLang/IJulia.jl/issues/673#issuecomment-414033742
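Concretely, that suggestion could look like this from a running IJulia kernel. The gist URL is a placeholder, and Pkg.activate accepting a URL is the hypothetical part; IJulia.load_string itself is real and sends Jupyter's set_next_input message, so the injected text shows up as the next cell for the user to inspect and run:

```julia
using IJulia

# The string below appears as the next cell's content; the user still
# has to execute it themselves, which fits Jupyter's trust model.
IJulia.load_string("""
using Pkg
Pkg.activate("https://gist.github.com/user/0123abcd")  # hypothetical URL support
Pkg.instantiate()
""")
```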
Now that 0.7 is getting closer, it may make sense to start thinking about how notebooks interact with the new package manager. I had discussed with @StefanKarpinski and @KristofferC that it would be great if notebooks could embed a MANIFEST, and thus if you send somebody a notebook they could automatically load everything with the correct versions. Doing something like this would require figuring out where to store the information, how to hook it up to Pkg3, and probably some UI work as well.