fonsp / Pluto.jl

🎈 Simple reactive notebooks for Julia
https://plutojl.org/
MIT License
4.95k stars 285 forks

Package environment #142

Closed zenon closed 3 years ago

zenon commented 4 years ago

Hi,

I try and use many different libraries, and only some of them I keep. So, to start a project that uses a certain set of libraries, I enter a new directory, and in Julia I say

] activate "."

And now, I have a clean environment.

Pluto, on the other hand, starts its Notebooks in the standard environment (thus, for me, doesn't find any library).

So I added some cells before my "using XXX" lines:

using Pkg
Pkg.activate("myPath")

Side note: This isn't very portable. If there is a better way, I'd be happy to hear it.

When writing the notebook, everything went fine. When reloading it, the ordering by Pluto kicked in, and all packages were loaded up front, and thus not found (or so I interpret what I saw). This reordering seems sensible to me in all cases except when working with environments :-)

I have no good idea where to put such meta information. Somehow I either

Both would make Pluto more complex, I'm afraid.

As "using XXX" has to be on the top level, I don't see a way to put this into functions, like, I mean

# not syntactically correct Julia
function initialize()
    using Pkg
    Pkg.activate("myDir")
end

x = do
    initialize()
    using XXX
    using YYY
end

Any ideas?

Kind greetings, z.

fonsp commented 4 years ago

Hi z,

Very good point! I have been thinking a lot about this lately - I think that every Pluto notebook should start in its own package environment, and there should be a GUI for loading packages, or this can be done with using XXX directly.

The other thing here is that a Pluto notebook should contain that package info! That way a .jl is completely reproducible (as far as packages go). Before going too much into detail - what do you think about this idea?

fonsp commented 4 years ago

Also, you can use

let
    import Pkg
    Pkg.activate(".")
    Pkg.add("XXX")
end

as your first cell, and it should also be the first cell to run. Does that work?

zenon commented 4 years ago

Ah!

Thank you, Fons! I gave up when reading the error message that 'using' must be top level; I didn't expect to find that there are more top levels than I thought. :-)

Yes, a let/end or begin/end block seems to do it. (Assuming that my single experiment showed characteristic behavior.) Thank you!

I'll comment with thoughts on a general solution in the next hours.

Kind greetings, z

zenon commented 4 years ago

Some thoughts.

Creating a directory (to put the environment in), and installing packages is potentially dangerous. I wouldn't do it without asking.

What we can do is check for the right content of the current environment. Pkg.installed() gives a dictionary packageName => versionString.

So, a possible way, not thinking about environments yet: Pluto adds the result of Pkg.installed() in a comment at the end, and, when starting, checks, whether that's given, and warns if not. (And offers to do the install.)
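The check described above could be sketched roughly as follows. Note that `Pkg.installed()` has since been deprecated in favor of `Pkg.dependencies()`. The helper names `missing_packages` and `current_packages` are hypothetical, and the recorded package list is made up for illustration.

```julia
import Pkg

# Hypothetical helper: return the names in `required` that are absent from
# `installed` or present with a different version string.
function missing_packages(required::Dict{String,String}, installed::Dict{String,String})
    return sort([name for (name, ver) in required if get(installed, name, nothing) != ver])
end

# Direct dependencies of the active environment as name => version string.
# Pkg.dependencies() is the supported replacement for Pkg.installed().
function current_packages()
    result = Dict{String,String}()
    for (_, info) in Pkg.dependencies()
        info.is_direct_dep || continue
        result[info.name] = string(info.version)
    end
    return result
end

# Package list a notebook might have recorded (made-up values).
recorded = Dict("Example" => "0.5.3", "Plots" => "1.0.0")
println(missing_packages(recorded, Dict("Example" => "0.5.3")))
```

In practice the second argument would be `current_packages()`, and a non-empty result would trigger the warning (or the offer to install).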

An environment is a directory with configuration files, or at least that's all I know about it currently. Hm. I don't know whether it has the implication, that the packages mentioned in the config files really are installed.

In a way, using environments is the more heavy duty way to use Julia. So I think it better to make that optional for Pluto. When an environment is used, it needs an identifier, and it starts getting difficult.

Maybe I continue above, where I said "offers to do the install". Pluto can ask for a directory, looks whether that's empty / create it when it doesn't exist. If exists & empty, activate it as environment-directory, and start installing.

Or: Just print a note how to create an environment, and what to install there, and how to build it; and at the end ask the user to provide the directory. That would be a start!

Note, one reason why I am reluctant to install: I regularly have difficulties installing Julia packages. Most often because they try to install something non-Julia (Conda/Python, DLLs, all that). I'd refuse to take responsibility about that. (That's the reason why I added the "how to build it" phrase to the paragraph above, as often the building fails.)

Kind greetings, z.

fonsp commented 4 years ago

My thoughts were that a notebook will get a directory in /tmp (gets deleted on boot), which is activated as the environment, i.e. there will be a Manifest.toml in it.

The first time you use using or import, Pluto will ask you whether you want to use an existing package environment - you can choose its path - or whether to make the notebook self-contained.

In the existing Pkg environment case, it is exactly like it is today.

A self-contained environment means that you never interact with the package manager - calling using XXX will install the package and add it to the manifest, removing it will delete it from the manifest. The manifest (with version info) is included in the notebook file, either through comments or through Pkg.something commands that are hidden to the user.

When you open a self-contained notebook, Pluto will create a clean env and 'install' everything needed. Remember that Julia has a global cache! If you add a package to a new environment, it only downloads and builds that package if it has never been used on your machine before. Otherwise, it just adds one line to Manifest.toml. That's why I think that creating a clean environment for each notebook is a cool idea :)

Oh and you said that it could be dangerous, what do you mean by that? Isn't the dangerous part the running-arbitrary-julia-code-from-a-notebook, not the package install?

zenon commented 4 years ago

Isn't the dangerous part the running-arbitrary-julia-code-from-a-notebook, not the package install?

Hahaha, right.

Hm. Indeed right. Why didn't I think of that? I want a possibility to load a notebook without immediately executing it. I.e. I want to read it first. This should even be the default.

What I was pondering, more from a technical than a cyber security point of view, is notebooks installing things that don't really work. Like for me in the last few days, where Plots/GR only worked after some massaging. (On another computer, the install of Pluto failed because some of the downloads seemingly needed admin rights.) Starting a Julia project rarely works without hassles. Maybe that's because I often use Windows, or computers without admin rights. But that's my situation.

So your plan sounds good for an ideal situation. I just rarely see it :-)

fonsp commented 4 years ago

Hm 😕 do you also have those issues when starting from an empty package environment? And is it just the crazy python packages that break?

I would love to know more about the Pluto-admin-install problem! Just the system info would also be helpful! It only has 2 dependencies and doesn't do anything wild to install, so perhaps it's the required HTTP.jl package (or installing any package) that didn't work.

(It would be great if you could report these issues to the Plots package!)

grero commented 4 years ago

Is the solution suggested above still expected to work? When I run this as my first cell

begin
    using Pkg
    Pkg.activate(".")
end

I kind of expected the files Project.toml and Manifest.toml to be created in the directory containing the current notebook. However, no such files are created.

zenon commented 4 years ago

Hi Roger,

  1. What does "." refer to? I mean, are you sure you are looking in the right directory? Try pwd()
  2. I think I saw the same behavior some days ago. I just assumed that I did something wrong, and tried something else. (In particular, I didn't follow advice 1 :-) )

Kind greetings, z.

grero commented 4 years ago

Hi, my expectation was that Pkg.activate(".") would activate a new environment in the current working directory, which I thought to be the location of the notebook. I realise after running pwd() from the notebook, though, that I am actually in the folder from which Pluto was launched. In that case, I would expect the notebook itself to show up in that directory, rather than in a temporary directory. This is probably just my familiarity with Jupyter talking, though.
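One way to avoid depending on the working directory is to activate the environment relative to the notebook file instead of ".". A minimal sketch, assuming `@__DIR__` resolves to the notebook's folder when evaluated inside a Pluto cell (worth verifying on your Pluto version). The helper name `activate_next_to` is hypothetical.

```julia
import Pkg

# Hypothetical helper: activate the environment stored in `dir`,
# independent of what pwd() currently returns.
function activate_next_to(dir::AbstractString)
    Pkg.activate(dir)
    return Base.active_project()
end

# In a notebook cell, pass the directory of the current file.
project_file = activate_next_to(@__DIR__)
println("working directory: ", pwd())
println("active project:    ", project_file)
```

Printing both values side by side makes a mismatch between the working directory and the notebook location easy to spot.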

fonsp commented 4 years ago

Yep, what to do with the working directory is another issue... (In particular, after changing the notebook path - some cells can implicitly depend on the wd, but there is no way to detect that using syntax analysis). I guess that an okay solution is to always cd to the notebook's path?

fonsp commented 4 years ago

By the way, here is a more complicated example of setting up an environment

begin
    cd("/mnt/c/dev/julia/margo_tests/") # see edit below
    import Pkg
    Pkg.activate(".")
    using ClimateMARGO
    using Plots
    using LaTeXStrings
    using Colors
end

This works - but of course this should not be necessary - it's on the todo list!

When figuring out the "run all" order, cells that contain a using statement always run before cells that don't. This is why, for example, Pkg.activate(...) needs to be inside the same block as the using statements - otherwise it would run afterwards.

Edit: Pluto now does cd("path to notebook file") automatically. So if Project.toml is in the same directory as your notebook, you don't need the manual cd line.

ToucheSir commented 4 years ago

In the interest of reproducibility, it would be ideal if we could still provide a Project.toml (with compat if needed) and Manifest.toml. This is especially helpful in "mixed" environments where one has application/library code in addition to Pluto notebooks (ML comes to mind). Otherwise, keeping both sets of dependencies in sync is rather painful and somewhat of a dealbreaker.

fonsp commented 4 years ago

Thanks for pointing it out - I'm thinking that Pluto should detect whether you are in a package environment (by going up the file tree and looking for Project.toml) and use that one instead. This would be the you-know-what-you-are-doing mode
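The upward search mentioned above could look roughly like this. It is only a sketch of the idea, not Pluto's implementation, and `find_project_file` is a hypothetical name. Julia's own `@.` load-path entry performs a similar search through parent directories.

```julia
# Hypothetical helper: walk up from `dir` until a Project.toml is found.
# Returns the path to the file, or `nothing` if the filesystem root is reached.
function find_project_file(dir::AbstractString)
    dir = abspath(dir)
    while true
        candidate = joinpath(dir, "Project.toml")
        isfile(candidate) && return candidate
        parent = dirname(dir)
        parent == dir && return nothing   # reached the root, nothing found
        dir = parent
    end
end

# Example: start the search from the current working directory.
println(find_project_file(pwd()))
```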

fonsp commented 4 years ago

That would also mirror IJulia's behavior: https://github.com/JuliaLang/IJulia.jl/pull/820

Roger-luo commented 4 years ago

I find that Pluto currently uses whatever environment Julia is using when Pluto.run() is called in the REPL. Is that true?

fonsp commented 4 years ago

Not sure, maybe it depends on the Julia version. Currently, the best way to set up environments is to copy the setup from the PlutoUI sample, but I want to make this a lot smoother soon!

ToucheSir commented 4 years ago

@Roger-luo how did you get Pluto to pick up on a local project environment? I just tested with 0.11.0 and it still defaults to the global env?

Roger-luo commented 4 years ago

@ToucheSir I'm not sure; I just start it with the JULIA_PROJECT environment variable set.

ToucheSir commented 4 years ago

Perfect, that did the trick! Not sure why, but julia --project doesn't work the same way.

Roger-luo commented 4 years ago

I think this is because when you pass --project, the subprocess Pluto spawns does not inherit that flag, but it does inherit the environment variables.
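The environment-variable half of this explanation can be checked outside Pluto with a small experiment. This is a sketch under the assumption that `Base.julia_cmd()` does not forward a `--project` flag. It says nothing about how Pluto itself launches its processes, and `child_active_project` is a hypothetical helper name.

```julia
# Spawn a child Julia process and ask which project it considers active.
# The child only learns about the project through JULIA_PROJECT.
function child_active_project(env_dir::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e 'print(Base.active_project())'`
    return withenv("JULIA_PROJECT" => env_dir) do
        read(cmd, String)
    end
end

env_dir = mktempdir()
println(child_active_project(env_dir))
```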

lungben commented 4 years ago

I really like the idea to include the environment into the notebook file!

Some suggestions:

lungben commented 4 years ago

I played around a bit with Pkg and have a suggestion (or rather a rough draft). When the following function instantiate_env() is executed automatically when a notebook is opened, a notebook specific Pkg environment (named notebook.jl.env) is activated.

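import Pkg  # top-level import, so Pkg is already loaded when instantiate_env() runs

# Drop everything after '#' in the file name (the draft assumes Pluto appends such a suffix).
get_env_name() = string(split(basename(@__FILE__), '#')[1], ".env")

function instantiate_env()
    env_name = get_env_name()
    Pkg.activate(env_name)
    Pkg.instantiate() # maybe do this only if .env dir exists?
end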

A further enhancement could be the option to integrate the .env directory (the content of the toml files) directly into the notebook.jl file and to extract them from the notebook.jl file again.

I think this behavior should be opt-in, e.g. in a keyword argument to Pluto.run().

What do you think?

fonsp commented 4 years ago

Right now, I always include the following inside the notebook:

begin
    import Pkg
    Pkg.activate(mktempdir())
end

in one cell (mktempdir docs), and whenever I need Example, I write

begin
    Pkg.add("Example")
    import Example
end

or if I want to specify the version

begin
    Pkg.add(Pkg.PackageSpec(name="Example", version=v"1.2.3"))
    import Example
end

This way, running the notebook always starts in a clean package environment (no state!). So it's just like what you proposed, except we purposely don't save the environment in a separate file, but completely describe the environment inside the notebook!

This is pretty much what I want Pluto to do automatically in the future - the first cell will be built in, and you get the third cell when typing:

import Example @ 1.2.3

with nice autocomplete to help you. (Just import Example is no longer allowed)

fonsp commented 4 years ago

I was working on a design doc to get feedback on the import Example @ 1.2.3 idea, but it slipped a little. More in a couple of weeks!

lungben commented 4 years ago

Sounds great!

Will there be support for unregistered or local packages (currently added via ] add https://url:port/my_repo.git)? Furthermore, for a completely reproducible environment, it would be great to have the possibility to also freeze the dependencies of your dependencies (like currently with Manifest.toml files).

fonsp commented 4 years ago

Yep! Instead of a version number you can also give a version range, github url (+branch) or local path

I did not think about the option to freeze... Thanks for pointing it out!
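For comparison, these are the sources that Pkg's existing Julia API can already express through `Pkg.PackageSpec`. This is only a sketch of today's Pkg calls, not of the planned Pluto syntax. The URL, branch name and local path are placeholders.

```julia
import Pkg

# Each spec names a package through a different kind of source.
specs = [
    Pkg.PackageSpec(name="Example", version="0.5"),     # version range (any 0.5.x)
    Pkg.PackageSpec(url="https://github.com/JuliaLang/Example.jl", rev="master"),  # git URL + branch
    Pkg.PackageSpec(path="/path/to/local/Example"),     # local checkout (placeholder path)
]

# Building the specs does not touch the network; Pkg.add(spec) would.
foreach(spec -> println(typeof(spec)), specs)
```

Freezing transitive dependencies is a separate concern: that information lives in Manifest.toml rather than in the individual specs.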

Roger-luo commented 4 years ago

I'm wondering if there could be an option in Pluto.SessionActions.open on the environment path, so I can set it at least in CLI interface first. I currently need to change the environment by spawning a subprocess with JULIA_PROJECT environment variable.

But I guess Pluto is using Distributed.addprocs somewhere if I understand correctly? Then it should be possible to pass the --project flag to that and filter out this environment variable given that this can change the behaviour of all individual notebooks.

e.g we could have per session environment config? Then we could manage environments say for different users on server side easily by just spawning different sessions.
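Outside Pluto, passing a project to a worker started with `Distributed.addprocs` looks roughly like this. It is a sketch of the general mechanism only; whether Pluto launches its notebook processes this way is the open question above, and `add_worker_with_project` is a hypothetical helper name.

```julia
using Distributed

# Start one worker whose active project is `project_dir`, using the
# `exeflags` keyword of addprocs to pass the --project flag.
function add_worker_with_project(project_dir::AbstractString)
    pids = addprocs(1; exeflags="--project=$(project_dir)")
    return first(pids)
end

pid = add_worker_with_project(mktempdir())
# Ask the worker which project file it considers active.
println(remotecall_fetch(Base.active_project, pid))
rmprocs(pid)
```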

cadojo commented 4 years ago

As this is fully implemented, are there plans to support local environments too? I love the idea of a stateless notebook, it's made development and debugging much simpler.

Still, I work on dynamics & numerically heavy programs - if I understand this correctly, every time I open a notebook I wrote I would need to install & precompile all of the packages I'm using.

Some of them are pretty big (thinking about Images, ControlSystems, Plots, ModelingToolkit), and might cause 5-10 minutes of overhead every time the notebook is opened. Is there another way around that?

Roger-luo commented 4 years ago

@cadojo I implemented local environment support in #341. It currently works for me, but I think it needs @fonsp's review at least, and I'm not sure how to test it either.

I'd like to discuss what we need to have in the frontend. One thing I'm thinking about is an inline Project.toml: the notebook would be allowed to contain an inline Project.toml in a special cell that runs the pkg mode commands (normal cells could be changed to that by pressing ], maybe).

[deps]
A = 
B = 

[compat]
A = "0.1"
B = "0.2"

It would also be stored inside the notebook.jl script itself as meta-information in comments at the beginning of the file, like the other things generated by Pluto. Maybe we can then remove the current version number field, since this proposal replaces it directly:

### A Pluto.jl notebook ###
# [deps]
# A = uuid
# B = uuid
# 
# [compat]
# A = "0.1"
# B = "0.2"

I think what I'm thinking is to make pluto notebooks self-contained, rather than "have to be shared by a Julia project folder". Since a lot of times, people just want to share one notebook file, and run it everywhere. It makes no sense to generate an entire project folder to run one single notebook script.
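Reading such a comment header back could be sketched as follows, using the `TOML` standard library (Julia 1.6 or later). The header layout, the function name `read_inline_project` and the placeholder UUID are assumptions for illustration. This is not a format Pluto is confirmed to use.

```julia
import TOML

# Hypothetical header format: every line after the banner that starts with "#"
# belongs to the inline Project.toml; the first other line ends the header.
function read_inline_project(source::AbstractString)
    toml_lines = String[]
    for line in split(source, '\n')[2:end]   # skip the "### A Pluto.jl notebook ###" banner
        startswith(line, "#") || break
        # Remove the leading '#' and any whitespace that follows it.
        push!(toml_lines, lstrip(chop(line; head=1, tail=0)))
    end
    return TOML.parse(join(toml_lines, '\n'))
end

notebook = """
### A Pluto.jl notebook ###
# [deps]
# A = "00000000-0000-0000-0000-000000000001"
#
# [compat]
# A = "0.1"

x = 1
"""

project = read_inline_project(notebook)
println(project["compat"]["A"])
```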

Roger-luo commented 4 years ago

Also, I'd like to mention that the workaround above, calling add at the top of each notebook, can result in very long start-up times (as mentioned by @cadojo) and can cause the script to fail due to incompatible versions, so I think it is better to use a Project.toml directly. Shipping each notebook with a minimal Project.toml that contains [deps] and [compat] seems like the ideal solution to this.

fonsp commented 4 years ago

@cadojo

every time I open a notebook I wrote I would need to install & precompile all of the packages I'm using.

Julia uses a global package + precompilation cache, which also works for new environments that use a previously installed version. The code I wrote here https://github.com/fonsp/Pluto.jl/issues/142#issuecomment-685135043 does not do any real work when you run it a second time - it's just some tricks to make the package environment stateless. A package environment only contains version information, it does not contain the packages themselves.

So the overhead you mentioned is typical for Python venvs and Node projects, for example, but it should not be a problem with Julia. Can you try it out?
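A quick way to see that a fresh environment carries no package code of its own is to activate a temporary directory and look inside it. This is a minimal sketch; the printed depot paths will differ per machine.

```julia
import Pkg

env_dir = mktempdir()
Pkg.activate(env_dir)

# Activation alone writes nothing: Project.toml and Manifest.toml only appear
# once a package is added, and they record names, UUIDs and versions, not code.
println(readdir(env_dir))

# Downloaded and precompiled packages live in the shared depots instead,
# where every environment on the machine can reuse them.
println(DEPOT_PATH)
```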

cadojo commented 4 years ago

Thanks for the explanation @fonsp. That makes sense - I previously noticed that my project environment sometimes wouldn't precompile packages, but I wasn't sure why (I didn't know Julia used global package caching).

I retract my previous comment!

mbauman commented 4 years ago

I'd very much be in support of a scheme like https://github.com/JuliaLang/IJulia.jl/pull/820 — just use the environment from where the notebook lives, if there is one.

fonsp commented 4 years ago

There will be the option to do that, but it will not be the default - Pluto notebooks should be reproducible as a single file by default. Ideally, correctly understanding, managing and sharing a package environment should not be a prerequisite for your notebooks being reproducible.

The default process should be:

  1. Create a new notebook
  2. If you need dependencies, use the most obvious/easy way to get them
  3. Send the file to someone else
  4. They open the file in Pluto (possibly 5 years later)

And this should "just work"!

Roger-luo commented 4 years ago

@mbauman I'm planning to do it in the PlutoCLI, though not with exactly this behaviour, since using a local environment is my current main workflow too.

And I agree with @fonsp this is actually also an issue with scripting in Julia too. Currently Julia scripts always need to be shipped with a Project.toml/Manifest.toml to be reproducible. This can be too heavy for scripting. I remember there was an inline Project.toml proposal somewhere mentioned by @fredrikekre but I cannot find it anymore.

I think there is also a workaround on this using a package. But I hope for Pluto notebooks this can be something default for individual users, it will improve scripting experience in Julia a lot:

  1. every notebook can be shipped with its own inline Project.toml config.
  2. if there is no per notebook environment defined, try @. or a given environment path specified by Pluto.run(;project=)
  3. if there is no @. or given environment path, fallback to the global default Julia environment
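The three-step fallback above can be written down as a small resolver. This is purely a sketch of the proposed order; the function name, keyword names and return tags are hypothetical.

```julia
# Hypothetical resolver for the fallback order above. Each argument is either
# a path/name string or `nothing` when that source is not available.
function resolve_environment(; notebook_env=nothing, project_kwarg=nothing, current_project=nothing)
    # 1. an inline, per-notebook environment wins
    notebook_env !== nothing && return (:notebook, notebook_env)
    # 2. otherwise an explicitly given project path, then the result of an "@." search
    project_kwarg !== nothing && return (:session, project_kwarg)
    current_project !== nothing && return (:session, current_project)
    # 3. otherwise fall back to the global default environment
    return (:global, nothing)
end

println(resolve_environment(project_kwarg="/path/to/project"))
```

Returning a tag next to the path keeps it visible which rule selected the environment, which addresses the "you never know which environment is used" concern.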

And there are actually some issues with an implicit environment setting: you never know exactly which environment is used just from the code, so one cannot be certain about the reproducibility of a given notebook. I guess that when jupyter-notebook was created, Julia environments didn't exist, so there was not much room to design it with them in mind. That is not the case for Pluto notebooks. To me, JULIA_PROJECT=@. is more of a workaround, since there is no default way to configure the environment explicitly.

mbauman commented 4 years ago

Pluto notebooks should be reproducible as a single file

I suppose I come at Pluto from the other side: every notebook I've created so far has been a complement to a bigger work. A bigger work that has a project and manifest and data and other dependencies. A bigger work that is itself geared for reproducibility with a git repo and such. I just don't see Pluto as the tool for managing reproducibility.

That said, I definitely agree the implicitness in IJulia's @. solution isn't ideal.

fonsp commented 4 years ago

I mean that you are an experienced user, this is not the target audience for Pluto's default behavior. The default behavior will assume that the user does not understand any of this, or at least that they don't want to manage a package environment using external tools themselves.

Pluto notebooks should "just work", especially for new users - this has always been a core motivation for this project. And don't worry! I completely appreciate your use case, and it will be supported.

This article is much better at articulating how I feel about accidental complexity (like managing a package environment): Matt Huebert - When I Sit Down At My Editor, I Feel Relaxed

garrison commented 3 years ago

For a while (I believe as recently as v0.11.12), Pluto would load the project given by the JULIA_PROJECT environment variable. When I upgraded to v0.11.14, this stopped working -- I am not sure why. So I changed to using Pkg.activate() in the notebook, as recommended above. Unfortunately, setting the project this way does not apply to worker processes. I did not even realize this was an issue (or that BenchmarkTools uses worker processes), until my use of the macros in BenchmarkTools kept spitting errors to my console about packages not being found. It would be wonderful to have a real solution to this problem.

EDIT: With #341, it looks like I now need to specify PLUTO_PROJECT. Not entirely straightforward (or obvious why I must set another variable), but hey, at least I can get the old behavior :).

fonsp commented 3 years ago

sneak peek - recent experiments:

Pkg will get a GUI

This can already be done using a macro with clever JS output:

[image] (experiment!)

but...

...it will be embedded inline

[image: pkg ui 1]

(experiment monkeys will be replaced with Pkg GUI)

The options shown here are just fixed versions (or monkeys), but you will be able to use anything that you can write after ] add and ] dev. And yes, there will be an option to opt out.

Roger-luo commented 3 years ago

@garrison just an update, in v0.12, you can specify which project to use via project kwargs in Pluto.run, e.g

Pluto.run(;project="@.")

will look for closest julia Project.toml instead of global ones, or you can also feed in a path to specify which Project.toml or Manifest.toml you want to use.

There is also per notebook environment variable settings built internally, but it's not exposed publicly.


@fonsp I'm wondering how you currently serialize this into the notebook file. Should we just have an inline TOML in the comments? I can experiment with this feature for normal Julia scripts in IonCLI as ion run script.jl

One more thing I'm thinking is that I guess it's actually possible not to create a Project.toml in a tempdir, but to create an in-memory Pkg.Context directly from the inline TOML of the script, which I think could make things a bit faster.

j-fu commented 3 years ago

For inline TOML see also #421 ...

c42f commented 3 years ago

There will be the option to do that, but it will not be the default - Pluto notebooks should be reproducible as a single file by default. Ideally, correctly understanding, managing and sharing a package environment should not be a prerequisite for your notebooks being reproducible.

:100: I feel very strongly that this is the correct way forward. In fact, I investigated doing this very thing for normal Julia scripts in https://github.com/c42f/CodeEnvironments.jl (though at the time I implemented that, I seemed unable to convince anyone that it was very worthwhile!). (BTW, the encoding of the Manifest in CodeEnvironments is not inherent, but partly a workaround for UI considerations in jupyter, and in text editors.)

I know I'm just echoing what @fonsp said already! But I feel that managing environments (let alone understanding git) adds intolerable accidental complexity for casual users. Also that having the manifest and project separate from the notebook file will immediately lead to these files getting separated as they're copied around.

fonsp commented 3 years ago

(to contributors: https://github.com/fonsp/pkg-experiments & https://github.com/fonsp/Pluto.jl/tree/experiment-pkg-ui-1 )

oschulz commented 3 years ago

Also that having the manifest and project separate from the notebook file will immediately lead to these files getting separated as they're copied around

I'm just curious - why does Pluto "disrespect" the currently active Julia project and start the notebook in the default environment? I'm aware that there may be a well thought-out reason behind this, but it did surprise me quite a bit (not being in the env I thought I was).

I would actually love Pluto to pick up an environment (Project.toml and Manifest.toml) found in the same directory as the notebook automatically (like IJulia does with defaults). [Edit: Pluto does this now, see below.]

oschulz commented 3 years ago

One thing regarding Pluto.run(;project=...): When using Pluto.run(;project="@myenv"), Pluto will currently activate an environment literally named "@myenv" in the current directory, instead of activating "myenv" in the default environment directory (like the package console does). Would it be possible to change that? Using absolute paths can be so tedious. :-)

Roger-luo commented 3 years ago

@myenv does mean environment named @myenv in all Julia programs. It's not a Pluto only thing. Do you actually want @.?

oschulz commented 3 years ago

Well, on the Julia package management console, "@myenv" means "myenv" in the default environments directory.

Roger-luo commented 3 years ago

That's not the package API. Unless Pkg.jl changes its API, I don't think Pluto should use a different convention.

oschulz commented 3 years ago

See here, though: https://github.com/JuliaLang/julia/issues/35354
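For reference, Pkg's Julia API reaches a named shared environment through the `shared` keyword of `Pkg.activate` rather than through an `@` prefix. A minimal sketch, where "myenv" is a placeholder name:

```julia
import Pkg

# With shared=true, Pkg looks for "myenv" under the depots' environments/
# folders instead of treating it as a path relative to the current directory.
Pkg.activate("myenv"; shared=true)
println(Base.active_project())
```

Activating alone does not create any files, so trying this with an unused name leaves the depot untouched.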