JuliaLang / Pkg.jl

Pkg - Package manager for the Julia programming language
https://pkgdocs.julialang.org

Proposal for more first class handling of sysimages in Pkg #2008

Open KristofferC opened 4 years ago

KristofferC commented 4 years ago

Issue

Using a custom sysimage can drastically reduce load times of packages. The go-to solution for this is PackageCompiler.jl, and it works well, but it isn't used as much as its benefits warrant. From some discussion, it seems that using it involves a bit too much "mental overhead". Using PackageCompiler.jl to build a sysimage requires you to:

Since load time of packages and "time to first plot" are a frequent gripe about Julia, it makes sense to see if we can give a better interface to PackageCompiler.

Proposal

The proposal here is to introduce a new set of Pkg API that handles sysimages. To give a taste of what a session would look like:

pkg> sysimage create
Info: Creating a new sysimage based on the packages in current project at `~/.julia/environments/v1.5/Project.toml`
Packages tracked by path and their dependencies not put into sysimage:
    - OhMyREPL
    └ DataStructures, Crayons
Info: Package `OhMyREPL` not put into sysimage because it is tracked by path. This caused its dependencies `OrderedCollect

pkg> sysimage status
(@v1.5) pkg> status
Status `~/.julia/environments/v1.5/sysimage.dylib`
  [6e4b80f9] BenchmarkTools v0.5.0
  [f68482b8] Cthulhu v1.2.0
  [0c46a032] DifferentialEquations v6.15.0
  [7876af07] Example v0.5.3
  [8fb92a4a] Exfiltrator v0.1.0
  [b22a6f82] FFMPEG_jll v4.3.1+2
Package in project not in sysimage
  [6e4b80f9] OhMyREPL v0.7.0 `~/JuliaPkgs/OhMyREPL.jl`
  [ae3bc0f9] DataStructures v0.5.0
 [a8cc5b0e] + Crayons v4.0.4

pkg> up
Updating `~/.julia/environments/v1.5/Project.toml`
  [a8cc5b0e] + Crayons v4.0.4
Updating `~/.julia/environments/v1.5/Manifest.toml`
  [a8cc5b0e] ↑ Crayons v4.0.3 ⇒ v4.0.4

pkg> sysimage status
Warn: Some packages in the sysimage are out of date with project, run `sysimage update` to update it:
  [a8cc5b0e] ↑ Crayons v4.0.3 ⇒ v4.0.4
...

pkg> sysimage update
...

Next time we start julia:

> julia --project

❯ /usr/local/bin/julia -q

Info: Automatically using sysimage at `~/.julia/environments/v1.5/sysimage.dylib`
julia> @time using DifferentialEquations # look how fast
0.0202 seconds (144.73 k allocations: 7.456 MiB)

So the concrete proposal here is to add convenience functionality to Pkg to make dealing with sysimages easier.

In addition, this proposes adding some functionality to Julia itself that allows it to automatically detect a custom sysimage next to the project and use that for the Julia process. This could be done via some naming convention.
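
As a rough illustration of the naming-convention idea, here is a minimal sketch. It assumes the sysimage is called `sysimage.<ext>` and lives in the same directory as the project file, which matches the path shown in the session above but is not a settled convention. The function names `sysimage_candidate` and `find_project_sysimage` are hypothetical, not part of Pkg or Julia.

```julia
using Libdl  # standard library; `Libdl.dlext` is "so", "dylib", or "dll"

# Hypothetical convention (an assumption, not an existing Julia feature):
# the sysimage is named `sysimage.<dlext>` and sits next to the project file.
function sysimage_candidate(project_file::AbstractString)
    return joinpath(dirname(project_file), "sysimage." * Libdl.dlext)
end

# Return the path of the sysimage next to `project_file` if that file exists,
# otherwise return `nothing` so the caller can keep the default sysimage.
function find_project_sysimage(project_file::AbstractString)
    candidate = sysimage_candidate(project_file)
    return isfile(candidate) ? candidate : nothing
end
```

A caller could try `find_project_sysimage(Base.active_project())` (guarding against `Base.active_project()` returning `nothing`) and, if a path comes back, hand it to the existing `--sysimage` command-line flag when starting a new Julia process.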

Why in Pkg and not in a separate package?

The main point of this proposal is to reduce the friction in using e.g. PackageCompiler. Bundling it with Pkg allows it to use the super user-friendly Pkg REPL with no need to manually install anything. Also, we likely want to use a lot of the code in Pkg for dealing with projects, for status printing, etc., so from that point of view it makes sense to have it in Pkg. One question is whether the code for PackageCompiler itself should move into Pkg. I think it is best not to do this but instead to just install PackageCompiler into the global project from Pkg when the sysimage command is used for the first time.

Possible complications:

cc @tkf since I think you have thought a bit about stuff like this

fredrikekre commented 4 years ago

In addition, this proposes adding some functionality to Julia itself that allows it to automatically detect a custom sysimage next to the project and use that for the Julia process. This could be done via some naming convention.

xref https://github.com/JuliaLang/julia/pull/35794

tkf commented 4 years ago

Yeah, I wrote JuliaLang/julia#35794 so that a user-friendly interface like this would be easy to implement.

FWIW, the proposal LGTM. A few minor comments:

just install PackageCompiler into the global project from Pkg

Maybe do what --bug-report does? IIUC it checks the current environment and then installs BugReporting.jl in a temporary environment if it's not installed. This approach was very handy for fixing BugReporting.jl bugs. (Ref @StefanKarpinski's comment https://github.com/JuliaLang/julia/pull/35494#issuecomment-614886055)

  • Right now, upgrading the Julia minor version is super easy. With a custom sysimage one needs to refresh all the sysimages.

My approach in JuliaLang/julia#35794 was to compute the system image storage path from (the hash of) the path to the julia binary. This way, a mismatched system image is not used and julia falls back to the default system image.

I think it's likely that a minor version would be installed in a different path so this may be enough. To be more careful, I think we can include, e.g., the Julia version in the hash.

KristofferC commented 4 years ago

Maybe do what --bug-report does? IIUC it checks the current environment and then installs BugReporting.jl in a temporary environment if it's not installed.

Yes, that is better.

I think it's likely that a minor version would be installed in a different path so this may be enough.

Not sure it will be enough on mac where I think it is /Applications/Julia-1.5.app/Contents/Resources/julia/bin/julia for all 1.5 versions.

tkf commented 4 years ago

/Applications/Julia-1.5.app/Contents/Resources/julia/bin/julia for all 1.5 versions.

Ah, that's unfortunate. I guess I'd have to put the version in the hash if we are going to use JuliaLang/julia#35794.
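
To make the "version in the hash" idea concrete, here is a small sketch. It is not the code from JuliaLang/julia#35794; the function names `sysimage_key` and `sysimage_storage_dir`, the choice of SHA-256, and the directory layout are all assumptions made for illustration. The point is only that hashing the binary path together with the version yields a different storage location for every patch release, even when the install path stays the same.

```julia
using SHA  # standard library

# Hypothetical helper: derive a key from the julia binary path *and* the
# Julia version, so two versions installed at the same path get distinct keys.
function sysimage_key(julia_binary::AbstractString, version::VersionNumber)
    # The NUL separator keeps the path and the version from running together.
    return bytes2hex(sha256(string(julia_binary, '\0', version)))
end

# Hypothetical storage layout: one directory per key under a caller-supplied
# root. The defaults describe the currently running julia process.
function sysimage_storage_dir(root::AbstractString;
        julia_binary::AbstractString = joinpath(Sys.BINDIR, Base.julia_exename()),
        version::VersionNumber = VERSION)
    return joinpath(root, sysimage_key(julia_binary, version))
end
```

With the macOS path mentioned above, `sysimage_key(path, v"1.5.0")` and `sysimage_key(path, v"1.5.1")` differ, so a sysimage built for one release would simply not be found by the other and Julia would fall back to the default sysimage.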

ericphanson commented 3 years ago

This is only semi-related to the feature proposed here, but I think it would be helpful if the resolver could take into account the sysimage when choosing versions of dependencies. E.g. if I start with a "base" sysimage with say things like Plots, and then I make an environment and start adding packages to it, it would be great if my shared dependencies / transitive dependencies chose versions that did not conflict with those baked into the sysimage already. Already the manifest you get from a Project.toml depends on the version of Julia you use, but maybe it should actually depend on the sysimage you use (of which Julia version is kind of a special case).

My actual use-case isn't plots, but is very similar, something like: some process has produced a docker image with Julia and a sysimage and some code and/or other artifacts. Now I want to start from that image and add some more packages. I don't want to regenerate the sysimage (since that takes a while and the code I'm adding is comparatively light) but I do want versions resolved correctly so I don't run into weird bugs.

If one has a custom sysimage for the default environment, starts julia, changes the environment, and then starts loading packages, packages from the sysimage for the default environment will still be "locked-in". Pkg could warn about this when a new project is activated.

I think the above could help with this; at least, if you don't have a manifest, it could try to resolve a compatible manifest instead of just giving a warning. And if you do have a Manifest, it could warn or maybe even prompt to regenerate the manifest.

KristofferC commented 3 years ago

This is only semi-related to the feature proposed here, but I think it would be helpful if the resolver could take into account the sysimage when choosing versions of dependencies.

If we store the version of Plots into the sysimage, this could be done. And yes, I think that should be done as a part of this proposal.
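
A minimal sketch of the check this would enable, assuming the versions baked into the sysimage are available as a name-to-version dictionary. How they would actually be stored is undecided here, and the function name `sysimage_conflicts` is hypothetical. The same comparison could back both the "out of date" warning in `sysimage status` and a warning when activating a project whose manifest disagrees with the loaded sysimage.

```julia
# Compare versions baked into a sysimage with versions in a manifest.
# Both arguments map package name => version. How the sysimage versions
# would be recorded is an assumption of this sketch.
function sysimage_conflicts(sysimage_versions::AbstractDict{String,VersionNumber},
                            manifest_versions::AbstractDict{String,VersionNumber})
    conflicts = Dict{String,Tuple{VersionNumber,VersionNumber}}()
    for (name, baked) in sysimage_versions
        wanted = get(manifest_versions, name, nothing)
        # Only packages present in both places can disagree.
        if wanted !== nothing && wanted != baked
            conflicts[name] = (baked, wanted)  # (in sysimage, in manifest)
        end
    end
    return conflicts
end
```

For the Crayons update in the session above, passing `Dict("Crayons" => v"4.0.3")` as the sysimage versions and `Dict("Crayons" => v"4.0.4")` as the manifest versions reports Crayons as the single mismatch. A resolver-aware version would go further and treat the sysimage versions as fixed constraints rather than only reporting the difference.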

ericphanson commented 3 years ago

It seems to me there might be some overlap with https://github.com/JuliaLang/Pkg.jl/issues/1233 in terms of merging projects. I.e. if you have a sysimage loaded, then any project you activate is kind of a "subproject" of the project baked into that sysimage (with regards to version resolution, I'm not talking about folder structure or anything like that of course).

KristofferC commented 3 years ago

Yes, I think that is a pretty good way to look at it.

ericphanson commented 3 years ago

It would be great if this proposal had some way of excluding packages from the sysimage, e.g. just compile the registered dependencies of the package whose environment you are in, so you don't need to recompile the sysimage if you update a dev'd / add'd dependency or the package code itself.
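
One possible shape for such a filter, as a sketch only: split the dependencies into those that would go into the sysimage and those that would stay out because they are tracked by path or by repository. The function name `split_for_sysimage` is hypothetical. It assumes each record has `name`, `is_tracking_path`, and `is_tracking_repo` fields, which is the case for the `PackageInfo` values returned by `Pkg.dependencies()`.

```julia
# Partition dependencies into (include, exclude) lists of package names.
# `deps` is any iterable of records with `name`, `is_tracking_path`, and
# `is_tracking_repo` fields. Packages tracked by path (dev'd) or by repo
# (add'd from a URL or branch) are excluded, since they tend to change often.
function split_for_sysimage(deps)
    include_names = String[]
    exclude_names = String[]
    for dep in deps
        excluded = dep.is_tracking_path || dep.is_tracking_repo
        push!(excluded ? exclude_names : include_names, dep.name)
    end
    return sort!(include_names), sort!(exclude_names)
end
```

With Pkg loaded, this could be called as `split_for_sysimage(values(Pkg.dependencies()))`. Note that this sketch does not also exclude the dependencies of excluded packages, which the session at the top of the issue suggests would be needed.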