JuliaLang / Pkg.jl

Pkg - Package manager for the Julia programming language
https://pkgdocs.julialang.org

`Pkg.add` should not update registry #3369

Open jakobnissen opened 1 year ago

jakobnissen commented 1 year ago

Running Pkg.add should not automatically update the registry. Optionally add a warning when running Pkg.add if the registry is more than, say, 3 months old and Pkg is not in offline mode.
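The proposed staleness warning could look roughly like the following. This is a sketch only: `registry_is_stale`, its keyword arguments, and the three-month default are hypothetical names chosen for illustration and are not part of Pkg.

```julia
using Dates

# Hypothetical helper, not part of Pkg: decide whether a registry that was
# last updated at `last_update` is old enough to warrant a warning.
function registry_is_stale(last_update::DateTime, now_time::DateTime;
                           max_age::Period = Month(3), offline::Bool = false)
    offline && return false              # never warn in offline mode
    return last_update + max_age < now_time
end

# Example: a registry last updated on 1 January, checked on 1 May.
if registry_is_stale(DateTime(2023, 1, 1), DateTime(2023, 5, 1))
    @warn "Registry is more than 3 months old; consider updating it."
end
```

The check is deliberately separate from any network access, so it could run on every Pkg.add without adding latency.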

Background

A recent Discourse thread on the increased precompilation time with Julia 1.9 led to some discussion on Slack about what can be done to reduce the amount of precompilation.

I don't think the Slack discussion reached a consensus, but several good suggestions were brought up that deserve an issue here on Pkg. Notably, some of the proposed solutions are good ideas regardless of whether Julia 1.9's precompilation woes will be solved elsewhere.

Motivation

The motivation is twofold. First, from a theoretical standpoint, adding a package and updating are conceptually different and orthogonal operations. You should be able to add a package without updating anything. Hence, the two operations should be separated.

From a practical point of view, developers and experienced users of Julia often create environments and modify their environments many times per day, sometimes tens of times per day. With the current state where every Pkg.add operation updates automatically, these users are going to be served new releases of various packages (dependencies of dependencies) very often, each time triggering waiting time for fetching the registry and/or package precompilation. For most users, getting the very newest version of every package is completely unnecessary - certainly getting up-to-the-hour updates is not needed.

Possible objections

Q: Wouldn't this mean lots of people would get stuck on months-old packages? A: The registry would only get stale if they never choose to run Pkg.update for any project. And if they never do so, it's probably safe to assume they are fine running an old environment. Furthermore, we should in general expect that many users explicitly want to run years-old environments that they choose not to upgrade or modify.

IanButterworth commented 1 year ago

The summary that

the current state where every Pkg.add operation updates automatically, these users are going to be served new releases of various packages (dependencies of dependencies) very often

isn't quite fair. Even if the registry is regularly updated, by default Pkg.add uses a tiered preserve strategy: it first tries to add the requested package without updating any existing package in the environment, and only if that fails does it go on to try the more permissive strategies (see https://pkgdocs.julialang.org/v1/api/#Pkg.add).

If Pkg.add has updated any existing dep, it should mean it couldn't add the package without doing that.

The exception is the build number bug detailed here https://github.com/JuliaLang/Pkg.jl/issues/1568#issuecomment-1425948204
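The tiered behavior can be sketched in a few lines. This is an illustration only, not Pkg's implementation: `resolve_tiered`, `TIERS`, and the `try_resolve` callback are hypothetical, and the tier order follows the documented `PRESERVE_TIERED` description as I understand it (all, then direct, then semver, then none).

```julia
# Tiers from most to least conservative, mirroring the documented order of
# PRESERVE_ALL, PRESERVE_DIRECT, PRESERVE_SEMVER, PRESERVE_NONE.
const TIERS = (:all, :direct, :semver, :none)

# `try_resolve(tier)` is a hypothetical callback that returns a resolved set
# of versions, or `nothing` if the request cannot be satisfied at that tier.
function resolve_tiered(try_resolve; tiers = TIERS)
    for tier in tiers
        result = try_resolve(tier)
        result === nothing || return (tier = tier, versions = result)
    end
    error("unsatisfiable request: no preserve tier produced a solution")
end

# Toy resolver: pretend the new package conflicts with the exact versions in
# the manifest, so only :semver or a looser tier can succeed.
toy(tier) = tier in (:semver, :none) ? Dict("Foo" => v"1.2.3") : nothing
outcome = resolve_tiered(toy)
println(outcome.tier)
```

The point of the sketch is that an existing dependency only moves when every stricter tier has already failed.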

jariji commented 1 year ago

I wonder if there could be a smarter algorithm that figures out the optimal frequency for updating the registry. It could account for how often new versions are released for the currently installed packages and when the registry was last updated. There could be a user-tunable parameter for how sensitive the user is to having the newest version.

fredrikekre commented 1 year ago

If there is nothing new, then "updating" the registry is essentially a no-op:

julia> @time Pkg.Registry.update()
    Updating registry at `~/.julia/registries/General.toml`
  0.054130 seconds (1.91 k allocations: 122.359 KiB)

and as Ian mentioned, other packages shouldn't be updated by Pkg.add anyway, so I doubt automatic registry updating contributes much to the problems mentioned in the OP.

jakobnissen commented 1 year ago

Suppose you wish to add package A, which is cached in your depot and has a lot of dependencies. A new version of one of these dependencies is released. When you add A, does the new transitive dependency need to be downloaded? That would be a source of lots of unnecessary package downloads.

Edit: when testing this situation on my own computer, indeed it does download new packages even when I only ask to add a package I already have - in my case, needlessly re-triggering precompilation of Plots which took 91 seconds.

KristofferC commented 1 year ago

Can you give a concrete example? It isn't clear to me what exact scenario you are describing.

jakobnissen commented 1 year ago

Sure. I'm at work and spin up a new environment for the day's analysis. I get some data that I decide I want to make a quick barplot of, so I pkg> add Plots to the environment - a package I use on a daily/weekly basis and have cached in my depot.

However, since Plots has 175 transitive dependencies, if a new version of any of these packages has been pushed since I last made a plot, I will now have to download the registry, download this new package, and recompile Plots. Completely unnecessarily - I just want to make a barplot.

To me it feels unnecessary. I didn't ask for updates. If I did, I would have typed pkg> up. I just need to install a package I already have. Why not separate the two distinct operations of getting a new package and updating your packages?

KristofferC commented 1 year ago

You probably want https://github.com/JuliaLang/Pkg.jl/issues/1233 for cases like this, which we should try to get into 1.10. In short, it would allow you to share a big manifest among many projects (and thus use the same version of the packages).

timholy commented 1 year ago

I also agree that #1233 is desirable as Julia 1.10 material. Still, I think we can do better for 1.9. Here's the sequence I imagine @jakobnissen is describing:

There are overlapping ways to solve this. In this particular case, Pkg.offline(true) would presumably have done exactly what I wanted. However, there are two differences of note:

To me, the bottom line is that stuff is happening that I didn't ask for, and there's no way to shut it off. When I start with an analysis environment and then say SomeProject>add Plots, what I'm really saying is "I will need plotting in this environment," not "please give me the latest version of every package I hadn't yet requested."

I agree it would be problematic to change the default behavior here, but providing at least the option to pkg> add -noregistry Plots seems sensible and likely to be effective.

KristofferC commented 1 year ago

I agree it would be problematic to change the default behavior here, but providing at least the option to pkg> add -noregistry Plots seems sensible and likely to be effective.

What is -noregistry supposed to do? The equivalent of a temporary Pkg.offline for the duration of one operation?

timholy commented 1 year ago

Still allowed to download new package code. It just doesn't update the registry first (it uses the current version of the registry installed on my machine).

Hmm, would this be equivalent to Pkg.UPDATED_REGISTRY_THIS_SESSION[] = true, with my perception that this still doesn't quite work properly being due only to the build-number bug? If so, then maybe the better solution is simply to document this option and then close this issue.

aplavin commented 1 year ago

Registry update isn't the whole story here. Even if I have an up-to-date registry and ]add Plots in an environment, it makes sense to install the Plots version from a week ago if it's already stored and precompiled on my machine. It's relatively rare that the actual newest version is needed, and in those cases one can do ]up. Still, this update shouldn't make all further heavy package installations precompile effectively from scratch.

KristofferC commented 1 year ago

Still allowed to download new package code. It just doesn't update the registry first (it uses the current version of the registry installed on my machine).

Artificially holding the registry back seems like a bad solution to this, because as soon as you accidentally forget to hold it back and it updates, you are pretty much out of luck: you cannot really go back. So this situation is an unstable equilibrium. The registry is just information about which package versions exist, and it should never be bad to just get more information. It is up to us to form our queries so that we get the package versions we want.

So, what we actually want seems to be: Install packages such that we don't have to 1.) download them 2.) precompile them.

Something similar to a command run under Pkg.offline seems like it would get you quite far there. If it is already downloaded, it is likely also precompiled. You could also allow it to download packages if those packages are not available at all in order to avoid a complete failure in those cases.
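A rough sketch of that selection rule, under loud assumptions: `pick_version` is a hypothetical function, not anything in Pkg, and it ignores compat bounds and transitive dependencies entirely. It only shows the "prefer what is installed, download only when nothing is" decision.

```julia
# Hypothetical sketch, not Pkg's implementation: choose a version for one
# package given the versions known to the registry and the versions already
# present in the depot.
function pick_version(registry_versions::Vector{VersionNumber},
                      installed_versions::Vector{VersionNumber})
    isempty(registry_versions) && error("package has no registered versions")
    # Only installed versions that the registry also knows about are candidates.
    candidates = intersect(installed_versions, registry_versions)
    if !isempty(candidates)
        return (version = maximum(candidates), needs_download = false)
    end
    # Nothing usable is installed: fall back to downloading the newest version.
    return (version = maximum(registry_versions), needs_download = true)
end

registered = [v"1.38.0", v"1.38.1", v"1.38.2"]
with_cache = pick_version(registered, [v"1.38.1"])       # reuse installed version
without_cache = pick_version(registered, VersionNumber[])  # newest, via download
```

Note that the registry can be fully up to date here; the preference for installed versions lives in the query, not in withholding information.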

IanButterworth commented 1 year ago

I like this proposal, and perhaps it could be spelled

pkg> add --installed Plots

i.e. "add the installed version of package Plots"

visr commented 1 year ago

I like this too, it's what I also proposed on Discourse.

Being greedy here, but why not make it the default? The tiered algorithm now only looks at preserving versions of packages already in the environment, but this doesn't help us when creating new environments. Could preserving both versions and installations be the new upper tier?

What would be the downside to this? Some users may sometimes get older packages, but running Pkg.update once anywhere will install the latest versions, making them available to every subsequent Pkg.add. In the most extreme case, if a user never runs Pkg.update and is never forced to install newer versions due to compat bounds, they would add versions that are as old as their Julia install, dating from when they first installed the package in the depot.

IanButterworth commented 1 year ago

Just another idea, if it were to be opt-in: put the flag where the version specifier usually goes and make it package-specific

pkg> add Plots@installed Flux

So "add the installed version of Plots, and the latest Flux"

timholy commented 1 year ago

The registry is just information about what package version exists and it should never be bad to just get more information. It is up to us to form our queries so that we get the package versions we want.

Agreed that in principle it isn't bad to get more information, if you have good mechanisms that don't force you to exploit that new information. My point is that #1233 is a 1.10 solution; I'm trying to find something to dull the pain for 1.9 that also doesn't cause regrets later. Avoiding registry updates seemed to me to be a "cheap" way of accomplishing that. If pkg> add --installed Plots is doable for 1.9, that sounds much better.

But if it isn't, what do you propose instead?

Even if I have an up-to-date registry and ]add Plots in an environment, it makes sense to install the Plots version from a week ago if it's already stored and precompiled on my machine.

Sure. I'm just trying to find something that often works that we can accomplish for 1.9. The boat has sailed on big, new features; we have to do something that isn't so ambitious that rushing it could cause regrets later. (Once we ship something in a release, we're stuck with it.) Rare registry updates will "asymptotically" get the behavior you want, just not as fast as an optimal solution.

But again, I'm very happy to accept any reasonable solution to the problem. I just don't want to see it go unaddressed in 1.9. If there's a "perfect" solution being designed for 1.10, I'm arguing that we also need a no-regrets bandaid for 1.9.

KristofferC commented 1 year ago

Being greedy here, but why not make it the default?

Changing defaults without any trial period is kind of iffy. It also means there is hidden state that goes into the default resolver procedure, which can be tricky to debug and maybe confusing. Say two colleagues (or a teacher and a student) both do add Foo and get a bunch of different versions for all the dependencies. It just doesn't feel like that great a solution.

visr commented 1 year ago

My feeling is that there is enough tooling in place to mitigate those concerns. Running up will always ignore installs, there are compat bounds, status showing outdated packages, and manifest sharing that can be used when appropriate. We could even add something that would act similar to the current add Example, like add Example@latest (perhaps not the best name).

gaurav-arya commented 1 year ago

One weird idea I had is to allow writing e.g. add @plots, which would add all the packages in the named environment @plots (#3347). No env stacking or backrefs, just adding of packages, but at the same versions as are currently in @plots, and so as long as @plots has been precompiled this add would be fast. Then, the workflow would be to keep e.g. plotting utilities in an environment @plots and add them like this.

Probably not the right solution here, and possibly superseded by #1233, but just wanted to float it as food for thought.