Open jakobnissen opened 1 year ago
The summary that "the current state where every Pkg.add operation updates automatically, these users are going to be served new releases of various packages (dependencies of dependencies) very often" isn't quite fair. Even if the registry is regularly updated, by default Pkg.add uses a tiered preserve strategy: it first tries to add the requested package without updating any existing package in the environment, and only if that fails does it go on to try the more permissive strategies (see https://pkgdocs.julialang.org/v1/api/#Pkg.add).
If Pkg.add has updated any existing dep, it should mean it couldn't add the package without doing that.
The exception is the build number bug detailed here https://github.com/JuliaLang/Pkg.jl/issues/1568#issuecomment-1425948204
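For illustration, the preserve level can also be requested explicitly instead of relying on the tiered default. This is only a sketch: it assumes the `preserve` keyword and the `PRESERVE_*` constants described in the Pkg.add docs linked above, and the two helper names are hypothetical, not part of Pkg.

```julia
using Pkg

# Hypothetical helper (not part of Pkg): add `name` while refusing to change
# any package version already recorded in the manifest. With PRESERVE_ALL the
# resolver errors instead of falling back to a more permissive tier.
add_without_updates(name::AbstractString) =
    Pkg.add(name; preserve = Pkg.PRESERVE_ALL)

# Hypothetical helper: the default tiered behaviour, spelled out explicitly.
add_tiered(name::AbstractString) =
    Pkg.add(name; preserve = Pkg.PRESERVE_TIERED)
```

Calling `add_without_updates("Plots")` would then either leave every existing version untouched or fail outright, which makes it easy to see whether an update was really required.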
I wonder if there could be a smarter algorithm that figures out the optimal frequency for updating the registry. It could account for how often new versions are released for the currently installed packages and when the registry was last updated. There could be a user-tunable parameter for how sensitive the user is to having the newest version.
If there is nothing new, then "updating" the registry is a no-op essentially;
julia> @time Pkg.Registry.update()
Updating registry at `~/.julia/registries/General.toml`
0.054130 seconds (1.91 k allocations: 122.359 KiB)
and as Ian mentioned, other packages shouldn't be updated by Pkg.add anyway, so I doubt automatic registry updating contributes much to the problems mentioned in the OP.
Suppose you wish to add Pkg A, which is cached in your depot and has a lot of dependencies. A new version of one of these dependencies is released. When you add Pkg A, does the new transitive dependency need to be downloaded? That would be a source of lots of unnecessary package downloads.
Edit: when testing this situation on my own computer, indeed it does download new packages even when I only ask to add a package I already have - in my case, needlessly re-triggering precompilation of Plots which took 91 seconds.
Can you give a concrete example? It isn't clear to me what exact scenario you are describing.
Sure. I'm at work and spin up a new environment for the day's analysis. I get some data that I decide I want to make a quick barplot of, so I pkg> add Plots to the environment - a package I use on a daily/weekly basis and have cached in my depot.
However, since Plots has 175 transitive dependencies, if a new version of any of these packages has been pushed since I last made a plot, I will now have to download the registry, download this new package, and recompile Plots. Completely unnecessarily - I just want to make a barplot.
To me it feels unnecessary. I didn't ask for updates. If I did, I would have typed pkg> up. I just need to install a package I already have. Why not separate the two distinct operations of getting a new package and updating your packages?
You probably want https://github.com/JuliaLang/Pkg.jl/issues/1233 for cases like this, which we should try to get into 1.10. In short, it would allow you to share a big manifest among many projects (and thus use the same version of the packages).
I also agree that #1233 is desirable as Julia 1.10 material. Still, I think we can do better for 1.9. Here's the sequence I imagine @jakobnissen is describing:
1. Plots, which depends on PkgA and PkgB. I use Plots a lot, and there's a compiled version on my hard drive somewhere.
2. DevPkg, which happens to depend on PkgA. Now I decide to DevPkg> add Plots. Pkg, being the lovely citizen it is, agrees that since we're already depending on PkgA, let's use the same version that's in the manifest.
3. DevPkg does not (yet) depend on PkgB. So my add triggers a registry update, which discovers that there's a new release of PkgB. Thus it downloads the new PkgB and re-precompiles Plots.
There are overlapping ways to solve this. In this particular case, Pkg.offline(true) would presumably have done exactly what I wanted. However, there are two differences of note:
- Suppose I don't have Plots, but I do have precompiled versions of HeavyDependency upon which Plots depends. To get Plots, I'm going to have to hit the network; but while updating the registry, I may also discover there's a new version of HeavyDependency or one of its dependencies. So rather than the relatively minimal task of installing and precompiling just Plots, now I'm precompiling the whole stack. This case cannot be handled with offline (I had to hit the network to get Plots) but is handled if we can suppress registry updates.
To me, the bottom line is that stuff is happening that I didn't ask for, and there's no way to shut it off. When I start with an analysis environment and then say SomeProject>add Plots, what I'm really saying is "I will need plotting in this environment," not "please give me the latest version of every package I hadn't yet requested."
I agree it would be problematic to change the default behavior here, but providing at least the option to pkg> add -noregistry Plots seems sensible and likely to be effective.
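As a rough illustration of the Pkg.offline(true) route mentioned above, offline mode can be scoped to a single operation with a small wrapper. This is a sketch only: `with_offline` is a hypothetical helper, not part of Pkg, and it assumes offline mode was off before the call.

```julia
using Pkg

# Hypothetical helper (not part of Pkg): run `f` with Pkg in offline mode,
# then switch offline mode back off, even if `f` throws.
function with_offline(f)
    Pkg.offline(true)
    try
        return f()
    finally
        Pkg.offline(false)  # assumes offline mode was off beforehand
    end
end

# Usage: resolve using only what is already in the depot.
# with_offline(() -> Pkg.add("Plots"))
```

As noted above, this does not help when the requested package itself still has to be downloaded.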
I agree it would be problematic to change the default behavior here, but providing at least the option to pkg> add -noregistry Plots seems sensible and likely to be effective.
What is -noregistry supposed to do? The equivalent of a temporary Pkg.offline for the duration of one operation?
Still allowed to download new package code. It just doesn't update the registry first (it uses the current version of the registry installed on my machine).
Hmm, would this be equivalent to Pkg.UPDATED_REGISTRY_THIS_SESSION[] = true, and that my perception that this still doesn't quite work properly is just because of the build-number bug? If so, then maybe the better solution is simply to document this option and then close this issue.
Registry update isn't the whole story here. Even if I have an up-to-date registry and ]add Plots in an environment, it makes sense to install Plots version from a week ago, if it's already stored and precompiled on my machine. It's relatively rare that the actual newest version is needed, and in those cases one can do ]up. Still, this update shouldn't force all further heavy package installations to precompile effectively from scratch.
Still allowed to download new package code. It just doesn't update the registry first (it uses the current version of the registry installed on my machine).
Artificially holding the registry back seems like a bad solution to this because as soon as you somehow accidentally forget to hold it back and it updates you are pretty much out of luck because you cannot really go back. So this situation is kind of living on an unstable equilibrium. The registry is just information about what package version exists and it should never be bad to just get more information. It is up to us to form our queries so that we get the package versions we want.
So, what we actually want seems to be: Install packages such that we don't have to 1.) download them 2.) precompile them.
Something similar to a command run under Pkg.offline seems like it would get you quite far there. If it is already downloaded, it is likely also precompiled. You could also allow it to download packages if those packages are not available at all in order to avoid a complete failure in those cases.
I like this proposal, and perhaps it could be spelled
pkg> add --installed Plots
i.e. "add the installed version of package Plots"
I like this too, it's what I also proposed on Discourse.
Being greedy here, but why not make it the default? The tiered algorithm now only looks at preserving versions of packages already in the environment, but this doesn't help us when creating new environments. Could preserving both versions and installations be the new upper tier?
What would be the downside to this? Some users may get some older packages sometimes, but doing a Pkg.update somewhere once will install the latest versions, making them available for all later Pkg.add calls. In the most extreme case, if a user never runs Pkg.update and is never forced to install newer versions due to compat bounds, they would add versions that are as old as their Julia install, dating from when they first installed the package in the depot.
Just another idea, if it were to be opt-in: put the flag where the version specifier usually goes and make it package-specific
pkg> add Plots@installed Flux
So "add the installed version of Plots, and Flux latest"
The registry is just information about what package version exists and it should never be bad to just get more information. It is up to us to form our queries so that we get the package versions we want.
Agreed that in principle it isn't bad to get more information, if you have good mechanisms that don't force you to exploit that new information. My point is that #1233 is a 1.10 solution; I'm trying to find something to dull the pain for 1.9 that also doesn't cause regrets later. Avoiding registry updates seemed to me to be a "cheap" way of accomplishing that. If pkg> add --installed Plots is doable for 1.9, that sounds much better.
But if it isn't, what do you propose instead?
Even if I have an up-to-date registry and ]add Plots in an environment, it makes sense to install Plots version from a week ago, if it's already stored and precompiled on my machine.
Sure. I'm just trying to find something that often works that we can accomplish for 1.9. The boat has sailed on big, new features; we have to do something that isn't so ambitious that rushing it could cause regrets later. (Once we ship something in a release, we're stuck with it.) Rare registry updates will "asymptotically" get the behavior you want, just not as fast as an optimal solution.
But again, I'm very happy to accept any reasonable solution to the problem. I just don't want to see it go unaddressed in 1.9. If there's a "perfect" solution being designed for 1.10, I'm arguing that we also need a no-regrets bandaid for 1.9.
Being greedy here, but why not make it the default?
Changing defaults without any trial period is kind of iffy. It also means there is hidden state that goes into the default resolver procedure, which can be tricky to debug and maybe confusing. Say two colleagues (or a teacher and a student) do add Foo and get a bunch of different versions for all the dependencies. It just doesn't feel like that great of a solution.
My feeling is that there is enough tooling in place to mitigate those concerns. Running up will always ignore installs, there are compat bounds, status showing outdated packages, and manifest sharing that can be used when appropriate. We could even add something that would act similar to the current add Example, like add Example@latest (perhaps not the best name).
One weird idea I had is to allow writing e.g. add @plots, which would add all the packages in the named environment @plots (#3347). No env stacking or backrefs, just adding of packages, but at the same versions as are currently in @plots, and so as long as @plots has been precompiled this add would be fast. Then, the workflow would be to keep e.g. plotting utilities in an environment @plots and add them like this.
Probably not the right solution here, and possibly superseded by #1233, but just wanted to float it as food for thought.
Running Pkg.add should not automatically update the registry. Optionally add a warning when running Pkg.add if the registry is more than, say, 3 months old and Pkg is not in offline mode.
Background
A recent Discourse thread on the increased precompilation time with Julia 1.9 led to some discussion on Slack about what can be done to reduce the amount of precompilation.
I don't think the Slack discussion reached a consensus, but several good suggestions were brought up that deserve an issue here on Pkg. Notably, some of the proposed solutions are good ideas regardless of whether Julia 1.9's precompilation woes will be solved elsewhere.
Motivation
The motivation is twofold: First, from a theoretical standpoint, adding a package and updating are conceptually different and orthogonal operations. You should be able to add a package without updating anything. Hence, the two operations should be separated.
From a practical point of view, developers and experienced users of Julia often create environments and modify their environments many times per day, sometimes tens of times per day. With the current state where every Pkg.add operation updates automatically, these users are going to be served new releases of various packages (dependencies of dependencies) very often, each time triggering waiting time for fetching the registry and/or package precompilation. For most users, getting the very newest version of every package is completely unnecessary - certainly getting up-to-the-hour updates is not needed.
Possible objections
Q: Wouldn't this mean lots of people would get stuck on months-old packages?
A: The registry would only get stale if they never choose to run Pkg.update for any project. And if they never do so, it's probably safe to assume they are fine running an old environment. Furthermore, we should in general expect that many users explicitly want to run years-old environments that they choose not to upgrade or modify.
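The optional staleness warning proposed at the start of this post could look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names, the `offline` flag, and the way the last-update time is passed in are hypothetical, and how Pkg would actually obtain the registry's last-update time is left open.

```julia
using Dates

# Hypothetical helper: true when the registry was last updated more than
# `max_age` before `current`.
function registry_is_stale(last_update::DateTime,
                           current::DateTime = Dates.now();
                           max_age::Period = Month(3))
    return last_update + max_age < current
end

# Hypothetical helper: warn only when the registry is stale and Pkg is not
# in offline mode. Returns true if a warning was emitted.
function maybe_warn_stale(last_update::DateTime;
                          offline::Bool = false,
                          current::DateTime = Dates.now())
    if !offline && registry_is_stale(last_update, current)
        @warn "Registry last updated on $(last_update); consider running Pkg.Registry.update()"
        return true
    end
    return false
end
```

Keeping the age check separate from the warning makes the 3-month threshold a tunable parameter rather than a hard-coded policy.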