JuliaLang / Pkg.jl

Pkg - Package manager for the Julia programming language
https://pkgdocs.julialang.org

Make tree hash for loaded package accessible #2714

Open philbit opened 2 years ago

philbit commented 2 years ago

Sometimes, you want to be able to tell exactly which version of a package is currently loaded. In principle, the tree hash is all that's needed. It can also easily be converted to a commit hash if a commit with the exact same content exists, etc. However, looking at the current Manifest.toml (where the tree hash is recorded) is not always sufficient, for example when switching between environments.

Is there currently a way to obtain the tree hash of a loaded package? Perhaps I just overlooked it. If not, it would be great to have, for example to debug situations where the wrong version was loaded for some reason. Since Pkg computes the tree hash internally anyway for the manifest, I guess it would "just" be a matter of storing this metadata somewhere.

Currently, I resort to inserting

const TREE_HASH = bytes2hex(Pkg.GitTools.tree_hash(joinpath(splitpath(@__FILE__)[1:end-2]...)))

in the modules where I need this functionality. This records the tree hash during precompilation so I can access it later. Is it possible and would it be worth enabling Pkg to return this information about a package in memory?
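For readers who want to try the workaround, here is a minimal sketch of the same idea wrapped in a helper. The name `source_tree_hash` is hypothetical (it is not part of Pkg), and the sketch assumes the usual layout where the module's file lives in `<package root>/src/`. Note that `Pkg.GitTools.tree_hash` is an internal Pkg function, so it may change between versions.

```julia
using Pkg

# Hypothetical helper (not part of Pkg): given the path of a source file that
# lives in `<root>/src/`, return the hex-encoded tree hash of `<root>`.
# `Pkg.GitTools.tree_hash` is internal to Pkg and may change without notice.
function source_tree_hash(srcfile::AbstractString)
    root = dirname(dirname(abspath(srcfile)))
    return bytes2hex(Pkg.GitTools.tree_hash(root))
end

# Inside a package module this would be evaluated once, at precompilation:
#
#     const TREE_HASH = source_tree_hash(@__FILE__)
```

Because the constant is evaluated during precompilation, it describes the source tree as it was when the cache file was built, not necessarily what is on disk later.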

KristofferC commented 2 years ago

> Since Pkg computes the tree hash internally anyway for the manifest,

Not for packages that are tracked via path ("devved packages").

philbit commented 2 years ago

Ah, my bad, I forgot. But wouldn't it be nice to always have this info handy anyway? In many cases, it would be useful even for devved packages. But if it's much more difficult for those, it would be nice at least for the "easy" case of non-devved packages.

philbit commented 2 years ago

Just out of interest: How is it decided whether a devved package should be recompiled if not via the tree hash? Good old modification dates?

KristofferC commented 2 years ago

The code for it is here: https://github.com/JuliaLang/julia/blob/1bbba21aa258a99d1ecf1168d72d64cb402fd054/base/loading.jl#L1805-L1903

philbit commented 2 years ago

Thanks. Thinking about it more, I realized that Pkg might not be the right place for this... I think the information needs to be saved with the precompile cache so it is available even when the module is already precompiled and just loaded (or precompiled but not by Pkg). Alternatively, it could be recomputed when the package is loaded. This should be safe since, at that point, the loaded module is guaranteed to correspond to the files on disk anyway (otherwise it is recompiled). But for huge source trees, recomputing the tree hash upon loading might slow down loading, so perhaps it would be better to compute it only once, upon precompilation.

Side note: Another thing I realized was that, for non-devved packages, this information is already sort of available, just a bit hidden. Since the source code for those is assumed immutable, one could just call Pkg.GitTools.tree_hash(pkgdir(m)) to get the tree hash of the currently loaded module m. Of course this assumes that the assumption of immutability is correct (i.e. that nobody messed with the files in pkgdir(m) between loading and now), which should theoretically always be the case, but if this is also meant as a tool to track down issues when things go wrong, this assumption could easily be one of them. I just wanted to mention this mainly to justify why in the cases where it would be particularly useful (versioning problems, devving, etc) this method does not work.
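A rough sketch of that check, comparing the hash of the files on disk with the one recorded in the manifest. The function names `disk_tree_hash`, `manifest_tree_hash`, and `matches_manifest` are made up for illustration. The sketch assumes the active environment is the one the module was loaded from, which, as noted at the top of this issue, is not always the case.

```julia
using Pkg

# Hash of the files currently on disk for the package that provides module `m`,
# or `nothing` if `m` does not belong to a package (e.g. `Main`).
# Uses the internal function `Pkg.GitTools.tree_hash`.
function disk_tree_hash(m::Module)
    dir = pkgdir(m)
    dir === nothing && return nothing
    return bytes2hex(Pkg.GitTools.tree_hash(dir))
end

# Tree hash recorded for `m` in the manifest of the *active* environment, or
# `nothing` if the package is not listed there or has no recorded hash
# (for example because it is tracked by path).
function manifest_tree_hash(m::Module)
    uuid = Base.PkgId(m).uuid
    uuid === nothing && return nothing
    info = get(Pkg.dependencies(), uuid, nothing)
    info === nothing && return nothing
    return info.tree_hash
end

# `true` or `false` when both hashes are available, `nothing` otherwise.
function matches_manifest(m::Module)
    on_disk = disk_tree_hash(m)
    recorded = manifest_tree_hash(m)
    (on_disk === nothing || recorded === nothing) && return nothing
    return on_disk == recorded
end
```

A `false` result only says that the files on disk differ from the manifest entry right now. It does not tell you what was on disk when the module was loaded, which is exactly the limitation described above.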

KristofferC commented 2 years ago

> Of course this assumes that the assumption of immutability is correct (i.e. that nobody messed with the files in pkgdir(m)

Unfortunately, people do, with build scripts. There are more modern ways of doing things (https://github.com/JuliaPackaging/Scratch.jl) but the old way is still here.

philbit commented 2 years ago

I think I've found a way to implement what I am thinking of in an external package that taps into module loading in Base in the same way that Revise.jl does. I'll try it and report back here when I've produced something presentable.

philbit commented 2 years ago

This package is my first attempt at providing this sort of functionality. Most of the things I had in mind when opening this issue can be solved with it. It's currently waiting for registration. Let me know if you have any feedback.