JuliaEcosystem / PackageAnalyzer.jl

https://juliaecosystem.github.io/PackageAnalyzer.jl/dev/
MIT License

Record the tree hash of the analyzed (sub)directory #69

Closed ericphanson closed 2 years ago

ericphanson commented 2 years ago

This adds a tree_hash field to Package, which records the tree hash of the code that was analyzed. My ultimate goal is to analyze a Manifest, and actually analyze the code according to the tree hashes in the manifest. But I think a first step is to just record the version that was actually analyzed.

In general, since we do a shallow clone of the latest version, the tree hash we get won't match the tree hash of a registered version. But if the package has happened to register the latest commit, we do get the correct hash:

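using PackageAnalyzer, RegistryInstances
name = "RegistryCI"
pkg = find_package(name)
info = registry_info(pkg)
# latest registered version and its tree hash, as a hex string
v = maximum(keys(info.version_info))
registered_hash = bytes2hex(info.version_info[v].git_tree_sha1.bytes)  # avoids shadowing Base.hash
# analyze the latest commit and compare the recorded tree hash
result = analyze(pkg)
result.tree_hash == registered_hash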

I don't want to add this to the tests because it depends on the current state of an arbitrary package, but at least it's confirmation that we are measuring the right thing here.

BTW this even works for packages in subdirectories whose trunk has progressed since they were last registered; replace "RegistryCI" with "SnoopCompileCore" to see.

This does introduce a dependence on a Pkg internal, Pkg.GitTools.tree_hash. However, it has at least been unchanged since Julia 1.6. I am thinking perhaps we can pull it out into its own package, like RegistryInstances.jl, if it becomes unstable at some point in the future.
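
For reference, here is a minimal sketch of what relying on that internal looks like. It is not the code from this PR: directory_tree_hash is a hypothetical helper name used only for illustration, and the sketch assumes Pkg.GitTools.tree_hash accepts a directory path and returns the raw hash bytes, which is non-public behavior and could change between Julia versions.

```julia
using Pkg

# Hypothetical helper (not part of PackageAnalyzer's API): returns the git tree
# hash of `dir` as a hex string. Relies on the internal, non-public function
# Pkg.GitTools.tree_hash, so this may break in future Julia releases.
function directory_tree_hash(dir::AbstractString)
    return bytes2hex(Pkg.GitTools.tree_hash(dir))
end

# Usage: hash a throwaway directory containing a single file.
example_hash = mktempdir() do dir
    write(joinpath(dir, "README.md"), "hello\n")
    directory_tree_hash(dir)
end
```

The result is a 40-character hex string, the same form as the registered hash used in the comparison above, so the two can be checked with ==.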