Closed sloede closed 11 months ago
Right, that makes sense and that was an oversight when support was initially added.
It looks like if we want to do this right, we'd need to implement tree hash computation from scratch, as I don't think the Git CLI provides a way to do so.
It looks like if we want to do this right, we'd need to implement tree hash computation from scratch
I am not sure that this is a solution though:
The problem is in _filter_map_versions
,
https://github.com/JuliaRegistries/TagBot/blob/7ac587806eabe2831c819161afa2eb81834c9c43/tagbot/action/repo.py#L311
which is called from new_versions
and which takes a dict of "version" -> "tree hash" with all Julia package versions that have been registered within the last lookback
days. _filter_map_versions
is then supposed to convert the tree hash to the corresponding commit hash.
But that's where I see the issue that we cannot easily work around: This conversion only works if there is a unique tree hash -> commit hash mapping. However, for a subdirectory package, the tree hash of the subdirectory does not change if the subdirectory contents are left untouched. That is, there are potentially a lot of git commits that have a common tree hash for (some of) their subdirectories.
To summarize, unless I am missing something (which might very well be the case), there is just no way to uniquely identify a git commit through a tree hash of a subdirectory.
Therefore, I think to retain the ability for TagBot to tag subdirectory packages (actually: to actually make it work in the first place), the whole version identification mechanism has to be rethought (and reimplemented). Since the Julia registry does not record commit hashes for its versions in the repository (only tree hashes), literally the only place where we have a unique connection between a Julia package version and its commit hash is currently the JuliaRegistrator
PR.
Thus, we would either need to convince the Julia registry to start recording commit hashes for new package versions together with the tree hash. Alternatively, we could limit the subdirectory tagging (or TagBot in its entirety) to work only with versions that have been registered through the JuliaRegistrator, which creates the PRs with the commit hash information in an expected format.
To double check: is this check a consistency check (verify TagBot is running on the right commit, which it should be doing bc of some other mechanism), OR is this to figure out what commit to run on at all (which commit to tag)?
If it is a consistency check, then for subdirs my thought would be: as long as the commit in question indeed contains the right subdir hash (even if there are other commits that also contain that subdir), then we are consistent and the check can pass.
is this check a consistency check (verify TagBot is running on the right commit, which it should be doing bc of some other mechanism), OR is this to figure out what commit to run on at all (which commit to tag)?
In general, a little bit of both. In _filter_map_versions
,
https://github.com/JuliaRegistries/TagBot/blob/7ac587806eabe2831c819161afa2eb81834c9c43/tagbot/action/repo.py#L311-L334
TagBot first tries to get the commit hash from the registry PR using _commit_sha_from_registry_pr
in line 316. It currently fails here because of the tree hash mismatch (as described above), but otherwise would be able to get the commit hash we need.
However, if the PR approach fails, TagBot falls back to using _commit_sha_of_tree
in line 318. This actually tries to get the commit hash by comparing the tree hash against the tree hashes of previous commits. It is here where only using a tree hash to find the corresponding commit fundamentally fails, since it is not a unique relationship anymore.
However, we could follow your idea,
as long as the commit in question indeed contains the right subdir hash (even if there are other commits that also contain that subdir), then we are consistent and the check can pass.
and do the following:
If a relevant PR was found and and a commit hash was successfully extracted and subdir
is non-empty, we compare the subdir
tree hash for the found commit hash against the package version tree hash. If they match, we can be reasonably sure that everything is OK and proceed with tagging.
This would make the subdir
feature more restrictive (it would only work for JuliaRegistrator-triggered version registrations, not for manual PRs), but that should cover >95% (my personal estimate) of all use cases in practice. The only remaining difficulty would be here to actually obtain the tree hash, but this could probably be done through _git
.
I've created https://github.com/JuliaRegistries/TagBot/pull/282 which implements the suggested solution.
TL;DR
If I am not mistaken, the current implementation for supporting subdirectory tagging is broken due to an apple-with-oranges-style comparison of the commit tree hash and the package tree hash, which are not identical for subdirectory tags.
Long version
The setup
After #279 had been merged, we tried to use TagBot to tag a new release of LibTrixi.jl, which lives in the
LibTrixi.jl
subdirectory of https://github.com/trixi-framework/libtrixi. Correspondingly, we have configured TagBot withsubdir: LibTrixi.jl
.The error message
However, during its run, TagBot quit early with the following messages:
(see this log message).
The origin
The "Tree SHA ... does not match" error comes from the following snippet: https://github.com/JuliaRegistries/TagBot/blob/7ac587806eabe2831c819161afa2eb81834c9c43/tagbot/action/repo.py#L258-L263 where
m[1]
holds the first 32 characters of the commit hash that has been extracted from the PR body of the corresponding PR to https://github.com/JuliaRegistries/General. In our case, it got the commit hash6e615c91364595e8ad1feca67ebfbb39
from https://github.com/JuliaRegistries/General/pull/91262#issue-1892562735.As an additional safety measure (or maybe it is required to catch ambiguities, but I cannot see where they might come from), in line 259 the tree hash of the registered commit (
commit.commit.tree.sha
) is compared against the tree hash stored in the Julia registry for the corresponding package version. In our case, the version isv0.1.2
, and TagBot extracted the package tree hash7911434c8b808550e3bd6c321dd50da91b128a9e
from this file.The bug (?)
And this is where the problem is located: The package tree hash
7911...
is obtained by computing the git tree hash of the package directory, which in our case is the subdirectoryLibTrixi.jl
. The commit tree hash, against it is compared here, however, is computed on the entire repository.To quickly verify that they are different, you can check the different strings from the libtrixi repo:
which will output something like
Obviously, this comparison of the commit tree hash and the package tree hash is bound to fail whenever the package lives in a subdirectory of the repository.
The solution
Currently, I see no obvious way out of this: Relying on the tree hash to determine the appropriate commit hash works only if the tree hash was computed for the entire repository, as otherwise there is no unique tree hash -> commit hash relation anymore. Since we are not interested in the content here (in which case that would be fine) but in the unique commit, this mechanism does fundamentally not work for subdirectory tags.
The only viable option I see atm is to remove the fallback mechanism for the commit identification and solely rely on the registry PR information. If this is not an option, one could consider doing this only for subdirectory tags and to leave the existing machinery in place for full-repo tags. Removing the (currently usuable) subdirectory support is imho not an option.
What are your thoughts on this @christopher-dG @IanButterworth (and maybe @hannahilea)?
cc @bgeihe