`verify-commit` semantic refinement

flandweber commented 1 month ago

This issue is intended to discuss a possible change in semantics for the verify-commit subcommand.

I'm still a little unclear on the specifics, but from what I understand from talking with @adityasaky it gathers the first reference to a commit ("first mention is RSL") and verifies if that was a valid change of the reference.

@adityasaky now proposed a change the semantics towards verifying a commit with respect to a given reference (e. g. branch).

Example:

state 0:

180757b (feature) feature commit
5fa7115 (HEAD -> main) init

state 1 (main rebase to feature):

180757b (HEAD -> main, feature) feature commit
5fa7115 init

If my understanding of the current implementation is correct, if gittuf verify-commit 180757b was run after the merge (state 1), it would still verify the commiter of 180757b was allowed to commit to feature branch according to the policy at state 0.

The new implementation would require a reference as a second argument. The command gittuf verify-commit 180757b feature could then be used to check if main was valid at state 0 whereas gittuf verify-commit 180757b main would check if the original commit was according to policy.

Are reverts valid gittuf actions i. e. recorded to the rsl just the same? If so, a commit could be added to the rsl, reverted and then readded, which means the combination commit/reference does not uniquely identify a rsl state. How should gittuf behave in such situations? Are there significant drawbacks to simply picking the first/last occurrence?

flandweber commented 1 month ago

Come to think of it, maybe an added flag for verify-ref would be a better match for the proposed semantics:

gittuf verify-ref main --at 180757b

Which would have the additional perk of being able to throw out verify-commit and verify-tag entirely and rename verify-ref to verify.

neilnaveen commented 1 month ago

We would not necessarily be able to throw out verify-tag, since while each tag is a ref itself, tags, and their verification work diffrently compared to regular refs. We maybe could route tag verification through gittuf verify-ref, but I think it would have to be its own feature under the hood.

neilnaveen commented 1 month ago

Also, the use cases of verify commit are minor, and after #337 is merged, parsing invalid states will be much easier. Currently, the only use case of verify-commit(that I can see) is to find out if, in an invalid state, these x commits caused it, especially when we have to take into account the ref that we are verifying the commit with, which would make it take as long as a regular verify-ref.

adityasaky commented 1 month ago

Here's how gittuf verify-commit currently works: it tries to identify when the specified commit was first seen in the repository in any git reference from the RSL. Recall that the RSL is a series of entries, with each indicating a git ref and the target commit or tag object's ID for that ref. To identify when a commit was first seen, the RSL is walked back until an entry is found that doesn't "know" the commit. That is, the commit recorded in the entry is not a descendant of the commit being verified. This is problematic because this entry may have been for a push to an older feature branch, and entries before it may well "know" the commit under test. To be really sure about when we first saw a commit, we would essentially have to walk all the way back to the start of the RSL.

The reason this is done is to identify the policy applicable at the time the commit was first seen. verify-commit uses all the public keys in this policy to verify the commit's signature. Note that this means it doesn't enforce any meaningful policies about whether the signature is indeed valid for a given context (eg: the branch the commit is in), because we don't have that context with verify-commit. This is largely why I think we ought to drop the current verify-commit flow, I also am not a big believer in a boolean signature verification check on a commit that may or may not actually matter to the repository's history.

verify-tag is a bit different because it primarily considers annotated tag objects rather than lightweight tags, and verifies the tag object's signature. However, it also notes that the tag is a ref as well, and so it looks at the corresponding RSL entry. This really means that verify-tag is part of the way to verify-ref minus some checks (eg: bootstrapping successive roots of trust, thresholds etc), so I'm inclined to roll that into verify-ref as well. I have some pending changes to policy's verifyTagEntry that addresses this, which I plan to open after the current sequence of PRs on the git binary backend are merged.

I think I largely agree with @flandweber's point above. We ought to roll things into verify-ref and provide some features for users to do more specific things. For example, we already have verify-ref --from-entry to verify the RSL from a particular entry (I see this being valuable on fetch when the local repository is in a verified state pre-fetch and must only verify new entries, #245). I think verify-ref --at-commit as proposed above is a good idea. We'd be doing a version of what I stated above to determine when the commit first appeared in the specified ref, but it'd be a lot more constrained.

Also related: https://github.com/gittuf/gittuf/pull/287. We punted this when we added --from-entry, and it may be related to the --at feature.

cc @patzielinski @lukpueh as a heads up on this discussion.

flandweber commented 1 month ago

Currently, the only use case of verify-commit(that I can see) is to find out if, in an invalid state, these x commits caused it,

My use case is actually a little different: I'd like to pin a commit in a Nix fetcher (https://codeberg.org/flandweber/fetchgittuf), so that it always evaluates to the same state of the repository. I wouldn't care if a verification later on fails, as long as the commits up to the pinned commit pass.

To be really sure about when we first saw a commit, we would essentially have to walk all the way back to the start of the RSL.

Expensive operations may be acceptable in some cases, though the --at flag can't be more expensive than verify-ref on the branch head, correct? Also could be sped up by using data structures, though I realize this is probably not be worth the effort.

adityasaky commented 1 month ago

Expensive operations may be acceptable in some cases, though the --at flag can't be more expensive than verify-ref on the branch head, correct?

It depends, but it should be possible to make it cheaper I think. I'm also generally curious about what optimizations we can pull off in the rsl package, we can likely track parent child relationships and what not.

gittuf / gittuf

`verify-commit` semantic refinement #384