Open lrvick opened 6 years ago
@oxij Just noticed that the current spec doesn't include the question of where the signature is to be put is not included in the current spec. Do you have an opinion about it? Git does gpgsig [gpg signature with newlines included, one space at the beginning of each new line]
in the metadata fields, but I'm not sure it'd be really easy to parse? Sounds like the least bad option anyway, though.
@lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :)
@lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :)
If something to address this problem was included in git-core upstream then the advantages of JSON for easy parsing by third party verification tools go out the window, as well as the need for versioning.... because git is the authoritative tool. There is also no reason it would use the git-notes interface at all as there could be a first class object for signatures. If such a thing were to happen it would deprecate all our current work as prior art that informed design. In the mean time I think we have to use the most practical tools we can reach for going with the assumption that we need easy third party code support for now.
By the way I took a look at https://github.com/google/git-appraise and they look like they have landed on very similar directions (also pretty amazing project for decentralized code review where the central VCS interface is not trusted.).
Well… if this problem is addressed in git upstream, then hopefully they'd reuse the same data model as us, so that previous signatures would still be valid with the git upstream verifier without having to re-sign everything.
Which makes me think, ISTR at some point you said you had been with a member of the git development team who said you should pursue integration in git upstream for reviews… can you invite them to this discussion? Hopefully they'd have hints on how to do things so they will be more easily upstreamed :)
Concerning versioning, I think version should be per-branch, not per-review. Concerning JSON and third-party verification tools, I think we can always add JSON output on top.
I think git-wotr
should be the minimal viable spec. E.g. reading discussions in the issues here I'm inclined to just drop every metadata field except "result" and "result-otherwise" from the top-level spec since only those, I think, should be used to make decisions in the general case. We can then allow chaining of formats. E.g.
commit <commit-id> diff
with <commit-id>
result +
result-otherwise !
content-type review
thoroughness high
diff-understanding high
content-type git-signatures/artifacts.json
<JSON you want>
commit <commit-id> diff
with <commit-id>
result +
result-otherwise !
content-type comment # the default
<comment>
?
In particular, with the above you can pack https://github.com/google/git-appraise reviews into this format pretty much verbatim (they have no equivalent of "result" field in the comment schema, so just set it to "0").
(Well, at least we are not arguing whenever we pack XML into there. Good.)
@oxij So multiline gets much harder to parse, but this did just give me an idea of maybe a happy medium...
What if the signature was just our most simple "git romans" format in one line. Tools could use this alone as an "I approve this commit" and need not worry about any metadata (the most typical use case I imagine).
We could however add at the end a key of a hash of all the metadata, which can be found at refs/notes/signatures/
We could end up with a format with only our highest level keys to anchor things.
Namely:
<commit-hash> <patch-id> <metadata-hash>
...where metadata hash and patch-id are optional
From there we can use content-type delimited blobs exactly as you propose in the "attached" note and any tool can include anything they feel valuable. Have cake and eat it too?
How are multiple lines are harder to parse than a single line? It's just a sequence of chars.
What is hard to parse is escaping (like JSON has), with escaping you need an actual parser, with git-object-like format you only need to match sequences of chars, which can be done with simple regular expressions (note that the grammar is design.org
is regular, this is not an accident), if not simpler.
How would you do with
in one-line format?
git-object-like format still allows extensions when combined with https://github.com/git-wotr/spec/pull/10#issuecomment-427919142
Correction: they are harder to merge. The setup I proposed would never have a merge conflict.
So far I have dodged this problem by just using one-line json blobs, but what I proposed above odges this by delegating all metadata to optional attachments in separate "files" in the git notes tree so there is never a clash between one signer and another.
"
, , "
Again, the problem with just this much is that you can't verify
without looking at "with" and "result" metadata fields.
However, I think, allowing "content" to be stored separately and referenced by hash is ok.
@oxij
Concerning versioning, I think version should be per-branch, not per-review.
~Issue with this is, an attacker can cherry-pick a review from another version branch and re-use it.~
Strike that, it's handled by having all commits on the refs/reviews
branch being signed. Also, anyway the ideas on #10 leave the path open for a future version
attribute.
I think git-wotr should be the minimal viable spec. E.g. reading discussions in the issues here I'm inclined to just drop every metadata field except "result" and "result-otherwise" from the top-level spec since only those, I think, should be used to make decisions in the general case.
Well, I guess we totally agree about the fact that this should be the minimal viable spec, but I'm not sure whether other fields should be used or not in determining commit trustworthiness. I guess it's a matter for #9, though :)
We can then allow chaining of formats
This sounds like a good idea iff most tools can be done without looking at anything else than a single layer. IOW, if concerns are actually decoupled.
Also, I fear a bit this direction: then, it's only a step away from multipart/mixed, and I feel like that's something we should try to avoid if we don't want the thing to become a behemoth of complexity. I'd be more comfortable saying people who really want to store additional information should do so in the commit message / x-* headers and have their tooling on the side.
@lrvick TBH I'm not sure about the issue of one-line vs. multiple-lines. The advantage of one-line JSON is that it can be parsed and verified as is with a dependency on a JSON parser.
But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line.
With that in mind, I don't really see a reason not to have the content in the same object as the signature: it'd just make things less nice to debug.
Whoops, comment went out too fast. First remark now reads:
~Issue with this is, an attacker can cherry-pick a review from another version branch and re-use it.~
Strike that, it's handled by having all commits on the
refs/reviews
branch being signed. Also, anyway the ideas on #10 leave the path open for a futureversion
attribute.
But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line.
I'm also for keeping the git-signatures
's current approach here. IMHO no need to invent a new signature format when we can just reuse OpenPGP and base64 encode its packets.
If we really want we can also skip the base64 step as OpenPGP packets have clear boundaries and those can be easily parsed with sane languages, but I'm all for keeping it as it makes a simple bash
implementations viable too.
multipart MIME
I agree that "multipart" part is not the direction we should be going, but the content-type
field itself seems pretty useful, IMHO.
My main problem with the whole issue is that by adding each new field on the top level we add assumptions and restrictions, and I want "tools, not restrictions".
E.g. I imagine some projects might want both diff-
and context-understanding
to be a single attribute. If git-appraise
and crev
can't agree on anything in their data models who are we to force them? I imagine some might even disagree with !-0+
of the result*
fields (which kinda depresses me).
E.g. I myself now think that maybe we should also have an inverse to !
to mark security fixes and ?
to explicitly say "should be reviewed by someone else" (aka review request in git-appraise
). I.e. something like
"result":
- "!" = "has a security issue!",
- "-" = "I disapprove",
- "0" = "FYI",
- "?" = "No idea/please review",
- "+" = "I approve",
- "=" = "has a fix for a security issue",
Thinking some more about this, we might also want to have "FYI" reviews to be their own separate thing. I.e.
"result":
- "!" = "has a security issue!",
- "-" = "I disapprove",
- "?" = "No idea/please review",
- "+" = "I approve",
- "=" = "has a fix for a security issue".
and "FYI" reviews would then have no result*
fields at all and would be completely ignored by verify
operation but still shown by show
. I.e. you could then use them as usual comments (storing them in a separate branch with relaxed constraints might be good then too).
I have several more similar ideas.
So the question is, where do we stop here?
This explains my slowness towards approving #10. I'm not even convinced myself we should allow x-
fields in the first place. Maybe we should set the very minimal format (including using only a single bit of information "vulnerable/maybe not" in result
fields) in stone and put everything else into content-type
extensions. Maybe not. I spent a week thinking about it I'm still not convinced either way.
If we really want we can also skip the base64 step as OpenPGP packets have clear boundaries and those can be easily parsed with sane languages, but I'm all for keeping it as it makes a simple
bash
implementations viable too.
Same here, especially as it leaves the door open to a sorting strategy like cat_sort_uniq
, should we eventually find it useful.
My main problem with the whole issue is that by adding each new field on the top level we add assumptions and restrictions, and I want "tools, not restrictions".
Well, that's true… but at the same time we do need to standardize on something if we want tools to be interoperable. So at least the result
field is strictly needed.
E.g. I imagine some projects might want both
diff-
andcontext-understanding
to be a single attribute.
For instance, this problem can be solved by just having the tool present a single option and writing equal values to diff-
and context-understanding
under the hood. Now, yes, existence of these attributes can be discussed.
E.g. I myself now think that maybe we should also have an inverse to
!
to mark security fixes and?
to explicitly say "should be reviewed by someone else" (aka review request ingit-appraise
). I.e. something like
For security fixes, this can be found by listing the with
clauses of result + result-otherwise !
commits. Also… well, I don't see why a security-fix review should be handled differently from a positive review, so IMO this should at most be a different key.
For ?
, same as above: it doesn't bring any meaningful information to #9, so should not go into result
IMO.
However, the 0
already present is actually required, eg. to handle my reply to 7c6f434c in https://github.com/NixOS/rfcs/pull/34#issuecomment-429515327: handling would be with result 0 result-with +
and with
all the commits of the PR, if one only checked the outcome of the PR. (note: this paragraph may depend on the outcome of #14, if a commit-diff signs the diff to the first parent only then it isn't required)
and "FYI" reviews would then have no
result*
fields at all and would be completely ignored byverify
operation but still shown byshow
. I.e. you could then use them as usual comments (storing them in a separate branch with relaxed constraints might be good then too).
That… would sound just like git-notes, actually. Not sure we actually have a need for this, IMO we're going too far in the “encompass all the possible use cases” here, and we'll end up rewriting git if we continue this direction :)
My vision of our need is a system for signing code review of diffs in a context in a way for which verification can be automated. In this context, security fix handling should be considered only insofar as it helps automated verification of code review. Additional metadata can be considered as additional metadata, that need not be in the standard -- and particularly not in the review
field IMO.
tl;dr: I think we need at least !-0+
for review{,-otherwise}
, and with
for commits. I think all the rest could go into implementation-defined extensions indeed (which could be content-type
, x-*
fields, etc.)
tl;dr: I think we need at least
!-0+
forreview{,-otherwise}
, andwith
for commits. I think all the rest could go into implementation-defined extensions indeed (which could becontent-type
,x-*
fields, etc.)
I see. I'm leaning towards with
, result*
, content-type
fields, and all unknown fields being errors. This has an advantage of allowing structured formats like JSON in content-type
if and when you need them.
However, this raises two issues:
Also, I'm not sure what the point of the
low
settings tothoroughness
,context-understanding
anddiff-understanding
is, then, if not taking them into account for review checking: if I set the three tolow
, then I'd hope people trust (by default) my review less than when I set all of them tohigh
, because otherwise it means I'll just not sign anything withlow
understanding.
which also makes sense.
But maybe those extensions should then be processed via hooks that would classify reviews into categories which you would then process like you propose in https://github.com/git-wotr/spec/issues/9#issuecomment-429413143. E.g. you would then assign "+" without high-enough thoroughness
to a lowest category or something. Or you could discount "+"es by authors that seem to miss many issues in their reviews. How does that sound?
result
, but if not a result, then what? How about something likecommit ... diff
...
result !
content-type: review
subsystem: nixos/ssh
understanding: high
thoroughness: low
review-request: @<ssh service maintainer>
This seems very bad to me in such and such use case. @<ssh service maintainer> should take a look ASAP.
commit ...
result +
content-type: review
subsystem: pkgs/<package>
understanding: medium
thoroughness: medium
review-request: @<package maintainer>
LGTM, evaluates ok, but this is not a trivial change and it would be nice if @<package maintainer> would also take a look.
commit ...
result +
content-type: review
subsystem: nixos/new-service
review-request: * # to anyone willing
Please review this new service!
?
But maybe those extensions should then be processed via hooks that would classify reviews into categories which you would then process like you propose in #9 (comment). E.g. you would then assign "+" without high-enough
thoroughness
to a lowest category or something. Or you could discount "+"es by authors that seem to miss many issues in their reviews. How does that sound?
That makes sense :) overall, I think these can be tool-specific features, and I more and more think that if a way to define policy should be given by the RFC for projects to give their policies, tools should be able to add in tool-specific features that take in addition some additional data in the reviews.
- The idea behind "?" still seems useful to me: a kind of review type for getting attention. I agree that maybe it shouldn't be a
result
, but if not a result, then what? How about something like
Your examples LGTM… except that I do fear content-type
, among other reasons because then we'd need the RFC to actually standardize its possible contents (or leave that to MIME, which would be really meh).
Hi Lance, it's Dawid! @lrvick . I'm working on something different, but similar: cargo-crev - which is a tool for reviewing Rust/crates.io packages, and a part of broader ecosystem of code-trust ideas (crev
). @Ekleog pinged me to come over.
So I've been independently creating own format for crev
, and surprisingly came to something really similar. You can take a look: https://github.com/dpc/crev-proofs/blob/master/_xQgkbDAQx3nSV5SMfdEeQBSYiPwSI32wnMxnjExk24%3D/reviews/2018-12-packages.proof.crev
Regarding the cosmetic differences:
---\n
. It's human readable, and simple enough that can be parsed even with a grep
if need be, but most tools should be able to take a standard yaml library. Or just use Rust library that implements it.understanding
is enough.crev
result
is a recommendation
that can be one of strong
, positive
, neutral
, negative
and dangerous
. I don't understand why would you want to use +
and other cryptic values. Is being self-explanatory, and easy for the eye not a good thing?I have very shallow understanding of scope and goals here, so please excuse if I get anything wrong, but some differences from my impression so far (the goals of projects might be different, so I'm just staying them, without much opinion):
crev
) are just text objects and are transport-independent. I envision they are going to be mostly circulated out-of-band, via syncing of personal git repositories, but that could change. Tying everything too much with git
is IMO, a mistake. git
is most popular revision system right now, but not everyone is using it right now, and in the future we might use something else entirely. (like pijul).I have a big chunk of functionality ready for cargo-crev
, so you might want to give it a try for inspiration, or just to give me feedback.
This spec is not really specific to git. It does assume some kind of content-addressable storage (and a Merkle tree for sequences of commits if we are to ignore alternative anchoring modes of #3) but nothing beyond that.
From my point of view, the main issue here is making the spec robust against evil participants and servers without requiring too much review work from the honest participants.
I experimented with the following format today as a single json line incorperating ideas from @oxij with mine. Note this example also assumes potential use cases like outlined in #2
Will update the following example as we reach agreement on format (or any "Git romans" style alternative format we want to compare against)
In case it is not obvious, the "sig" value is valid against a one line representation of anything in the "body" object.
This example is also intended to represent all options we might want to support so we can decide on a sane future-proof format. Any options here are not implied to be included in a v1 spec.