Signature Format - Githubissues

git-wotr / spec

Securely review git commits

Other

3 stars 3 forks source link

Signature Format #4

Open lrvick opened 6 years ago

lrvick commented 6 years ago

I experimented with the following format today as a single json line incorperating ideas from @oxij with mine. Note this example also assumes potential use cases like outlined in #2

Will update the following example as we reach agreement on format (or any "Git romans" style alternative format we want to compare against)

In case it is not obvious, the "sig" value is valid against a one line representation of anything in the "body" object.

This example is also intended to represent all options we might want to support so we can decide on a sane future-proof format. Any options here are not implied to be included in a v1 spec.

{
  "body":{
    "commit":"d27e15a09812241167a61c3222cb6e97c5b0d6b0",
    "tree":"58215f173d6ef6892a07c9073d55145f387d1968",
    "patch":"068a37e54586de8339f13ec980d31f6c30b6f6e7",
    "date": "2018-10-05T03:01:46Z",
    "review": {
      "context-understanding": "author",
      "diff-understanding": "high",
      "thoroughness":"medium",
      "result":"+"
    },
    "artifacts": {
      "out.tar.gz":"2ad6f470b5398251018a4c25f5eb35686481681f"
    }
  },
"sig":"iQJFBAABCAAvFiEEZ1U/vaRrtxq9LgsLjkeh7DWhVR0FAlu2h3IRHGxhbmNlQGxydmljay5uZXQACgkQjkeh7DWhVR0K9Q//T+UleZFWlbiIFoUBp1hX0xzg/eEaTFnnlCdWWaa5f1E+TI80Pg/wXhpEUd98ghZThXwaWJwnPp34BpZ55GU5Qr1cXjY00Yt7I0p3a5TUsdk5FP8tiUWNQ9kA6npUOIVixa8HFkhz0SvHvXNAvWsasqw6nb+SNfA+aQYIP/wAY71ZpMLkJsMQssaFWsjL/hsSmOpi7VyLEFEuUmEM1CA6VPiMvp4rvP75Bt4DnEouyFPxk4WbVIqX4DIeG8+5W8jQCLlxV6HDH4g+POdZbD7/Yg6XjDKEfD2vM/OYrBczPZX0Tu1WhVyYOtQ/WsAr9wzZjgl50/9UkP6E4pt9ft42jdNhGcO1sEwzlC5W1efC2izjL4medEiLe00zIM0L42X10HebH22j4gKyb4EErSHo0H5eLisf1xAwAHiOE2wJou1SR5dVmKydXicVN3wuLqhzc52awmoOaEts1Q3zFyB8MfDIi39CACwAvOTrKw54raoGgzxDEFEP3sKWF1FuZQJreG1Ufg57Oxv5f1lOqvaoMN7ZNjcQkPUAqj7G7e/fwyXGCG2mvnz++D5/2JAMeeO4k8iwxGXEsF0qD3QveEZPFoIpPZX+uOGqM9iIPDJd9bGu0xI5yWeqLpbdXdm4ClsiRWOmavUvjm8RK99o7I9EvctCzrIrGnwlySZ0+Po856I="
}

Ekleog commented 6 years ago

@oxij Just noticed that the current spec doesn't include the question of where the signature is to be put is not included in the current spec. Do you have an opinion about it? Git does gpgsig [gpg signature with newlines included, one space at the beginning of each new line] in the metadata fields, but I'm not sure it'd be really easy to parse? Sounds like the least bad option anyway, though.

@lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :)

lrvick commented 6 years ago

@lrvick I think the argument in favor of keeping a format similar to what git does is, it'd make integration into git upstream much easier. If it's using JSON, I'd be willing to bet Junio Hamano would laugh at us. Now… I can only bet :)

If something to address this problem was included in git-core upstream then the advantages of JSON for easy parsing by third party verification tools go out the window, as well as the need for versioning.... because git is the authoritative tool. There is also no reason it would use the git-notes interface at all as there could be a first class object for signatures. If such a thing were to happen it would deprecate all our current work as prior art that informed design. In the mean time I think we have to use the most practical tools we can reach for going with the assumption that we need easy third party code support for now.

By the way I took a look at https://github.com/google/git-appraise and they look like they have landed on very similar directions (also pretty amazing project for decentralized code review where the central VCS interface is not trusted.).

Ekleog commented 6 years ago

Well… if this problem is addressed in git upstream, then hopefully they'd reuse the same data model as us, so that previous signatures would still be valid with the git upstream verifier without having to re-sign everything.

Which makes me think, ISTR at some point you said you had been with a member of the git development team who said you should pursue integration in git upstream for reviews… can you invite them to this discussion? Hopefully they'd have hints on how to do things so they will be more easily upstreamed :)

oxij commented 6 years ago

Concerning versioning, I think version should be per-branch, not per-review. Concerning JSON and third-party verification tools, I think we can always add JSON output on top.

I think git-wotr should be the minimal viable spec. E.g. reading discussions in the issues here I'm inclined to just drop every metadata field except "result" and "result-otherwise" from the top-level spec since only those, I think, should be used to make decisions in the general case. We can then allow chaining of formats. E.g.

commit <commit-id> diff
with <commit-id>
result +
result-otherwise !
content-type review

thoroughness high
diff-understanding high
content-type git-signatures/artifacts.json

<JSON you want>

commit <commit-id> diff
with <commit-id>
result +
result-otherwise !
content-type comment # the default

<comment>

oxij commented 6 years ago

In particular, with the above you can pack https://github.com/google/git-appraise reviews into this format pretty much verbatim (they have no equivalent of "result" field in the comment schema, so just set it to "0").

(Well, at least we are not arguing whenever we pack XML into there. Good.)

lrvick commented 6 years ago

@oxij So multiline gets much harder to parse, but this did just give me an idea of maybe a happy medium...

What if the signature was just our most simple "git romans" format in one line. Tools could use this alone as an "I approve this commit" and need not worry about any metadata (the most typical use case I imagine).

We could however add at the end a key of a hash of all the metadata, which can be found at refs/notes/signatures/

We could end up with a format with only our highest level keys to anchor things.

Namely:

<commit-hash> <patch-id> <metadata-hash>

...where metadata hash and patch-id are optional

From there we can use content-type delimited blobs exactly as you propose in the "attached" note and any tool can include anything they feel valuable. Have cake and eat it too?

oxij commented 6 years ago

How are multiple lines are harder to parse than a single line? It's just a sequence of chars.

What is hard to parse is escaping (like JSON has), with escaping you need an actual parser, with git-object-like format you only need to match sequences of chars, which can be done with simple regular expressions (note that the grammar is design.org is regular, this is not an accident), if not simpler.

How would you do with in one-line format?

git-object-like format still allows extensions when combined with https://github.com/git-wotr/spec/pull/10#issuecomment-427919142

lrvick commented 6 years ago

Correction: they are harder to merge. The setup I proposed would never have a merge conflict.

So far I have dodged this problem by just using one-line json blobs, but what I proposed above odges this by delegating all metadata to optional attachments in separate "files" in the git notes tree so there is never a clash between one signer and another.

oxij commented 6 years ago

",,"

Again, the problem with just this much is that you can't verify without looking at "with" and "result" metadata fields.

However, I think, allowing "content" to be stored separately and referenced by hash is ok.

Ekleog commented 6 years ago

@oxij

Concerning versioning, I think version should be per-branch, not per-review.

~Issue with this is, an attacker can cherry-pick a review from another version branch and re-use it.~

Strike that, it's handled by having all commits on the refs/reviews branch being signed. Also, anyway the ideas on #10 leave the path open for a future version attribute.

I think git-wotr should be the minimal viable spec. E.g. reading discussions in the issues here I'm inclined to just drop every metadata field except "result" and "result-otherwise" from the top-level spec since only those, I think, should be used to make decisions in the general case.

Well, I guess we totally agree about the fact that this should be the minimal viable spec, but I'm not sure whether other fields should be used or not in determining commit trustworthiness. I guess it's a matter for #9, though :)

We can then allow chaining of formats

This sounds like a good idea iff most tools can be done without looking at anything else than a single layer. IOW, if concerns are actually decoupled.

Also, I fear a bit this direction: then, it's only a step away from multipart/mixed, and I feel like that's something we should try to avoid if we don't want the thing to become a behemoth of complexity. I'd be more comfortable saying people who really want to store additional information should do so in the commit message / x-* headers and have their tooling on the side.

@lrvick TBH I'm not sure about the issue of one-line vs. multiple-lines. The advantage of one-line JSON is that it can be parsed and verified as is with a dependency on a JSON parser.

But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line.

With that in mind, I don't really see a reason not to have the content in the same object as the signature: it'd just make things less nice to debug.

Ekleog commented 6 years ago

Whoops, comment went out too fast. First remark now reads:

~Issue with this is, an attacker can cherry-pick a review from another version branch and re-use it.~

Strike that, it's handled by having all commits on the refs/reviews branch being signed. Also, anyway the ideas on #10 leave the path open for a future version attribute.

oxij commented 6 years ago

But having multiple lines isn't that big a deal: we can just base64 it all and put it all on one line.

I'm also for keeping the git-signatures's current approach here. IMHO no need to invent a new signature format when we can just reuse OpenPGP and base64 encode its packets.

If we really want we can also skip the base64 step as OpenPGP packets have clear boundaries and those can be easily parsed with sane languages, but I'm all for keeping it as it makes a simple bash implementations viable too.

multipart MIME

I agree that "multipart" part is not the direction we should be going, but the content-type field itself seems pretty useful, IMHO.

My main problem with the whole issue is that by adding each new field on the top level we add assumptions and restrictions, and I want "tools, not restrictions".

E.g. I imagine some projects might want both diff- and context-understanding to be a single attribute. If git-appraise and crev can't agree on anything in their data models who are we to force them? I imagine some might even disagree with !-0+ of the result* fields (which kinda depresses me).

E.g. I myself now think that maybe we should also have an inverse to ! to mark security fixes and ? to explicitly say "should be reviewed by someone else" (aka review request in git-appraise). I.e. something like

"result":

- "!" = "has a security issue!",
- "-" = "I disapprove",
- "0" = "FYI",
- "?" = "No idea/please review",
- "+" = "I approve",
- "=" = "has a fix for a security issue",

Thinking some more about this, we might also want to have "FYI" reviews to be their own separate thing. I.e.

"result":

- "!" = "has a security issue!",
- "-" = "I disapprove",
- "?" = "No idea/please review",
- "+" = "I approve",
- "=" = "has a fix for a security issue".

and "FYI" reviews would then have no result* fields at all and would be completely ignored by verify operation but still shown by show. I.e. you could then use them as usual comments (storing them in a separate branch with relaxed constraints might be good then too).

I have several more similar ideas.

So the question is, where do we stop here?

This explains my slowness towards approving #10. I'm not even convinced myself we should allow x- fields in the first place. Maybe we should set the very minimal format (including using only a single bit of information "vulnerable/maybe not" in result fields) in stone and put everything else into content-type extensions. Maybe not. I spent a week thinking about it I'm still not convinced either way.

Ekleog commented 6 years ago

If we really want we can also skip the base64 step as OpenPGP packets have clear boundaries and those can be easily parsed with sane languages, but I'm all for keeping it as it makes a simple bash implementations viable too.

Same here, especially as it leaves the door open to a sorting strategy like cat_sort_uniq, should we eventually find it useful.

My main problem with the whole issue is that by adding each new field on the top level we add assumptions and restrictions, and I want "tools, not restrictions".

Well, that's true… but at the same time we do need to standardize on something if we want tools to be interoperable. So at least the result field is strictly needed.

E.g. I imagine some projects might want both diff- and context-understanding to be a single attribute.

For instance, this problem can be solved by just having the tool present a single option and writing equal values to diff- and context-understanding under the hood. Now, yes, existence of these attributes can be discussed.

E.g. I myself now think that maybe we should also have an inverse to ! to mark security fixes and ? to explicitly say "should be reviewed by someone else" (aka review request in git-appraise). I.e. something like

For security fixes, this can be found by listing the with clauses of result + result-otherwise ! commits. Also… well, I don't see why a security-fix review should be handled differently from a positive review, so IMO this should at most be a different key.

For ?, same as above: it doesn't bring any meaningful information to #9, so should not go into result IMO.

However, the 0 already present is actually required, eg. to handle my reply to 7c6f434c in https://github.com/NixOS/rfcs/pull/34#issuecomment-429515327: handling would be with result 0 result-with + and with all the commits of the PR, if one only checked the outcome of the PR. (note: this paragraph may depend on the outcome of #14, if a commit-diff signs the diff to the first parent only then it isn't required)

and "FYI" reviews would then have no result* fields at all and would be completely ignored by verify operation but still shown by show. I.e. you could then use them as usual comments (storing them in a separate branch with relaxed constraints might be good then too).

That… would sound just like git-notes, actually. Not sure we actually have a need for this, IMO we're going too far in the “encompass all the possible use cases” here, and we'll end up rewriting git if we continue this direction :)

My vision of our need is a system for signing code review of diffs in a context in a way for which verification can be automated. In this context, security fix handling should be considered only insofar as it helps automated verification of code review. Additional metadata can be considered as additional metadata, that need not be in the standard -- and particularly not in the review field IMO.

tl;dr: I think we need at least !-0+ for review{,-otherwise}, and with for commits. I think all the rest could go into implementation-defined extensions indeed (which could be content-type, x-* fields, etc.)

oxij commented 6 years ago

tl;dr: I think we need at least !-0+ for review{,-otherwise}, and with for commits. I think all the rest could go into implementation-defined extensions indeed (which could be content-type, x-* fields, etc.)

I see. I'm leaning towards with, result*, content-type fields, and all unknown fields being errors. This has an advantage of allowing structured formats like JSON in content-type if and when you need them.

However, this raises two issues:

in https://github.com/git-wotr/spec/issues/9#issuecomment-428047874 you write

Also, I'm not sure what the point of the low settings to thoroughness, context-understanding and diff-understanding is, then, if not taking them into account for review checking: if I set the three to low, then I'd hope people trust (by default) my review less than when I set all of them to high, because otherwise it means I'll just not sign anything with low understanding.

which also makes sense.

But maybe those extensions should then be processed via hooks that would classify reviews into categories which you would then process like you propose in https://github.com/git-wotr/spec/issues/9#issuecomment-429413143. E.g. you would then assign "+" without high-enough thoroughness to a lowest category or something. Or you could discount "+"es by authors that seem to miss many issues in their reviews. How does that sound?

The idea behind "?" still seems useful to me: a kind of review type for getting attention. I agree that maybe it shouldn't be a result, but if not a result, then what? How about something like

commit ... diff
...
result !
content-type: review

subsystem: nixos/ssh
understanding: high
thoroughness: low
review-request: @<ssh service maintainer>

This seems very bad to me in such and such use case. @<ssh service maintainer> should take a look ASAP.

commit ...
result +
content-type: review

subsystem: pkgs/<package>
understanding: medium
thoroughness: medium
review-request: @<package maintainer>

LGTM, evaluates ok, but this is not a trivial change and it would be nice if @<package maintainer> would also take a look.

commit ...
result +
content-type: review

subsystem: nixos/new-service
review-request: * # to anyone willing

Please review this new service!

Ekleog commented 6 years ago

But maybe those extensions should then be processed via hooks that would classify reviews into categories which you would then process like you propose in #9 (comment). E.g. you would then assign "+" without high-enough thoroughness to a lowest category or something. Or you could discount "+"es by authors that seem to miss many issues in their reviews. How does that sound?

That makes sense :) overall, I think these can be tool-specific features, and I more and more think that if a way to define policy should be given by the RFC for projects to give their policies, tools should be able to add in tool-specific features that take in addition some additional data in the reviews.

The idea behind "?" still seems useful to me: a kind of review type for getting attention. I agree that maybe it shouldn't be a result, but if not a result, then what? How about something like

Your examples LGTM… except that I do fear content-type, among other reasons because then we'd need the RFC to actually standardize its possible contents (or leave that to MIME, which would be really meh).

dpc commented 5 years ago

Hi Lance, it's Dawid! @lrvick . I'm working on something different, but similar: cargo-crev - which is a tool for reviewing Rust/crates.io packages, and a part of broader ecosystem of code-trust ideas (crev). @Ekleog pinged me to come over.

So I've been independently creating own format for crev, and surprisingly came to something really similar. You can take a look: https://github.com/dpc/crev-proofs/blob/master/_xQgkbDAQx3nSV5SMfdEeQBSYiPwSI32wnMxnjExk24%3D/reviews/2018-12-packages.proof.crev

Regarding the cosmetic differences:

I'm using Yaml, where I've stripped leading ---\n. It's human readable, and simple enough that can be parsed even with a grep if need be, but most tools should be able to take a standard yaml library. Or just use Rust library that implements it.
I come from a point of view, that the more fields you have, the less chance people will bother filling them carefully (or at all). Because of that, I think one understanding is enough.
In crev result is a recommendation that can be one of strong, positive, neutral, negative and dangerous. I don't understand why would you want to use + and other cryptic values. Is being self-explanatory, and easy for the eye not a good thing?

I have very shallow understanding of scope and goals here, so please excuse if I get anything wrong, but some differences from my impression so far (the goals of projects might be different, so I'm just staying them, without much opinion):

"Proofs" (signed document in crev) are just text objects and are transport-independent. I envision they are going to be mostly circulated out-of-band, via syncing of personal git repositories, but that could change. Tying everything too much with git is IMO, a mistake. git is most popular revision system right now, but not everyone is using it right now, and in the future we might use something else entirely. (like pijul).

I have a big chunk of functionality ready for cargo-crev, so you might want to give it a try for inspiration, or just to give me feedback.

oxij commented 5 years ago

This spec is not really specific to git. It does assume some kind of content-addressable storage (and a Merkle tree for sequences of commits if we are to ignore alternative anchoring modes of #3) but nothing beyond that.

From my point of view, the main issue here is making the spec robust against evil participants and servers without requiring too much review work from the honest participants.