c2pa-org / public-draft

Repository for the public drafts of the C2PA Specifications
Creative Commons Attribution 4.0 International

Death by flexibility #33

Closed hackerfactor closed 2 years ago

hackerfactor commented 2 years ago

In order to parse and validate the provenance record defined in the current draft, the parser must support a wide range of data formats.

The dependency list includes:

While I understand the desire to have a flexible solution, this becomes overwhelming for a standalone decoder. Any decoder solution will have a large code footprint. This limits the ability to embed a decoder in an embedded or IoT device that has limited resources.

One solution is to depend on third-party software for the various parsing and algorithmic support: libjson for JSON, libcbor for CBOR, libxml for XML/XMP, openssl for the crypto, etc. All of these dependencies have their own sets of dependencies, leading to a nightmare for SBOM (software bill of materials) management.

Rather than having lots of options that are equivalent, why not make a decision and choose one?

For example, SHA256 is probably good enough for the current document formats. If it's not good enough, then use SHA512. (I don't care which, but make a decision.) The whole argument that one checksum for integrity checks could be better than another will be moot in a few years, when computers get faster and we start using SHA2048 or some other future checksum and need to update the spec anyway. Similarly, someone in the future may figure out how to crack SHA512 before they figure out how to crack SHA256. (Maybe there is some underlying weakness in SHA512 that nobody has discovered yet.) Betting on which will be stronger in the long run is a waste of time unless you can see the future.

By choosing specific options, but permitting the spec to replace them in the future (e.g., everyone use CBOR and SHA256, but it can change in the future), we restrict the complexity, code size, and dependencies. This results in simpler implementations, easier SBOM tracking, and wider adoption.
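To make the idea concrete, here is a minimal sketch of a "one choice per spec version" profile. The version strings, the `PROFILE_HASH` table, and the `digest_for_version` helper are all hypothetical names invented for illustration; nothing here comes from the C2PA draft.

```python
import hashlib

# Hypothetical profile table: each spec version pins exactly one hash.
# The version strings and algorithm choices are illustrative assumptions,
# not values taken from the C2PA draft.
PROFILE_HASH = {
    "1": "sha256",
    "2": "sha512",
}


def digest_for_version(version: str, data: bytes) -> str:
    """Hash `data` with the single algorithm pinned for `version`."""
    try:
        name = PROFILE_HASH[version]
    except KeyError:
        raise ValueError(f"unsupported profile version: {version!r}") from None
    return hashlib.new(name, data).hexdigest()
```

Under this assumption a decoder built for version "1" only has to ship one hash primitive, and replacing the algorithm later means adding a new version entry rather than supporting every option at once.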

lrosenthol commented 2 years ago

Your concern is certainly one that the committee itself debated on multiple occasions, and the reasons for some of these decisions have been included in the spec itself. For example, see https://c2pa.org/public-draft/#_use_of_jumbf (10.1) on the use of JUMBF.

Although it is not reflected in the document itself (and we could certainly add it, if you thought it would be helpful to others), there is a rationale behind choosing CBOR over JSON. The reasons include size, security, alignment with the IoT and other communities, and COSE over JOSE. We could have mandated that all assertions be CBOR (and it was indeed discussed by the committee), but it was felt that keeping existing JSON-based standards, so that they do not have to be transcoded, was a more compatible approach for the industry.

The reason for supporting multiple algorithms for both signing and hashing is threefold. First, the standard industry practice of crypto-agility (https://en.wikipedia.org/wiki/Cryptographic_agility). Second, the need to support mandated signature requirements in various international situations (e.g., ETSI standards in the EU). Third, most X.509 certificates include the hash algorithm that is to be used for their payload, and we need to support the most common ones.
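As a rough illustration of what crypto-agility looks like on the validator side, the sketch below dispatches on an algorithm name carried with the data and rejects anything outside an allowlist. The `ALLOWED_HASHES` set and the `verify_hash` helper are assumptions made for this example; the algorithm list is not the list from the specification.

```python
import hashlib
import hmac

# Allowlist of hash algorithms this validator accepts. The selection is an
# illustrative assumption, not the set mandated by any specification.
ALLOWED_HASHES = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def verify_hash(alg: str, data: bytes, expected_hex: str) -> bool:
    """Return True if `data` hashes to `expected_hex` under algorithm `alg`."""
    constructor = ALLOWED_HASHES.get(alg.lower())
    if constructor is None:
        # Unknown or disallowed algorithms are rejected, never guessed at.
        raise ValueError(f"hash algorithm not allowed: {alg!r}")
    actual = constructor(data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_hex.lower())
```

Adding or retiring an algorithm then becomes a one-line change to the table, at the cost of shipping every listed primitive in the decoder.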

swenkeratmicrosoft commented 2 years ago

@lrosenthol I do have a question about JSON + CBOR.

Supporting both makes server-side implementations easier because they can reuse existing JSON libraries, etc., but it makes client-side implementations more difficult because it mandates that a client (decoder) support both.

Isn't it preferable to reduce client-side code (where resources are far more limited) than server-side code?

lrosenthol commented 2 years ago

@swenkeratmicrosoft Actually, there is NO mandate to support JSON - only CBOR, since the only specified use of JSON is in optional assertions that are not required to be read/processed/validated.
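A minimal sketch of what that means for a small decoder, assuming a hypothetical `Assertion` container with a format tag: CBOR assertions are decoded, everything else is recorded as skipped without being parsed. The `Assertion` class, the `fmt` values, and `decode_required` are invented for illustration, and the CBOR decoder is passed in as a callable rather than taken from any particular library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass
class Assertion:
    """Hypothetical container for one assertion; not a C2PA structure."""
    label: str
    fmt: str        # assumed format tag, e.g. "cbor" or "json"
    payload: bytes


def decode_required(
    assertions: Iterable[Assertion],
    decode_cbor: Callable[[bytes], Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Decode CBOR assertions; list the labels of everything skipped."""
    decoded: Dict[str, Any] = {}
    skipped: List[str] = []
    for assertion in assertions:
        if assertion.fmt == "cbor":
            decoded[assertion.label] = decode_cbor(assertion.payload)
        else:
            # Optional formats (e.g. JSON) are never parsed here, so no
            # JSON library is needed in the decoder.
            skipped.append(assertion.label)
    return decoded, skipped
```

With this shape, a constrained client links only a CBOR decoder, while a fuller implementation could handle the skipped labels separately.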

lrosenthol commented 2 years ago

As mentioned, we have kept the number of required technologies to a minimum - CBOR/COSE and JUMBF. JSON and XMP/XML are both optional.

Crypto-agility and compliance with national laws in various parts of the world constrain our choice of technologies there, but they are all well-known and established standards.