c2pa-org / specifications

The public specifications for the C2PA

Clarification: security properties of exclusion-instead-of-inclusion byte-range lists #40

Open jayaddison opened 11 months ago

jayaddison commented 11 months ago

Hi - I have a question about heading 4.1.2 of the C2PA v1.3 specification:

> The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm over some or all of the bytes of an asset as described in the core specification. Traditionally, this type of binding is done over an inclusive list of byte ranges of the asset. However, a number of attacks on an inclusion list-based approach were identified and it was determined that they are prevented by the use of exclusions lists. These vulnerabilities would have allowed content to be added to an asset that altered the digital content without altering the hard bindings.

Would it be possible to share additional information about the types of attack that inclusion lists were found to be vulnerable to, and how exclusion lists defend against these?

Thank you, James

lrosenthol commented 9 months ago

@jayaddison I don't see why not - will add to our internal issues tracker to address!

jayaddison commented 9 months ago

Thank you, @lrosenthol!

jayaddison commented 3 weeks ago

Before responding further, here is a recap of the relevant text in each version published so far:

v1.3 guidance: https://github.com/c2pa-org/specifications/blob/d0bea331ffc4756b74069f8b3e6c250214aec8d0/build/site/specifications/1.3/guidance/Guidance.html#L485

> The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm over some or all of the bytes of an asset as described in the core specification. Traditionally, this type of binding is done over an inclusive list of byte ranges of the asset. However, a number of attacks on an inclusion list-based approach were identified and it was determined that they are prevented by the use of exclusions lists. These vulnerabilities would have allowed content to be added to an asset that altered the digital content without altering the hard bindings.

v1.4 guidance (unchanged, apart from the hyperlink): https://github.com/c2pa-org/specifications/blob/d0bea331ffc4756b74069f8b3e6c250214aec8d0/build/site/specifications/1.4/guidance/Guidance.html#L453

> The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm over some or all of the bytes of an asset as described in the core specification. Traditionally, this type of binding is done over an inclusive list of byte ranges of the asset. However, a number of attacks on an inclusion list-based approach were identified and it was determined that they are prevented by the use of exclusions lists. These vulnerabilities would have allowed content to be added to an asset that altered the digital content without altering the hard bindings.

v2.0 guidance (change in terminology): https://github.com/c2pa-org/specifications/blob/d0bea331ffc4756b74069f8b3e6c250214aec8d0/build/site/specifications/2.0/specs/C2PA_Specification.html#L1745

> The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm, as described in Section 11.3.4.2, “Hashing”, over some or all of the bytes of an asset. This approach can be used on any type of asset, but should only be considered for formats that don’t support one of the forms of box-based hashing.

v2.1 guidance: https://github.com/c2pa-org/specifications/blob/d0bea331ffc4756b74069f8b3e6c250214aec8d0/build/site/specifications/2.1/specs/C2PA_Specification.html#L2053

> The simplest type of hard binding that can be used to detect tampering is a cryptographic hashing algorithm, as described in Section 13.1, “Hashing”, over some or all of the bytes of an asset. This approach can be used on any type of asset, but should only be considered for formats that don’t support one of the forms of box-based hashing.
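
To make the inclusion-versus-exclusion distinction from the v1.3/v1.4 wording concrete, here is a minimal sketch of one way content could be "added to an asset ... without altering the hard bindings". It is only my own illustration, not the C2PA hashing algorithm and not necessarily one of the attacks the specification authors identified. The toy asset layout, the `(start, length)` range format, and the function names `hash_included` and `hash_excluding` are all assumptions made for the example.

```python
import hashlib

def hash_included(data: bytes, ranges: list[tuple[int, int]]) -> str:
    """Hash only the listed (start, length) ranges; every other byte is ignored."""
    h = hashlib.sha256()
    for start, length in ranges:
        h.update(data[start:start + length])
    return h.hexdigest()

def hash_excluding(data: bytes, ranges: list[tuple[int, int]]) -> str:
    """Hash every byte of the asset except the listed (start, length) ranges."""
    h = hashlib.sha256()
    pos = 0
    for start, length in sorted(ranges):
        h.update(data[pos:start])   # bytes between the previous exclusion and this one
        pos = start + length        # skip over the excluded range
    h.update(data[pos:])            # everything after the last exclusion
    return h.hexdigest()

# Hypothetical toy layout: 4-byte header, 8-byte manifest slot, 6-byte payload.
asset = b"HEAD" + b"MANIFEST" + b"pixels"
include = [(0, 4), (12, 6)]   # inclusion list: header and payload
exclude = [(4, 8)]            # exclusion list: manifest slot only

# Bytes appended outside every range that was listed at signing time.
tampered = asset + b"EXTRA"

inclusion_detects = hash_included(asset, include) != hash_included(tampered, include)
exclusion_detects = hash_excluding(asset, exclude) != hash_excluding(tampered, exclude)
print(inclusion_detects, exclusion_detects)
```

In this toy case the inclusion-list hash is unchanged by the appended bytes, because they fall outside every listed range, while the exclusion-list hash changes, because anything not explicitly excluded is covered by default.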

As an aside: version control seems to be used in an unexpected way in this repository. Generally, the authored source materials used to produce documents (and, if necessary, the output built from those sources) are committed to source control, so that the revision history shows only the diffs/patches applied in each revision. Here, subsequent versions appear to be added to source control as separate directories, which means that common git workflows cannot easily be used to compare changes between versions. That said, I understand that there may be organizational and/or process-driven reasons for those choices. What matters more is scrutiny of the content.

I haven't come across the term "box hashing" before in the context of information security, so I will spend some time learning more about it.

In my experience, hashing the entire content of a file (without using include/exclude ranges) tends to be the preferred approach when using hashing to identify and/or de-duplicate content that may be bit-for-bit identical (with the caveat that hash collisions may be found for any lossy hash given sufficient compute resources).
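
As a rough sketch of what I mean by whole-file hashing for de-duplication (the file names and contents are made up, and `group_duplicates` is a hypothetical helper, not something from the C2PA tooling):

```python
import hashlib
from collections import defaultdict

def file_digest(data: bytes) -> str:
    """SHA-256 over the entire content, with no include/exclude ranges."""
    return hashlib.sha256(data).hexdigest()

def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of names whose contents share the same whole-file digest."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[file_digest(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Made-up in-memory "files" for illustration.
files = {
    "a.jpg": b"\xff\xd8same",
    "b.jpg": b"\xff\xd8same",
    "c.jpg": b"\xff\xd8diff",
}
duplicates = group_duplicates(files)
print(duplicates)
```

Because every byte contributes to the digest, any change anywhere in the file moves it into a different group. That is the property that range-based hashing has to give up in order to leave room for an embedded manifest.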

Edit: use permalinks for all documentation references

Edit 2: fixup for v1.3 documentation reference