eellak / build-recorder

GNU Lesser General Public License v2.1

Support multiple hash algorithms (not SHA-1) #153

Open timretout opened 1 year ago

timretout commented 1 year ago

hash.c seems to produce SHA-1 digests.

SHA-1 is broken (see https://shattered.io/), and NIST, for example, deprecated its use in 2011.

Rather than just upgrade to, say, SHA-256, it would be nice if the output format could record the type of algorithm alongside the checksum, to allow for future migrations.
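As a minimal sketch of that idea, assuming a hypothetical `algorithm:hexdigest` string format (similar in spirit to the `sha256:<hex>` digests used in OCI image descriptors), a checksum that carries its own algorithm name could be produced and checked with Python's `hashlib`. The function names `tagged_digest` and `verify_tagged_digest` are illustrative only and are not part of build-recorder, which is written in C.

```python
import hashlib
import hmac

def tagged_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return a digest prefixed with its algorithm name (hypothetical format)."""
    h = hashlib.new(algorithm, data)
    return f"{algorithm}:{h.hexdigest()}"

def verify_tagged_digest(data: bytes, tagged: str) -> bool:
    """Recompute the digest using whichever algorithm the tag names."""
    algorithm, sep, expected = tagged.partition(":")
    if not sep or algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unknown or missing algorithm in {tagged!r}")
    actual = hashlib.new(algorithm, data).hexdigest()
    # Constant-time comparison; harmless here, though not strictly required.
    return hmac.compare_digest(actual, expected)
```

Because the verifier reads the algorithm from the record itself, older `sha1:` records would stay checkable after new records switch to `sha256:`, which is the kind of gradual migration described above.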

fvalasiad commented 1 year ago

This is worth exploring. I had already thought about adding an option to choose the hashing algorithm, and it may well happen (why not)! But let me ask you something.

I understand that SHA-1 is broken and that collisions can actually be produced. But that is mostly a problem for cryptographic uses of hash functions! For us, it is only potentially "bad" if two source files happen to end up with the same hash by chance, and what are the odds of that?
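The odds of an accidental collision can be estimated with the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1)/2 / 2^b) for n files and a b-bit hash. This minimal sketch assumes the hash behaves like a uniform random function; it covers only chance collisions and says nothing about deliberately constructed ones such as SHAttered. The function name is hypothetical.

```python
import math

def accidental_collision_probability(n_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate of the chance that any two of n_files
    random inputs share a hash_bits-bit hash value."""
    pairs = n_files * (n_files - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)
```

For example, `accidental_collision_probability(10**9, 160)` comes out on the order of 1e-31, so chance collisions between a billion distinct files are negligible for SHA-1's 160-bit output.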

What I am trying to say is that we don't expect malicious users to craft files that "break" build recorder. There is no point in doing that; there is nothing to gain.

Again, I am not rejecting your request: it's on the TODO list, and you are welcome to contribute toward making it a reality. I just want to hear your thoughts on this.

timretout commented 1 year ago

Yep, I get it. :)

Many large companies are interested in a "build recorder"-type approach for the software supply-chain security problem, where it would be nice to trace how binaries were originally compiled. These companies have adversaries with considerable resources, e.g. banks vs. organised crime.

These companies care about supply-chain attacks: for example, an employee of a supplier might try to deliver a malicious binary to them, and to counter that they want to see how the binaries were built.

There are a few problems with SHA-1 in the above scenario:

Git is an interesting special case: because migrating git to new algorithms is tricky, it uses a collision-detection library (sha1dc) to identify known attacks on SHA-1. But new applications should use stronger algorithms.

zvr commented 1 year ago

We use the exact same computation used in SWHID (and git hash-object), as this is the most useful way to refer to file contents in general. There are no plans to change this.
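For reference, the git blob computation hashes a short header, `blob <size>` followed by a NUL byte, and then the file bytes; a SWHID for file contents is `swh:1:cnt:` followed by that SHA-1 blob id. The sketch below is a Python illustration, not build-recorder's hash.c: the function names are hypothetical, it uses plain SHA-1 from `hashlib` (without git's sha1dc collision detection), and the optional `algorithm` parameter assumes git's SHA-256 object format, which keeps the same header.

```python
import hashlib

def git_blob_id(content: bytes, algorithm: str = "sha1") -> str:
    """Hash content the way `git hash-object` does for a blob: the header
    b'blob <size>' plus a NUL byte, followed by the raw bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.new(algorithm, header + content).hexdigest()

def swhid_for_content(content: bytes) -> str:
    """Core SWHID for file contents: 'swh:1:cnt:' plus the SHA-1 blob id."""
    return "swh:1:cnt:" + git_blob_id(content)
```

For ordinary files this matches `git hash-object` in a SHA-1 repository; for instance, the empty file hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.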

We could have options to also record other file data (including other content hashes).
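If extra content hashes were recorded, each file would ideally still be read only once. A minimal sketch of that, assuming Python's `hashlib` and a hypothetical `multi_digest` helper (not an existing build-recorder option), feeds every chunk to several hash objects at the same time:

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable

def multi_digest(stream: BinaryIO,
                 algorithms: Iterable[str] = ("sha256", "sha512"),
                 chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Read the stream once, updating one hash object per algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

if __name__ == "__main__":
    # Usage: an in-memory stream; a real file would be opened with open(path, "rb").
    print(multi_digest(io.BytesIO(b"abc")))
```

The I/O cost stays the same regardless of how many algorithms are listed; only the CPU cost grows.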

I will leave this issue open for future exploration.

timretout commented 1 year ago

> We use the exact same computation used in SWHID (and git hash-object), as this is the most useful to refer to file contents in general.

If you are looking to align with the output of git hash-object, then note that these days git uses the sha1dc library to detect and mitigate known collision attacks:

https://github.com/git/git/blob/a0789512c5a4ae7da935cd2e419f253cb3cb4ce7/sha1dc_git.h

Adopting this would give you better compatibility with git.

Tim