BLAKE2 / libb2

C library providing BLAKE2b, BLAKE2s, BLAKE2bp, BLAKE2sp
Creative Commons Zero v1.0 Universal
132 stars 47 forks source link

setting parameters on big endian systems #12

Open oconnor663 opened 6 years ago

oconnor663 commented 6 years ago

It looks like functions like blake2b_param_set_node_offset convert their argument to little-endian on the way in. But those functions aren't exposed in the blake2.h API. Is the expectation that callers (who plan on supporting big endian platforms) will convert their parameters to little endian on their own? Or maybe the plan was to expose the setter functions in blake2.h eventually?

sneves commented 6 years ago

I don't quite recall, but I do believe the plan was at some point to expose those functions to the outside, for users that would want to make their own portable parameter blocks.

But since no such users ever appeared, those functions ended up never getting exposed. Even internally, for blake2xp, we end up making the parameter block manually anyway.

oconnor663 commented 6 years ago

I wonder if that means CPython's blake2 implementation, for example, has a bug on big endian platforms? They seem to be setting the leaf_length parameter from a native int. I was about to do the same thing myself, as part of working on a Rust wrapper.

sneves commented 6 years ago

That's an excellent question. I think there's no bug, because the reference implementation explicitly loads the parameter block in little endian order. The optimized x86 implementations do not do this, but everything's assumed to be little-endian in those.

But in all honesty I did not check if there is a bug or not; I don't have big-endian hardware to verify this right now.

oconnor663 commented 6 years ago

I was staring at that earlier, and here was how I read it: The load64 function takes a pointer to little-endian bytes, and loads a native integer from them. However, if you're on PowerPC or whatever, and you execute the line self->param.leaf_length = (unsigned int)leaf_size as Python does, you've written big-endian bytes to the parameter block. The load64 call doesn't expect this, and the result is a flipped value for S->h[i].

I haven't owned PowerPC hardware in...some time...but I might try to fire up QEMU tonight and see if I can test this.

Edit: These look useful for testing https://stackoverflow.com/questions/3337896/imitate-emulate-a-big-endian-behavior-in-c https://people.debian.org/~aurel32/qemu/

sneves commented 6 years ago

Yeah, I think you're right.

In hindsight, the parameter block ought to be an opaque structure, with the functions used to set the appropriate bytes of it.

oconnor663 commented 6 years ago

Confirmed!

image

I don't know of any official test vectors that set leaf_length or node_offset, but when I run the same two hashes on my regular Intel machine, the results are swapped:

Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, hashlib
>>> sys.byteorder
'little'
>>> hashlib.blake2b(b"foo", leaf_size=1).hexdigest()
'8878fcd2202a81a2a26adeb5cc6e240528c80e2b4ec80b940d76686febe91b368b4a98ebb39b21befcc02ffaf7516e27d2894fc1c096b57c0e3dde69e01c5889'
>>> hashlib.blake2b(b"foo", leaf_size=(1<<24)).hexdigest()
'bfd0c39502d381eb4e45189f051fd67fd69fd972ad471550a879347d846c1700927d19065a08bf3acdd8fc209cbd21a9bb8619c56fc79207a1ece990b21ddf2b'

It should be simple to patch CPython, and I'll see if I can add a test for this. On the libb2 side of things, probably we should at least stick some clarifying comments inline in blake2.h? I worry that changing blake2*_init_param to assume native endianness in struct fields would break existing callers who've correctly worked around this. (If there are any?)

Can you think of any other languages or libraries that expose these fields, that we'd want to eyeball for the same bug?

sneves commented 6 years ago

I've pushed a candidate solution to a separate branch, under which existing correct code should be able to continue functioning properly, but a new API exists to explicit build correct parameter blocks.

To be able to do this, I'm using unnamed structs in unions, which appears to be something only standardized in C11, but recent GCC, Clang, and MSVC all appear to support this.

I don't know the extent of the damage. To my knowledge, most bindings/implementations are limited to the variable length and key parameters, for which endianness does not matter.

oconnor663 commented 6 years ago

Those changes look good to me, though I don't have much experience with this code. Maybe we could stick a comment on the anonymous struct like:

// These struct fields are deprecated, and kept around only for backwards
// compatibility. All new code should use the param_set helper functions
// instead. (In particular, setting the leaf_length or node_offset fields here
// gives incorrect results on big endian platforms, if the caller doesn't
// explicitly swap endianness first. See https://github.com/BLAKE2/libb2/issues/12.)
oconnor663 commented 6 years ago

Also will https://github.com/BLAKE2/BLAKE2 want to make similar changes?

sneves commented 6 years ago

Yes, the plan is to merge the two repositories at some point. Hopefully soon.

oconnor663 commented 6 years ago

Oh, I think on lines 178-179 of blake2.h you're using BLAKE2S_SALTBYTES and BLAKE2S_PERSONALBYTES when you mean BLAKE2B_.

oconnor663 commented 6 years ago

Another design comment about https://github.com/BLAKE2/libb2/tree/fix-bigendian: I think blake2*_param_init should set the official defaults, rather than just zeroing the whole parameter block. That is, I think it should set fanout and depth to 1, and digest_length to OUTBYTES. Almost all callers are going to want to do that, but it's also something they're likely to forget, at least until they debug why their hashes aren't coming out right. That would also remove some duplicated code from blake2*_init and blake2*_init_key.

sneves commented 6 years ago

Excellent idea, I'll incorporate this.