Open gmaxwell opened 4 years ago
Last time we discussed this on IRC some people said that providing 3 functions might be a bit much and having a single sha256 function instead is better. I think that a single one might be a bit awkward because it requires allocating a big buffer on the stack that can contain everything (and then more things you need to clean).
What are your thoughts on this?
I think you'll want this feature only if you seriously care about performance. And in this case, you probably have the more complex API with 3 functions anyway.
Also, in practice a single function is often harder to use - if you want to hash a series of things it's nice to just throw them all into a hash engine ... if you need to create a buffer first then (depending on your language) you may need to think about buffer allocation and indexing.
Another alternate might be to just conditionally compile out the sha256 functions, and require the user provide replacements with exactly the correct names in order for linking to succeed. This is what we do for the default error callbacks for example.
only if you seriously care about performance.
I think the vast majority of users would be ones that don't want to have an extra 19kilobytes in their binary.
Another alternate might be to just conditionally compile out the sha256 functions, and require the user provide replacements with exactly the correct names in order for linking to succeed.
That would be ducky too.
Unless anyone already works on this I'd like to give it a try :)
Unless anyone already works on this I'd like to give it a try :)
Could you look into this?
Could you look into this?
IIRC the problem I encountered was the SHA2 context. I remember a few solutions but none of them was great:
Decide that the context needs to be size X, provide an opaque struct with unsigned array size X, and the implementer can memcpy this into his own Context back and forth (or type pun it? need to look into what the standard says on that).
The implementer also needs to provide a header with the name sha256.h
that defines the struct+functions using the same names/symbols we use.
Provide a c_void pointer to the context in the function's API. (it then requires a reset function to reset the context, and it limits us because we can't hash two things "in parallel")
Any feedback on which is best or maybe a different solution is appreciated.
One possibility is keeping the SHA256 padding/chopping into blocks on the secp256k1 side, and only require external implementations to provide a SHA256 transformation function.
The API would look like:
/** Update the state (array of 8 uint32_t) pointed to by state with 64*blocks bytes of input pointed to by data. */
void sha256_transform(uint32_t *state, size_t blocks, unsigned char *data);
This means the external implementation may need to copy data from/to "array of 8 uint32_t" to their own internal representation - but I suspect almost any C-like implementation already uses that representation anyway.
That's an interesting idea. I do think it will probably require the user to slightly modify his implementation to match this (I've looked at a few sha2 implementations I've found on google, and they all have their own quirks when it comes to the "transform" function)
And is there any real advantage with passing a bunch of blocks at the same time? won't it just call the transform function in a loop? or is there some SIMD/SHA-NI magic this enables?
I played with this, it works pretty nice, but then I realized that one of the reasons were binary size, so I measured using counting asm lines in godbolt(I'm not sure if it's a good way to compare),
and exposing RFC6979+HMAC+SHA2+SHA2_transform.
with -Os
(for minimal size):
As-is: 3,523 lines of asm.
Omitting the transform
function: 429 lines.
Omitting all of SHA2: 242 lines.
with -O2
:
As-is: 3,844 lines.
Omitting transform
: 511 lines.
Omitting all of SHA2: 236 lines.
with -O3
:
As-is: 6,852 lines.
Omitting transform
: 3,470 lines.
Omitting all of SHA2: 565 lines.
(-O3
is crazy lol)
Anyone who wants to play with it: https://godbolt.org/z/MPF3w9
add -DUSE_EXTERNAL_SHA2_TRANSFORM
to omit the transform
function.
add -DUSE_EXTERNAL_SHA2
to omit all of sha2 functions and leave only HMAC+RFC6979
For binary size you can really just look at the file size of the binary, see https://github.com/bitcoin-core/secp256k1/pull/700#issuecomment-587503446.
add
-DUSE_EXTERNAL_SHA2
to omit all of sha2 functions and leave only HMAC+RFC6979
If at all, I think we would want the opposite. We'll need SHA256 for Schnorr sigs and ECDH, but HMAC/RFC6979 is only used for deriving nonces, and you can use SHA256, too.
If at all, I think we would want the opposite. We'll need SHA256 for Schnorr sigs and ECDH, but HMAC/RFC6979 is only used for deriving nonces, and you can use SHA256, too.
I just had to expose some end result through godbolt to get asm, I used RFC6979 because it doesn't have any complex logic and it uses SHA256 in a complex way that prevent inlining everything and optimizing write together with finalize and initialize (so it's an example to show the asm code of SHA2 not of RFC6979)
This really depends on whether we have a compile-time override or a runtime override but we should think about a self-test in the spirit of https://github.com/bitcoin/bitcoin/blob/99813a9745fe10a58bedd7a4cb721faf14f907a4/src/crypto/sha256.cpp#L465 .
I think the main advantage is actually code size, because my tests don't show big advantages for optimized sha2 implementations, although I haven't tested with SHA-NI but I don't think it's going to be a huge improvement in terms of performance, as SHA2 is really fast compared to anything EC related
Im making embedded library containing libsecp256k1, see https://github.com/switck/libngu
thoughts:
RE: sharing SHA256 code
@elichai Any update here? Maybe, if you currently don't have a lot of time, it would still be good to share your WIP branch, so someone else could adopt the PR.
@elichai Any update here? Maybe, if you currently don't have a lot of time, it would still be good to share your WIP branch, so someone else could adopt the PR.
Sadly I can't find it :(
One possibility is keeping the SHA256 padding/chopping into blocks on the secp256k1 side, and only require external implementations to provide a SHA256 transformation function.
The API would look like:
/** Update the state (array of 8 uint32_t) pointed to by state with 64*blocks bytes of input pointed to by data. */ void sha256_transform(uint32_t *state, size_t blocks, unsigned char *data);
This means the external implementation may need to copy data from/to "array of 8 uint32_t" to their own internal representation - but I suspect almost any C-like implementation already uses that representation anyway.
Do you know if implementations would want to keep the state as a __m128i
instead of uint32_t
? (I see that bitcoin's implementations load it to vector registers each time, but wondering if that's optimal)
It's a bit silly that the library packs in its own sha256 even when the user has their own and on small devices it's a costly waste of space.
In places like Bitcoin Core the environment has a fast SIMD or SHA-NI sha256, ... it actually makes a measurable difference in signing time to use SHA-NI.
Going a step back, I believe that these are separate concerns. As per the discussion here, an override helps mostly to save space.
If we want to profit from hardware optimizations, this is somewhat orthogonal. An override would help here but only if the caller controls the compilation. This is true for Bitcoin Core but in general it's rather an exception. Also, if you look at Core's SHA256 implementation (https://github.com/bitcoin/bitcoin/blob/master/src/crypto/sha256.cpp), it's not clear how the override would look like. Depending on the available hardware, the essential function is either Transform
, Transform_2way
, Transform_4way
, Transform_8way
.
This is an argument in favor of optimizing our SHA256 implementation (independent of the possibility to override), and I tend to believe that this is desirable. We didn't want to do this in the past because SHA256 was an implementation detail. But as it's an integral part of Schnorr verification and signing now, things have changed. It's just somewhat messy: We'd need to duplicate code from Core (and worse, change it to C). Or move SHA256 (and maybe other stuff such as ChaCha20 for #760) to a separate library that's linked to secp256k1 and Core. Or expose the SHA256 here and make Core use them. None of this sounds great.
Thoughts?
I'd very much like being able to plug in a different sha256 implementation to save binary space.
It's a bit silly that the library packs in its own sha256 even when the user has their own and on small devices it's a costly waste of space. In places like Bitcoin Core the environment has a fast SIMD or SHA-NI sha256, ... it actually makes a measurable difference in signing time to use SHA-NI.
It would be nice if there were some -Dlibsecp2561k_sha256init=x -Dlibsecp2561k_sha256update=y -Dlibsecp2561k_sha256final=z settings that could be used to compile time substitute in another library and leave the internal one out.