RustCrypto / asm-hashes

Assembly implementations of cryptographic hash functions
46 stars 26 forks source link

Migrate to assembly from OpenSSL #5

Closed newpavlov closed 5 months ago

newpavlov commented 5 years ago

Initially crates in this repository were a simple proof-of-work experiment and assembly from Project Nayuki was taken without much consideration to performance. But some people use asm feature and expect performance on par with OpenSSL which is obviously not true today.

So I think we should migrate to OpenSSL assembly and maybe rename this repository to openssl-asm or something. This will require some code refactoring as OpenSSL hash assembly processes several blocks at a time and for the best performance we will have to accommodate for it. My initial experiments with MD5 haven't found any difficult problems.

The main question I currently have is: should we pack Perl scripts into crates or generated assembly? First option will result in a lighter crates (especially considering various possible "flavors" which Perl scripts support), but will require Perl to be installed on the system. For Linux it's not a big deal, but I think for Windows it can be a problem.

tarcieri commented 5 years ago

Preserving the Perl will make it easier to track upstream changes. It sure would be nice to get rid of it though.

It just does templating, so if it can be adapted to some other templating system or syntax, possibly as a mechanical translation from the Perl, I'd be a fan of that.

newpavlov commented 5 years ago

I thought about adding openssl repo as a submodule and using a small bash script which will regenerate assembly for all crates. So time to time we will manually update submodule, execute the script, and check if there any changes.

tarcieri commented 5 years ago

@newpavlov as a minimum viable solution to removing the Perl dependency, that sounds fine to me.

tarcieri commented 5 years ago

Here's an alternative idea that sounds fairly difficult and likely not worth it, but kind of fun:

I think it might be possible to convert the Perl templates (through a similar mechanical preprocessing step on the upstream files from the openssl repo with a checked-in result) into some other template, be it just a format! string or what have you, and then include_str! it, replacing the Perl templating with a combination of CPU architecture-dependent conditional compilation and Rust macros.

newpavlov commented 5 years ago

It will be essentially a limited Perl-to-Rust convertor, so indeed it's probably not worth the trouble. :)

tarcieri commented 5 years ago

I think it's a lot easier than that, but it would essentially involve rewriting the library-level parts of the Perl templating system as e.g. Rust macros, in addition to extracting and converting/munging the asm templates.

Still seems like a lot of work.

PrismaPhonic commented 4 years ago

I would like to see some assembly that takes advantage of avx2. Currently, the Golang SHA-1 library does: https://github.com/golang/go/blob/master/src/crypto/sha1/sha1block_amd64.s

This uses the same algorithm as the Linux kernel: https://github.com/torvalds/linux/blob/master/arch/x86/crypto/sha1_avx2_x86_64_asm.S

newpavlov commented 4 years ago

@PrismaPhonic IIUC OpenSSL assembly (sha1, sha256) does account for AVX2 instruction set, so it's matter of generating appropriate assembly files. Although I haven't compared performance with the files linked by you.

PrismaPhonic commented 4 years ago

For sure. I know I'm just now stepping into this conversation and am not even a contributor (yet), but I personally would prefer avoiding Perl scripts altogether. I would vote for sticking to raw assembly files, and check Rust side about which instruction set should be used.

EDIT: My reasoning behind that is that combining perl & asm makes for extremely unreadable code. I'm frankly a bit shocked that the openssl devs decided to use that approach. I'm not sure there exists a less readable way of generating asm.

newpavlov commented 4 years ago

It looks like at least for hash functions OpenSSL assembly handles dynamic dispatch itself by reading data from OPENSSL_ia32cap_P, which contains CPUID results (on ARM it's OPENSSL_armcap_P). I think it will be a bit tricky to setup this constant correctly, but should be doable.

joshtriplett commented 3 years ago

Quick note: from a licensing point of view, this needs some care to make sure that it's safe to do. OpenSSL has relicensed to Apache 2.0, but the asm modules in OpenSSL appear to be additionally dual-licensed under BSD and GPLv2. So, if you want to integrate that assembly, you'd need to change the license of this crate accordingly (BSD-3-Clause OR Apache-2.0 would work).

tarcieri commented 3 years ago

@joshtriplett yeah. See recent licensing discussion on https://github.com/RustCrypto/asm-hashes/pull/32#pullrequestreview-615460991

Byron commented 3 years ago

I would like to join this discussion out of my interest to beat Git when resolving packs on ARM (or AArch64 specifically), which right now doesn't seem possible when using only 4 performance cores. As gitoxide is already faster on Intel I am super eager to see by how much it will be faster on ARM/AArch64.

That said, I am currently working on this PR to help clean up ARM/AArch64 intrinsics which I understand might be one way to speed things up here, even though ultimately it might not be used based on what I gather from this discussion:

The above would make most sense to me, and hope you can make corrections and chime in to clear a path forward, maybe to make contributions possible.

Thanks a lot!

tarcieri commented 3 years ago

I think we're in a good place now to attempt adding a core::arch-based implementation of SHA-256 which uses the ARMv8 Cryptography Extensions.

I have an M1 and can take a look. No OpenSSL-based ASM would be required for this, just wrapping the underlying intrinsics in pure Rust.

Likewise, it'd probably still be better to use core::arch for an AVX2 implementation as well, although as a stopgap we could potentially use the OpenSSL-sourced assembly.

As I noted earlier, while OpenSSL provides high performance implementations across a number of architectures which is nice, one of the big problems with it is the upstream assembly code is generated by a Perl-based templating system which it would be nice to avoid, especially as a mandatory prerequisite of the build process.

Byron commented 3 years ago

I love that, especially with an immediate way forward with something that is part of the ecosystem. It looks like there are SHA1 instructions as well in V8 which I assume AArch64 is based on, that should do the trick for me.

tarcieri commented 3 years ago

I took a look at implementing SHA-256 using the ARMv8 Cryptography Extensions:

https://gist.github.com/tarcieri/414a3300072160f372b5d93ccfce280b

Unfortunately it's nightly-only as it requires feature(stdsimd), so in the interim (as @newpavlov has pointed out in the past) we still need an assembly implementation to support stable Rust.

As it were, there are also a couple NEON intrinsics that are/were missing:

Regarding adding vst1q_u32, here's a PR that added vld1q_u32 for reference: https://github.com/rust-lang/stdarch/pull/899/files (edit: I'm told this is the old way of doing it, and new intrinsics should be added to neon.spec instead)

Byron commented 3 years ago

I don't know if it matters but hope that the merge of this stdarch PR helps with this endeavour once the intrinsics trickle down to stable.

newpavlov commented 3 years ago

@Byron Crates in this repository are about wrapping implementations written in assembly. Intrinsic-based implementation reside in the main crates (e.g. see sha-1, sha2, aes and others).

tarcieri commented 3 years ago

Opened a tracking issue for stdarch intrinsic-based implementations here: https://github.com/RustCrypto/hashes/issues/257

tarcieri commented 3 years ago

Regarding SHA2 on x86 platforms, #41 (using Intel-provided BSD licensed assembly contributed to the Linux kernel) looks like a nice alternative

newpavlov commented 5 months ago

We decided to deprecate these crates. We could add ASM backends based on OpenSSL code in future, but such PRs should be opened in the hashes repository.