dalek-cryptography / curve25519-dalek

A pure-Rust implementation of group operations on Ristretto and Curve25519
Other
870 stars 439 forks source link

AVX512 code will need to change to handle the whole AVX10 business #693

Open workingjubilee opened 2 weeks ago

workingjubilee commented 2 weeks ago

Hello.

You heard about the whole AVX10 business, I presume? With AVX10/256 and AVX10/512?

We will likely need to adjust what the "avx512" features even mean, unfortunately, as a result of this. Fortunately, they're unstable, so we can do that! So it might need some modifications in dalek, but it's not like there will be any unforeseen complications from changing nightly features in code that didn't enable it intentionally... right?

https://github.com/dalek-cryptography/curve25519-dalek/blob/0964f800ab2114a862543ca000291a6e3531c203/curve25519-dalek/build.rs#L38-L44

...right?

RalfJung commented 2 weeks ago

In other words: building this crate may fail on nightly when the work on redoing these features happens. The compiler team considers automatic use of nightly features in crates that otherwise work on stable to be a breach of the nightly compiler contract, so this is not considered a breaking change -- but if this happens, it will mean that everyone using nightly to build this crate (or any crate that depends on this crate) will get compilation failures.

For context:

The preferred way to avoid this issue is to not enable nightly features automatically, but instead make them cargo features that need to be opt-in. That makes it possible to build the crate on nightly in exactly the same way that it would build on stable, which is extremely valuable for doing compiler development and testing.

Another option is to use something like autocfg to test whether the concrete kind of code you need works on the current compiler. If that code stops working, the crate can then automatically fall back to the stable code path. However, this reduces the usefulness of nightly as a testing vehicle: the point of nightly is that we can know that things will break long before they break on stable. To achieve this, however, it is important that testing a crate on nightly actually tests the thing that would happen when the nightly became stable. By making the same crate build different code on nightly vs stable, this is not the case any more.

tarcieri commented 2 weeks ago

I can't speak to auto-enabling nightly features, but in general I think nightly users need to expect that upstream rustc changes can break nightly builds at any time, and all we can do is release new versions of these crates when such upstream breaking changes happen, and nightly users need to upgrade to the latest versions which such breaking changes occur.

RalfJung commented 2 weeks ago

Of course nightly changes can break nightly crates (as in, crates that need nightly to build), but it should generally be the case that if a crate builds on stable, it also builds on nightly. This is crucial e.g. for narrowing down a stable-to-nightly regression.

tarcieri commented 2 weeks ago

I agree that the nightly-related functionality should probably be opt-in (though my comment wasn’t related to that).

pinkforest commented 2 weeks ago

Ref issues I raised earlier:

I've pointed the AVX512 probably could be default-removed because it cannot be tested and maintained reliably.

IMO it probably should be made opt-in until the support stabilises better and it can be tested reliably between AVX10?

Only consumer thing that benefits of it is zen4 / 7950X now that intel slid up with it

Also I raised issue now to create the opt-in mechanism:

For deciding the opt-in mechanism - we've declared the backends as implementation detail so it's non-breaking

However I would still recommend doing minor bump if we implement the opt-in.

workingjubilee commented 2 weeks ago

Yes, I most certainly have an AVX512 CPU that can run the code in question, despite it being a desktop, consumer-oriented CPU, not even a workstation CPU, so I do not think you should disable it entirely, necessarily. :^)

And I am happy to help people run tests if necessary.

The main thing is that upcoming Intel CPUs will implement what Intel is calling "AVX10", which is specified here: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf

Essentially, it allows new instruction patterns to be used while not requiring the full 512-bit zmm registers, but rather independently specifying that. This is especially useful to code which has exceptionally specialized instructions for accelerating it beyond just things like basic arithmetic ops, where widening the ops means computing faster. So, cryptography instructions.

Thus, we need to figure out some way of allowing people to say "I want to use this AVX{version} intrinsic, but I only need {M} bits" (or "no, I need the full {N} bits") which basically means we have to mess up something for someone somewhere.

pinkforest commented 2 weeks ago

How can we CI it? The original idea was that github used to have AVX512 capable runners at reasonable ratio and the nightly job tested it out. But now that GitHub has way less AVX512 capable runners testing it in CI has become impossible. Qemu has support but I don't know if it can be relied to a level to provide it as "Tier1 backend" or "Tier2" :thinking:

tgross35 commented 2 weeks ago

I believe in stdarch we use the Intel SDE (wow that website has terrible scrolling), not sure how easy that is to get off the ground. @sayantn may know more.

pinkforest commented 2 weeks ago

PR's up:

sayantn commented 2 weeks ago

We are actually kind of stuck with SDE being our only option - qemu-user doesn't support AVX512, qemu-system does, but needs KVM support, which we don't have in the runners. SDE has some bugs too, when we tested with stdarch, v9.38 was quite buggy on linux (but surprisingly, the windows version was fine), so just be a little careful.