Implement Classic McEliece

pufferffish commented 1 year ago

This is a draft of a Classic McEliece that implements all the parameters specified in the round 3 submission. It passes the KAT tests provided in the official website.

The current major problem right now is that public key generation takes so much time it causes tests on CI to timeout and fail. This is because the Gaussian Elimination loop inside pkGen in pk_gen.go can take millions of iterations for larger parameters. Vectorization should be able to make it faster by like 30x but I haven't implemented it yet. Currently there are only generic Go implementations.

Some test data outside of the KAT test are lifted from a Rust implementation. These test data are to ensure that some specific functions work correctly.

Closes #360

armfazh commented 1 year ago

@octeep thanks for pushing this. Also, could you please squash your commits.

We will review this code closely.

Definitely key generation takes lots of time (ARM64 tests timed out, while x64's not).

What about adding a build constraint to compile this code only on x64?

What do you think @bwesterb ?

bwesterb commented 1 year ago

What do you think @bwesterb ?

I'll have a look tomorrow.

pufferffish commented 1 year ago

@octeep thanks for pushing this. Also, could you please squash your commits.

We will review this code closely.

Definitely key generation takes lots of time (ARM64 tests timed out, while x64's not).

What about adding a build constraint to compile this code only on x64?

What do you think @bwesterb ?

I've squashed my commit. This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon (See this). It might be better to wait for them to publish the round 4 so this PR can be updated accordingly.

bwesterb commented 1 year ago

Thanks for this @octeep

This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon

Ok.

Can I ask you how did this implementation came to be: is it a fresh implementation from the spec or did you translate an existing implementation (ref or rust)?

I had an initial look today. To set expectations: it will of course take quite some time to carefully review this.

Some initial comments:

Key generation is very slow and I confirmed it's mostly in the Gaussian elimination. It's much slower than the C reference implementation. How much experience do you have with optimising for Go? If you're new to it: the Go compiler is very dumb. They favour compilation speed against optimisation potential. Dumb things like unrolling loops can make a big difference. I think an optimised Gaussian elimination might be worthwhile.
Have you given thought to organising the types/code in such a way that it's easy to add in SIMD optimisations?
Having megabytes of test data is unworkable. Please reduce that to a few or even a single hash.
The comments on the code aren't terrible, definitely compared to most crypto code, but we'd like to have a high standard.

pufferffish commented 1 year ago

Thanks for this @octeep

This probably shouldn't be merged yet since the Classic McEliece team might soon release their round 4 specs and software soon

Ok.

Can I ask you how did this implementation came to be: is it a fresh implementation from the spec or did you translate an existing implementation (ref or rust)?

Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.

I had an initial look today. To set expectations: it will of course take quite some time to carefully review this.

Some initial comments:

Key generation is very slow and I confirmed it's mostly in the Gaussian elimination. It's much slower than the C reference implementation. How much experience do you have with optimising for Go? If you're new to it: the Go compiler is very dumb. They favour compilation speed against optimisation potential. Dumb things like unrolling loops can make a big difference. I think an optimised Gaussian elimination might be worthwhile.

I unfortunately do not have much experience with Go optimization. I also want to note that when I compile the C optimized implementation with vectorization explicitly disabled, the performance is only slightly faster than Go. When I compile the C optimized implementation and tell gcc to show the optimization tree, I see that gcc has performed vectorization automatically, which is why it is much faster than Go.

The vec implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?

Have you given thought to organising the types/code in such a way that it's easy to add in SIMD optimisations?

If we are talking about optimizing pk_gen specifically then I don't think it would be too hard to add SIMD optimizations. Just add some build flags in the pk_gen template and some slight refactoring should be enough.

If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.

Having megabytes of test data is unworkable. Please reduce that to a few or even a single hash.

I will look into that. I don't think reducing it to a few hashes can work since the test cases contain sample inputs. Maybe we can compress it or represent these inputs in raw bytes rather than as hexadecimal ASCII.

The comments on the code aren't terrible, definitely compared to most crypto code, but we'd like to have a high standard.

I will try to document the code to the best of my understanding.

pufferffish commented 1 year ago

I have managed to trim down testdata.txt to 90kb. Would this be acceptable?

bwesterb commented 1 year ago

Thanks again for your efforts!

Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.

Ok! Please add appropriate attributions.

The vec implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?

Yes, that's a great plan!

(For those reading along: vec packs a few field elements into uint64s.)

If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.

I'll get back to this when I found time to properly study the code.

I will try to document the code to the best of my understanding.

Great! If you're unsure about certain parts, please leave a comment as such for us to fill in.

bwesterb commented 1 year ago

I have managed to trim down testdata.txt to 90kb. Would this be acceptable?

@armfazh ?

armfazh commented 1 year ago

I have managed to trim down testdata.txt to 90kb. Would this be acceptable?

After bzip2-ing that file, size gets down to 50 KB, which I think is acceptable.

armfazh commented 1 year ago

As a general comment, please introduce // TODO comments in places where code optimization is needed, such as better number representation or SIMD. They might be not resolved in this PR, but flagging them is good for further development.

pufferffish commented 1 year ago

Thanks again for your efforts!

Most of the code is translated from the 'Optimized Implementation' in the round 3 NIST submission package by the Classic McEliece team. However, when some methods cannot directly be ported from C (for example due to pointer arithmetic), the code is lifted from the Rust implementation instead. The Rust implementation is also based on the Optimized Implementation as far as I can tell.

Ok! Please add appropriate attributions.

I'm not sure how I should add attributions here. I have added a notice on each mceliece.go saying that some code are translated from the Rust implementation, and included the author's name and github link. Do I need to do more than that? (e.g. include their MIT license)

The vec implementations in Addition Implementations from the NIST submission package seems to perform much faster even when I compiled it with vectorization disabled. Maybe this can be ported to generic Go for an optimized Gaussian Elimination?

Yes, that's a great plan!

I am trying to do that in the vec branch in my fork repo. I can't promise if I will ever finish that but you can track the progress there if interested.

(For those reading along: vec packs a few field elements into uint64s.)

If we are talking about optimizing all the components with SIMD then no, I haven't given much thoughts into it.

I'll get back to this when I found time to properly study the code.

I will try to document the code to the best of my understanding.

Great! If you're unsure about certain parts, please leave a comment as such for us to fill in.

bwesterb commented 1 year ago

Ok! Please add appropriate attributions.

I have added a notice on each mceliece.go saying that some code are translated from the Rust implementation, and included the author's name and github link.

That should be enough for now.

I am trying to do that in the vec branch in my fork repo. I can't promise if I will ever finish that but you can track the progress there if interested.

Cool.

pufferffish commented 1 year ago

Ok good news: I managed to port the vec implementation of pk_gen to go and tested it on mceliece6960119. The KAT test now went from 120s to 17s. Still not as fast as the C implementation but it is much better than the current Go one.

bwesterb commented 1 year ago

Ok good news: I managed to port the vec implementation of pk_gen to go and tested it on mceliece6960119. The KAT test now went from 120s to 17s. Still not as fast as the C implementation but it is much better than the current Go one.

Good job!

pufferffish commented 1 year ago

I've ported the vec implementation for all the systematic parameters (parameters that don't end with a f). Right now amd64 and MacOS implementations finish on CI finishes within 3 minutes, but I still can't get it to finish within 10 minutes on arm64.

After porting the vec implementation to semi-systematic parameters it should be able to finish within 10 minutes.

pufferffish commented 1 year ago

I've ported the vec implementation for all the systematic parameters (parameters that don't end with a f). Right now amd64 and MacOS implementations finish on CI finishes within 3 minutes, but I still can't get it to finish within 10 minutes on arm64.

After porting the vec implementation to semi-systematic parameters it should be able to finish within 10 minutes.

I've now also ported vec for semi-systematic parameters. On arm64 CI it now completes under 10 minutes so it passes CI now.

armfazh commented 1 year ago

The round 4 version has been updated, see https://classic.mceliece.org/nist.html

pufferffish commented 1 year ago

The round 4 version has been updated, see https://classic.mceliece.org/nist.html

I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.

d-z-m commented 1 year ago

Hello :wave: , just wanted to say this change is desired, and that I hope code review proceeds in the near future! :rocket:

d-z-m commented 1 year ago

Hello, just wanted to inquire if there's been any movement on this! Would love to see this get reviewed soon.

armfazh commented 1 year ago

Hi, I will continue with the review of this PR in the next couple days. Thanks for your patience.

mariobm commented 1 year ago

Any updates on this one?

stv0g commented 1 year ago

I am also really interested in this PR as I am planning to use this for the Go implementation of the WireGuard Rosenpass Key Exchange.

stv0g commented 1 year ago

I noticed that the current implementation differs from liboqs with respect to the ciphertext length. See the following note:

https://classic.mceliece.org/mceliece-pc-20221023.pdf

stv0g commented 1 year ago

I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.

The file headers are still mentioning round 3:

// Package mceliece348864 implements the IND-CCA2 secure key encapsulation mechanism
// mceliece348864 as submitted to round 3 of the NIST PQC competition and
// described in

stv0g commented 1 year ago

@armfazh Can we remove the on-hold label from the issue? Round 4 specs have been released. There is nothing blocking this from getting finished :)

https://classic.mceliece.org/nist.html

pufferffish commented 1 year ago

I've updated the implementation to the round 4 specification. It passes KAT tests and finishes in 10 minutes on CI.

The file headers are still mentioning round 3:
// Package mceliece348864 implements the IND-CCA2 secure key encapsulation mechanism
// mceliece348864 as submitted to round 3 of the NIST PQC competition and
// described in

Thanks for pointing that out, I've fixed that in a new commit.

I noticed that the current implementation differs from liboqs with respect to the ciphertext length. See the following note:

https://classic.mceliece.org/mceliece-pc-20221023.pdf

It would appear that liboqs is still using the round 3 specification which explains the difference in ciphertext length. Mine follows the round 4 specification and the produced ciphertexts are checked against the KAT tests provided by the official Classic McEliece website.

stv0g commented 11 months ago

Hi @bwesterb, @pufferffish, @armfazh,

is there something I can do to support you in getting this implementation merged? Like code-review, testing or something similar?

Thanks :)

armfazh commented 11 months ago

@stv0g we appreciate a code review, our team will be providing feedback as we get more bandwidth.

pufferffish commented 11 months ago

@stv0g thanks for the review. It's been a while since I've worked on this and recently I've been busy with work so I can't promise I can get back to your reviews soon. If you'd like to I can give you write access to my fork repo so you can directly work on this pull request.

d-z-m commented 9 months ago

Hi there,

Interested in seeing this move forward, any updates?

david415 commented 9 months ago

Hi. I get that everyone is busy doing other things and I also understand that I'm essentially the most impatient person in the world which is why I've decided to simply lift all this code into a fork for use with katzenpost... with proper attribution of course.

However I would much rather this pull request get merged and the code maintained by you guys. I hate maintaining forks. But I hate waiting for things even more.

david415 commented 9 months ago

It looks like code contributions from your code contributor has stalled. i suggested you fork his dev branch and open a new pull request... and fix all the code corrections you were telling them to fix. If you don't do this, then this PR will likely continue to fester here meanwhile Rome is burning... society cracking at the foundations, abortion clinics need PQ KEMs like this one.

cloudflare / circl

Implement Classic McEliece #378