RustCrypto / password-hashes

Password hashing functions / KDFs
652 stars 80 forks source link

Why doesn't Argon2's update_address_block function do LE64 conversions? #502

Closed watason closed 4 months ago

watason commented 4 months ago

Hi! I recently read the RFC, so I may be looking in the wrong place, but let me ask you a question. First. In RFC 3.4.1.2 Argon2i, the Z calculation is as follows

Z= ( LE64(r) || LE64(l) || LE64(sl) || LE64(m') ||
     LE64(t) || LE64(y) )

In contrast, in the code here (and other oss codes seemed to be similar), I think the following part is applicable.

                    if data_independent_addressing {
                        input_block.as_mut()[..6].copy_from_slice(&[
                            pass as u64,
                            lane as u64,
                            slice as u64,
                            memory_blocks.len() as u64,
                            iterations as u64,
                            self.algorithm as u64,
                        ]);
                    }

                    let first_block = if pass == 0 && slice == 0 {
                        if data_independent_addressing {
                            // Generate first set of addresses
                            self.update_address_block(
                                &mut address_block,
                                &mut input_block,
                                &zero_block,
                            );
                        }

https://github.com/RustCrypto/password-hashes/blob/master/argon2/src/lib.rs#L339-L357

In the other Little Endian convert part. For example, here

B[i][0] = H'^(1024)(H_0 || LE32(0) || LE32(i))

This is what the code looks like

                // Make the first and second block in each lane as G(H0||0||i) or
                // G(H0||1||i)
                let inputs = &[
                    initial_hash.as_ref(),
                    &i.to_le_bytes()[..],
                    &l.to_le_bytes()[..],
                ];

                let mut hash = [0u8; Block::SIZE];
                blake2b_long(inputs, &mut hash)?;
                block.load(&hash);

Why is it that only the code for update_address_block does not do the conversion?

Thank you in advance.

tarcieri commented 4 months ago

The RFC has chosen to describe the compression function G as accepting two 1024-byte inputs, however our implementation operates directly on two Blocks which are internally an array of 64-bit integers. In either case the input is permuted using P which accepts an input of eight 16-byte registers.

The two are functionally equivalent. Also note our implementation predates the RFC and is based on the Argon2 paper / reference implementation.

watason commented 4 months ago

@tarcieri Thank you for your prompt response!

The RFC has chosen to describe the compression function G as accepting two 1024-byte inputs, however our implementation operates directly on two Blocks which are internally an array of 64-bit integers. In either case the input is permuted using P which accepts an input of eight 16-byte registers.

Yes I am aware of this part.

The two are functionally equivalent. Also note our implementation predates the RFC and is based on the Argon2 paper / reference implementation.

I see, so the implementation is not updated according to the RFC. I am implementing Argon2 myself for study, It would be helpful if you could answer a few questions for future study.

  1. If I refer to the RFC, am I asking the wrong question about LE64?
  2. Why is the update not done according to the RFC?
  3. Which version of the RFC are you referencing and implementing?

I always refer to the implementation here. Sorry for the detailed question, but thank you in advance.

tarcieri commented 4 months ago
  1. The arguments can be viewed many different ways. The RFC chose to model it as bytes. The reference implementation used an array of u64. If we were to follow the RFC to the letter and use bytes, the distinction would likely be optimized away by the compiler.
  2. Again, our implementation predates the RFC, and there's little benefit in updating it to match exactly why the RFC is doing. I can't speak to why the RFC chose to describe the interface that way, but bytes are not an ideal way to represent large numbers which ideally we'd be modeling using SIMD registers when available.
  3. The original Argon2 paper is here: https://www.password-hashing.net/argon2-specs.pdf
newpavlov commented 4 months ago

From the practical point of view, using [u64; N] usually allows better optimizations than [u8; 8 * N] because of higher alignment.

watason commented 4 months ago

@tarcieri @newpavlov

Thank you for taking the time to answer these amateur questions!

I have read the PDF, but I have a specific question regarding the handling of 64-bit values in little-endian format.

Is there a difference between a regular u64 and a little-endian u64? Do these differences affect the permutation P?

Referring to this part of the paper on page 6:

where
• r is the pass number; <- 8 bytes
• l is the lane number; <- 8 bytes
• s is the slice number; <- 8 bytes
• m0 is the total number of memory blocks;
• t is the total number of passes;
• x is the type of the Argon function (equals 1 for Argon2i);
• i is the counter starting in each segment from 1.
All the numbers are put as little-endian. We increase the counter so that each application of G2 gives 128 64-bit values J1||J2.

I understand that the permutation P is processed in u64. I also understand that it is handled in u64 arrays instead of bytes.

My apologies for my poor English.

In other words:

  1. If a 64-bit variable is not in little-endian format and is passed normally, does it affect the output of G?
  2. Does it affect security?

Thank you in advance.

newpavlov commented 4 months ago

The implementation is endianess-agnostic, i.e. it will produce the same result on BE and LE targets. We test the crate on both kinds. u64 does not have endianess per se, endianess matters during conversion u64 <-> [u8; 8]. We do it by using to_le_bytes/from_le_bytes methods (grep the crate source code to see their uses). On BE machines they causes additional byte swapping, but it's a somewhat inevitable cost with how the algorithm is specified.

watason commented 4 months ago

I see! So if you only put in the LE conversion when you BE, it will work optimally on that machine! I completely forgot about the actual environment.

Now I understand. Thank you so much!