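The current implementation of gen_matrix, given below, is incorrect. The problem arises when the initial BUFLEN = 504 bytes squeezed from the XOF do not yield enough coefficients. In that case only a single extra block (XOF_BLOCKBYTES = 168 bytes) is squeezed from the XOF, but rejection sampling still runs over the whole 504-byte buffer, so the stale bytes left over from the first squeeze can be sampled a second time.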
Compare your Rust implementation:
fn gen_matrix(a: &mut [Polyvec], seed: &[u8], transposed: bool)
{
    let mut ctr;
    // 530 is expected number of required bytes
    const GEN_MATRIX_NBLOCKS: usize =
        (12*KYBER_N/8*(1 << 12)/KYBER_Q + XOF_BLOCKBYTES)/XOF_BLOCKBYTES;
    const BUFLEN: usize = GEN_MATRIX_NBLOCKS*XOF_BLOCKBYTES;
    let mut buf = [0u8; BUFLEN+2];
    let mut off: usize;
    let mut state = XofState::new();
    for i in 0..KYBER_K {
        for j in 0..KYBER_K {
            if transposed {
                xof_absorb(&mut state, seed, i as u8, j as u8);
            }
            else {
                xof_absorb(&mut state, seed, j as u8, i as u8);
            }
            xof_squeezeblocks(&mut buf, GEN_MATRIX_NBLOCKS, &mut state);
            ctr = rej_uniform(&mut a[i].vec[j].coeffs, KYBER_N, &buf, BUFLEN);
            while ctr < KYBER_N
            {
                off = BUFLEN % 3;
                for k in 0..off {
                    buf[k] = buf[BUFLEN - off + k];
                }
                xof_squeezeblocks(&mut buf[off..], 1, &mut state);
                ctr += rej_uniform(&mut a[i].vec[j].coeffs[ctr..], KYBER_N - ctr, &buf, BUFLEN);
            }
        }
    }
}
To the reference implementation where buflen is correctly adjusted:
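The essential difference is that the number of valid bytes is tracked in a variable and updated after every refill. What follows is a minimal, self-contained Rust sketch of that adjustment, not the reference code itself. `ToyXof` is a hypothetical LCG-based stand-in for the real XOF, and `sample_poly` and the two-argument `rej_uniform` are illustrative names and signatures rather than the crate's API.

```rust
const Q: u16 = 3329; // KYBER_Q
const BLOCK: usize = 168; // XOF_BLOCKBYTES, implied by BUFLEN = 3 * 168 = 504
const NBLOCKS: usize = 3; // GEN_MATRIX_NBLOCKS
const BUFLEN: usize = NBLOCKS * BLOCK; // 504

/// Hypothetical stand-in for the XOF: a simple LCG byte stream.
/// NOT cryptographic; it only makes the sketch deterministic and runnable.
struct ToyXof {
    state: u32,
}

impl ToyXof {
    fn new(seed: u32) -> Self {
        ToyXof { state: seed }
    }

    /// Writes the next `nblocks * BLOCK` stream bytes to the front of `out`.
    fn squeeze_blocks(&mut self, out: &mut [u8], nblocks: usize) {
        for b in out[..nblocks * BLOCK].iter_mut() {
            self.state = self.state.wrapping_mul(1664525).wrapping_add(1013904223);
            *b = (self.state >> 24) as u8;
        }
    }
}

/// Rejection sampling: every 3 bytes give two 12-bit candidates,
/// and a candidate is kept only if it is below Q.
/// Returns the number of coefficients written to `r`.
fn rej_uniform(r: &mut [i16], buf: &[u8]) -> usize {
    let (mut ctr, mut pos) = (0, 0);
    while ctr < r.len() && pos + 3 <= buf.len() {
        let val0 = ((buf[pos] as u16) | ((buf[pos + 1] as u16) << 8)) & 0xFFF;
        let val1 = ((buf[pos + 1] as u16) >> 4) | ((buf[pos + 2] as u16) << 4);
        pos += 3;
        if val0 < Q {
            r[ctr] = val0 as i16;
            ctr += 1;
        }
        if ctr < r.len() && val1 < Q {
            r[ctr] = val1 as i16;
            ctr += 1;
        }
    }
    ctr
}

/// Samples `n` coefficients, refilling one block at a time when needed.
fn sample_poly(xof: &mut ToyXof, n: usize) -> Vec<i16> {
    let mut coeffs = vec![0i16; n];
    let mut buf = [0u8; BUFLEN + 2];
    let mut buflen = BUFLEN; // number of valid bytes currently in `buf`

    xof.squeeze_blocks(&mut buf, NBLOCKS);
    let mut ctr = rej_uniform(&mut coeffs, &buf[..buflen]);

    while ctr < n {
        // Carry over bytes that did not form a complete 3-byte group.
        let off = buflen % 3;
        for k in 0..off {
            buf[k] = buf[buflen - off + k];
        }
        xof.squeeze_blocks(&mut buf[off..], 1);
        // The adjustment: only the carried bytes plus one fresh block are valid.
        buflen = off + BLOCK;
        ctr += rej_uniform(&mut coeffs[ctr..], &buf[..buflen]);
    }
    coeffs
}
```

Calling `sample_poly(&mut ToyXof::new(1), 400)` forces the refill path, because 504 bytes provide at most 336 candidates. With `buflen` adjusted, the result is the same as rejection sampling over one contiguous XOF stream, which is the property the fixed-`BUFLEN` loop loses.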
The issue for regular Kyber occurs with probability approximately 2^-105.