aadomn / aes

Fast constant-time AES implementations on 32-bit architectures
MIT License

ROR/BYTE_ROR_n calls in mixcolumns_n #2

Open peterdettman opened 2 years ago

peterdettman commented 2 years ago

Hi @aadomn,

I'm recalling this observation and your paper, specifically Figure 6 and the paragraph that follows it on page 8.

The paper gives a count of 27 XOR, 32 AND and 16 OR instructions on top of 16 circular and 32 logical shifts, where the ANDs, ORs and shifts come from 16 instances of the form ROR(BYTE_ROR_n, m) (per your C code in this repo).

Actually, in the Rust code we use just 32 circular shifts, 2 for each rotate_rows_and_columns_m_n call.

Essentially the outer ROR call is merged into the two shifts inside BYTE_ROR_n (which become circular shifts). Assuming ROR is converted by the compiler to a single rotate instruction, then there are 16 instructions to be saved here.
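To make the merge concrete, here is a minimal C sketch of the two forms. The helper names (`ror32`, `byte_ror`, `ror_byte_ror_naive`, `ror_byte_ror_merged`) are hypothetical and written for illustration only; they are not the macros from the repo or the functions from the Rust code. The sketch assumes 0 < n < 8 and that m is a multiple of 8, so that the byte-periodic masks are unchanged by the outer rotation.

```c
#include <stdint.h>

/* 32-bit rotate right; the count is reduced mod 32 to avoid undefined shifts. */
static inline uint32_t ror32(uint32_t x, unsigned r) {
    r &= 31;
    return (x >> r) | (x << ((32 - r) & 31));
}

/* Rotate each byte of x right by n bits (0 < n < 8):
 * 2 logical shifts, 2 ANDs, 1 OR. */
static inline uint32_t byte_ror(uint32_t x, unsigned n) {
    uint32_t lo = 0x01010101u * (0xffu >> n); /* low (8 - n) bits of each byte */
    return ((x >> n) & lo) | ((x << (8 - n)) & ~lo);
}

/* Unmerged form: byte-wise rotate followed by a word rotate,
 * i.e. 1 circular shift + 2 logical shifts + 2 ANDs + 1 OR. */
static inline uint32_t ror_byte_ror_naive(uint32_t x, unsigned n, unsigned m) {
    return ror32(byte_ror(x, n), m);
}

/* Merged form: the outer rotate is folded into the two inner shifts,
 * which become circular: 2 circular shifts + 2 ANDs + 1 OR.
 * Assumes m is a multiple of 8. */
static inline uint32_t ror_byte_ror_merged(uint32_t x, unsigned n, unsigned m) {
    uint32_t lo = 0x01010101u * (0xffu >> n);
    return (ror32(x, m + n) & lo) | (ror32(x, m + 24 + n) & ~lo);
}
```

The two forms agree because the bits that wrap around in the circular shifts land exactly on positions that the masks clear. For example, with n = 6 and m = 8 the merged form is `(ror32(x, 14) & 0x03030303) | (ror32(x, 6) & 0xfcfcfcfc)`.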

I'm not sure this would make much difference where there's a barrel shifter, but for the general case it may be worth reporting.

aadomn commented 2 years ago

Hi @peterdettman,

Thank you for pointing this out.

Indeed, one can use 2 circular shifts instead of 1 circular shift + 2 logical shifts. As you suggest, it would have no impact on ARM because of the barrel shifter but might lead to improvements on other platforms. On the other hand, it can also lead to a performance decrease if no rotate instruction is available (as in the RV32I instruction set we considered in our paper for benchmarks on a RISC-V microcontroller).
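The trade-off can be tallied per ROR(BYTE_ROR_n, m) instance with a rough cost model. This is only a sketch under stated assumptions: the function names are hypothetical, a rotate is assumed to cost 1 instruction where a native rotate exists and 3 (shift, shift, OR) where it has to be emulated, and mask loading is ignored since it is the same in both forms. A barrel shifter that folds shifts into other instructions is not modelled.

```c
/* rot_cost: instructions per 32-bit rotate
 * (1 with a native rotate, 3 when emulated as shift + shift + OR). */

/* 1 circular shift + 2 logical shifts + 2 ANDs + 1 OR */
static inline unsigned cost_unmerged(unsigned rot_cost) {
    return rot_cost + 2 + 2 + 1;
}

/* 2 circular shifts + 2 ANDs + 1 OR */
static inline unsigned cost_merged(unsigned rot_cost) {
    return 2 * rot_cost + 2 + 1;
}
```

Under these assumptions the merged form saves one instruction per instance with a native rotate (5 versus 6) and costs one extra per instance without it (9 versus 8), which over the 16 instances gives the 16-instruction difference in either direction.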

Anyway, I agree it is worth mentioning clearly in the code/paper for the sake of completeness. I will take care of it.

Many thanks for your feedback once again!