ROR/BYTE_ROR_n calls in mixcolumns_n

aadomn / aes

Fast constant-time AES implementations on 32-bit architectures

MIT License


Hi @aadomn,

I'm recalling this observation and your paper, specifically Figure 6 and the paragraph that follows it on page 8.

The paper gives a count of 27 XOR, 32 AND and 16 OR instructions on top of 16 circular and 32 logical shifts, where the ANDs, ORs and shifts come from 16 instances of the form ROR(BYTE_ROR_n, m) (per your C code in this repo).

In the Rust code, we actually use just 32 circular shifts, 2 for each rotate_rows_and_columns_m_n call.

Essentially, the outer ROR call is merged into the two shifts inside BYTE_ROR_n (which become circular shifts). Assuming the compiler converts ROR to a single rotate instruction, there are 16 instructions to be saved here, one per instance.
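To make the merge concrete, here is a minimal C sketch. The helper names (ror32, byte_ror, ror_of_byte_ror, merged_ror) are made up for illustration and are not the macros from this repo or the functions from the Rust code. I'm assuming BYTE_ROR_n rotates each byte right by n bits with 0 < n < 8, and that the outer rotation m is a multiple of 8. Under those assumptions the byte masks are unchanged by the outer rotation, and the bits where a logical shift and a rotate differ are exactly the ones masked out, so both shifts can absorb the outer rotate.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by r bits (r is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned r) {
    r &= 31;
    return (x >> r) | (x << ((32 - r) & 31));
}

/* Rotate each byte of x right by n bits (assumes 0 < n < 8):
 * 2 logical shifts, 2 ANDs, 1 OR. */
static uint32_t byte_ror(uint32_t x, unsigned n) {
    uint32_t lo = (0xffu >> n) * 0x01010101u; /* low 8-n bits of each byte */
    return ((x >> n) & lo) | ((x << (8 - n)) & ~lo);
}

/* Unmerged form: byte rotation followed by a word rotation by m
 * (assumes m is a multiple of 8). 1 rotate + 2 shifts per call. */
static uint32_t ror_of_byte_ror(uint32_t x, unsigned n, unsigned m) {
    return ror32(byte_ror(x, n), m);
}

/* Merged form: the outer rotate is folded into the two inner shifts,
 * which become rotates. 2 rotates, 2 ANDs, 1 OR per call.
 * m + n + 24 is congruent to m + n - 8 modulo 32. */
static uint32_t merged_ror(uint32_t x, unsigned n, unsigned m) {
    uint32_t lo = (0xffu >> n) * 0x01010101u;
    return (ror32(x, m + n) & lo) | (ror32(x, m + n + 24) & ~lo);
}
```

With n = 2 and m = 8, for example, the merged form uses rotate distances 10 and 2 with masks 0x3f3f3f3f and 0xc0c0c0c0. Per call that is two rotates instead of one rotate plus two shifts, which is where the 16 saved instructions over 16 instances would come from.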

I'm not sure this would make much difference where there's a barrel shifter, but for the general case it may be worth reporting.

ROR/BYTE_ROR_n calls in mixcolumns_n #2