Closed krystian-hebel closed 4 years ago
With GCC 6.3.0 from Debian, I do actually see an LTO improvement:
64:
add/remove: 1/0 grow/shrink: 0/1 up/down: 256/-4670 (-4414)
Function old new delta
K - 256 +256
sha256_update 7612 2942 -4670
Total: Before=60531, After=56117, chg -7.29%
32:
add/remove: 1/0 grow/shrink: 0/1 up/down: 256/-4561 (-4305)
Function old new delta
K - 256 +256
sha256_update 7649 3088 -4561
Total: Before=31577, After=27272, chg -13.63%
lto.64:
add/remove: 3/2 grow/shrink: 0/0 up/down: 3300/-8004 (-4704)
Function old new delta
sha256_update - 2942 +2942
K - 256 +256
BLEND_OP - 102 +102
BLEND_OP.lto_priv 102 - -102
sha256_update.lto_priv 7902 - -7902
Total: Before=60817, After=56113, chg -7.73%
lto.32:
add/remove: 3/2 grow/shrink: 0/0 up/down: 3461/-7825 (-4364)
Function old new delta
sha256_update - 3088 +3088
K - 256 +256
BLEND_OP - 117 +117
BLEND_OP.lto_priv 117 - -117
sha256_update.lto_priv 7708 - -7708
Total: Before=31381, After=27017, chg -13.91%
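Interestingly, an extra 3 stack slots are used (0x68 vs 0x80 bytes of frame), and the disassembly highlights one of the most corner-case optimisations I've ever seen a compiler make: 0x80 doesn't fit in a sign-extended 8-bit immediate but -0x80 does, so the sub turns into an add and the instruction stays 4 bytes long.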
1a89: 48 83 ec 68 sub $0x68,%rsp
vs
1a89: 48 83 c4 80 add $0xffffffffffffff80,%rsp
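The arithmetic behind those two instructions can be checked with a small sketch. The helper names fits_imm8 and extra_slots are hypothetical and exist only for illustration; the only inputs taken from the disassembly are the frame sizes 0x68 and 0x80, and 8-byte stack slots are assumed (64-bit build).

```c
#include <stdint.h>

/* Hypothetical helper: 1 if v can be encoded as a sign-extended
 * 8-bit immediate (the short "83 /r ib" form seen above). */
int fits_imm8(int64_t v)
{
    return v >= INT8_MIN && v <= INT8_MAX;
}

/* Hypothetical helper: frame growth in 8-byte stack slots
 * (assumes a 64-bit build). */
int extra_slots(int64_t old_frame, int64_t new_frame)
{
    return (int)((new_frame - old_frame) / 8);
}
```

With these, extra_slots(0x68, 0x80) gives 3, fits_imm8(0x80) is 0 and fits_imm8(-0x80) is 1, which is why adding -0x80 encodes in 4 bytes where subtracting 0x80 would need a 32-bit immediate.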
Either way, a massive improvement. Space is at a premium, and this will all fit in the L1 cache. (It's likely to be a little faster too, as the constants now come from a data load instead of being decoded out of the instruction stream as immediates.)
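For readers without the diff to hand, here is a minimal sketch of the kind of change that would produce these numbers, assuming the patch replaces fully unrolled rounds (each carrying its constant as an immediate) with a loop over a 256-byte table K. The names sha256_transform and ror32 are hypothetical and not taken from the patch; only K and its size come from the symbol tables above, and the constants themselves are the standard ones from FIPS 180-4.

```c
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4): 64 x 4 bytes = 256 bytes,
 * matching the "K - 256 +256" rows in the size tables. */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical name: process one 64-byte block, updating state in place. */
void sha256_transform(uint32_t state[8], const uint8_t block[64])
{
    uint32_t W[64], s[8];
    int i;

    /* Load the block as 16 big-endian words, then expand to 64. */
    for (i = 0; i < 16; i++)
        W[i] = (uint32_t)block[4 * i] << 24 | (uint32_t)block[4 * i + 1] << 16 |
               (uint32_t)block[4 * i + 2] << 8 | (uint32_t)block[4 * i + 3];
    for (i = 16; i < 64; i++) {
        uint32_t s0 = ror32(W[i - 15], 7) ^ ror32(W[i - 15], 18) ^ (W[i - 15] >> 3);
        uint32_t s1 = ror32(W[i - 2], 17) ^ ror32(W[i - 2], 19) ^ (W[i - 2] >> 10);
        W[i] = W[i - 16] + s0 + W[i - 7] + s1;
    }

    memcpy(s, state, sizeof(s));

    /* One loop body for all 64 rounds; K[i] is a load, not an immediate. */
    for (i = 0; i < 64; i++) {
        uint32_t a = s[0], e = s[4];
        uint32_t t1 = s[7] + (ror32(e, 6) ^ ror32(e, 11) ^ ror32(e, 25)) +
                      ((e & s[5]) ^ (~e & s[6])) + K[i] + W[i];
        uint32_t t2 = (ror32(a, 2) ^ ror32(a, 13) ^ ror32(a, 22)) +
                      ((a & s[1]) ^ (a & s[2]) ^ (s[1] & s[2]));
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
    }

    for (i = 0; i < 8; i++)
        state[i] += s[i];
}
```

The trade-off is the usual one: the loop costs a table load and an index per round, but the code no longer carries 64 copies of the round body, which is where a saving of several kilobytes in sha256_update would plausibly come from.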
64:
add/remove: 1/0 grow/shrink: 0/1 up/down: 256/-4490 (-4234)
Function old new delta
K - 256 +256
sha256_update 7527 3037 -4490
Total: Before=60371, After=56137, chg -7.01%
32:
add/remove: 1/0 grow/shrink: 0/1 up/down: 256/-4621 (-4365)
Function old new delta
K - 256 +256
sha256_update 7646 3025 -4621
Total: Before=31566, After=27201, chg -13.83%
Deltas are the same with and without LTO.
Signed-off-by: Krystian Hebel <krystian.hebel@3mdeb.com>