ARM-software / LLVM-embedded-toolchain-for-Arm

A project dedicated to building LLVM toolchain for Arm and AArch64 embedded targets.
Apache License 2.0

[Pipelines] Additional unrolling in LTO #536

Closed VladiKrapp-Arm closed 1 month ago

VladiKrapp-Arm commented 1 month ago

Some workloads only simplify fully when a specific sequence of optimizations happens. This adds an extra full loop-unrolling pass to help these cases on cores with branch predictors. The pass helps produce simplified loops, which can then be processed by SROA, allowing further simplification that can be important for performance. The feature trades extra compile time for extra performance and is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default).

Original patch by David Green (david.green@arm.com)
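
For readers unfamiliar with the unroll-then-SROA interaction described above, here is a minimal sketch in C. It is a hypothetical illustration, not code from the patch: the function names and the array size are assumptions chosen for the example. The first function indexes a small local array with a loop variable, which generally prevents the array from being split into scalars. The second shows, conceptually, the shape the code takes once both loops are fully unrolled: every index is a constant, so each element can become an independent scalar and the array can disappear.

```c
/* Hypothetical example: names and sizes are illustrative assumptions,
 * not taken from the patch discussed in this pull request. */

/* Rolled form: tmp is indexed by the loop variable, so the compiler
 * generally has to keep it as a single stack object while the loops
 * remain loops. */
int sum_of_squares_rolled(int x) {
    int tmp[4];
    for (int i = 0; i < 4; ++i)
        tmp[i] = (x + i) * (x + i);

    int sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += tmp[i];
    return sum;
}

/* Conceptual result after full unrolling followed by scalar replacement:
 * each former array element is its own scalar, and later passes can fold
 * the arithmetic further. */
int sum_of_squares_unrolled(int x) {
    int t0 = (x + 0) * (x + 0);
    int t1 = (x + 1) * (x + 1);
    int t2 = (x + 2) * (x + 2);
    int t3 = (x + 3) * (x + 3);
    return t0 + t1 + t2 + t3;
}
```

Both functions compute the same value. The second form is only a hand-written approximation of what the optimizer may produce; the actual output depends on the pipeline and target.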

dcandler commented 1 month ago

Very minor nit: the first patch should start with 0001, i.e. 0001-LTOpasses-add-loop-unroll.patch, to match the other folder and the likely output of git format-patch. The placeholder patch was numbered 0 since it's not a real patch file.

VladiKrapp-Arm commented 1 month ago

@stuij, @dcandler, I have made the suggested changes. Any other issues?