ARM-software / LLVM-embedded-toolchain-for-Arm

A project dedicated to building an LLVM toolchain for Arm and AArch64 embedded targets.
Apache License 2.0

[Pipelines] Additional unrolling in LTO #536

Closed VladiKrapp-Arm closed 4 weeks ago

VladiKrapp-Arm commented 1 month ago

Some workloads only simplify fully when a specific sequence of optimizations runs. This patch adds an extra full loop-unrolling pass to the LTO pipeline to help those cases on cores with branch predictors. The extra pass helps produce simplified loops, which SROA can then process, allowing further simplification that can be important for performance. The feature trades extra compile time for extra performance and is enabled by the opt flag 'extra-LTO-loop-unroll' (off by default).
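
To illustrate the unroll-then-SROA interaction, here is a hand-written sketch. It is not code from the patch, and the function names are hypothetical. In the first function the local array is indexed by a loop variable, so the array has to stay in memory as a whole. Once both loops are fully unrolled, every index is a constant and the array can be split into independent scalars, which is the shape written out by hand in the second function. Whether a given pipeline performs these steps on this exact code is an assumption, not something the patch guarantees.

```c
#include <stddef.h>

/* Hypothetical example: a small local array accessed with a loop index.
 * While the loops remain, tmp[] is indexed by a variable and cannot be
 * broken up into separate scalar values. */
int sum_of_squares4(const int *in)
{
    int tmp[4];

    /* Constant trip count of 4: a candidate for full unrolling. */
    for (size_t i = 0; i < 4; ++i)
        tmp[i] = in[i] * in[i];

    int acc = 0;
    for (size_t i = 0; i < 4; ++i)
        acc += tmp[i];

    return acc;
}

/* The same computation after full unrolling and scalar replacement,
 * written out by hand. Every former array element is now a plain
 * scalar, so no stack array is needed and later passes can simplify
 * the expression further. */
int sum_of_squares4_unrolled(const int *in)
{
    int t0 = in[0] * in[0];
    int t1 = in[1] * in[1];
    int t2 = in[2] * in[2];
    int t3 = in[3] * in[3];

    return t0 + t1 + t2 + t3;
}
```

Both functions return the same value for the same input. The second form only becomes reachable for the optimizer once the loops are gone, which is why the ordering of passes matters.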

Original patch by David Green (david.green@arm.com)

dcandler commented 1 month ago

Very minor nit: the first patch should start at 0001, i.e. 0001-LTOpasses-add-loop-unroll.patch, to match the other folder and the likely output of git format-patch. The placeholder patch was numbered 0 since it's not a real patch file.

VladiKrapp-Arm commented 4 weeks ago

@stuij, @dcandler, I have made the suggested changes. Any other issues?