Missing multiple store to single load forwarding #51535

Open Quuxplusone opened 3 years ago

Bugzilla Link: PR52568
Status: NEW
Importance: P enhancement
Reported by: Nikita Popov (nikita.ppv@gmail.com)
Reported on: 2021-11-20 01:15:09 -0800
Last modified on: 2021-11-20 13:03:42 -0800
Version: trunk
Hardware: PC Linux
CC: llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com
Fixed by commit(s):
Attachments:
Blocks:
Blocked by:
See also:

The middle-end optimizer is currently unable to forward multiple small stores
to a single larger load that they fully cover, even at -O3
(https://llvm.godbolt.org/z/bxE1eKMK9):

define i64 @test(ptr %p) {
  store i32 0, ptr %p
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2
  %v = load i64, ptr %p
  ret i64 %v
}
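
For reference, a sketch of the result one might hope for here. It assumes a little-endian target (such as the x86-64 default on Godbolt); the name @test.expected is purely illustrative. The two i32 stores together cover all eight bytes read by the load, so the loaded value would be the constant (1 << 32) | 0 = 4294967296. The stores themselves have to stay, since the memory is visible to the caller:

```llvm
; Hypothetical desired output, assuming a little-endian data layout.
define i64 @test.expected(ptr %p) {
  store i32 0, ptr %p                      ; bytes 0..3 (low half on little-endian)
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2                     ; bytes 4..7 (high half on little-endian)
  ret i64 4294967296                       ; 0x0000000100000000, forwarded from both stores
}
```

On a big-endian target the halves would swap and the forwarded constant would be 1 instead.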

In my specific use case the large load actually gets decomposed back down to
the two parts using trunc/lshr:

define i32 @test2(ptr %p, ptr %p.out) {
  store i32 0, ptr %p
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2
  %v = load i64, ptr %p
  %v1 = trunc i64 %v to i32
  store i32 %v1, ptr %p.out
  %v2 = lshr i64 %v, 32
  %v3 = trunc i64 %v2 to i32
  ret i32 %v3
}

I'm mentioning this because this case is potentially easier to solve, as one
could effectively forward directly to the trunc.
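
As a sketch of what forwarding directly to the truncs could produce, again assuming a little-endian target and using the illustrative name @test2.expected: the low 32 bits of %v come from the first store (0) and the high 32 bits from the second (1), so neither the i64 load nor the trunc/lshr chain would be needed:

```llvm
; Hypothetical desired output, assuming a little-endian data layout.
define i32 @test2.expected(ptr %p, ptr %p.out) {
  store i32 0, ptr %p
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2
  store i32 0, ptr %p.out                  ; %v1 = low half = value of the first store
  ret i32 1                                ; %v3 = high half = value of the second store
}
```

Each narrow use here maps onto exactly one of the original stores, so no combined i64 constant has to be materialized.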

We do fold these at the SelectionDAG level, but by then it is too late to apply
cross-basic-block follow-up optimizations.