Missing multiple store to single load forwarding


Bugzilla Link	PR52568
Status	NEW
Importance	P enhancement
Reported by	Nikita Popov (nikita.ppv@gmail.com)
Reported on	2021-11-20 01:15:09 -0800
Last modified on	2021-11-20 13:03:42 -0800
Version	trunk
Hardware	PC Linux
CC	llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

The middle-end optimizer currently cannot forward multiple small stores to a
single larger load that covers them, even at -O3 (https://llvm.godbolt.org/z/bxE1eKMK9):

define i64 @test(ptr %p) {
  store i32 0, ptr %p
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2
  %v = load i64, ptr %p
  ret i64 %v
}
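
For reference, a hand-written sketch of the result one would hope for here, assuming a little-endian target such as x86-64 (this is not actual optimizer output). The first store supplies the low 32 bits and the second the high 32 bits, so the load should fold to 1 << 32:

```llvm
; Hypothetical desired output, assuming a little-endian target.
; The stores stay, since %p is visible to the caller; only the load is folded.
define i64 @test(ptr %p) {
  store i32 0, ptr %p                     ; bytes 0..3 -> low half of the i64
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2                    ; bytes 4..7 -> high half of the i64
  ret i64 4294967296                      ; (1 << 32) | 0
}
```

On a big-endian target the halves would swap and the constant would be 1 instead.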

In my specific use case the large load actually gets decomposed back down to
the two parts using trunc/lshr:

define i32 @test2(ptr %p, ptr %p.out) {
  store i32 0, ptr %p
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2
  %v = load i64, ptr %p
  %v1 = trunc i64 %v to i32
  store i32 %v1, ptr %p.out
  %v2 = lshr i64 %v, 32
  %v3 = trunc i64 %v2 to i32
  ret i32 %v3
}

I'm mentioning this because this case is potentially easier to solve, as one
could effectively forward directly to the trunc.
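
As a hand-written sketch of what forwarding directly to the truncs could produce for @test2, again assuming a little-endian target (not actual optimizer output): the low trunc takes the value of the first store, and the lshr/trunc pair takes the value of the second.

```llvm
; Hypothetical desired output, assuming a little-endian target.
define i32 @test2(ptr %p, ptr %p.out) {
  store i32 0, ptr %p
  %p2 = getelementptr i8, ptr %p, i64 4
  store i32 1, ptr %p2
  store i32 0, ptr %p.out                 ; %v1 forwarded from the first store
  ret i32 1                               ; %v3 forwarded from the second store
}
```

Here no i64 constant has to be assembled at all, since each use reads only bytes written by a single store.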

We do fold these at the SelectionDAG level, but by then it is too late to apply
follow-up optimizations across basic blocks.

Missing multiple store to single load forwarding #51535