Load widening for higher aligned pointer incorrectly reported as undefined

arsenm commented 2 months ago

load with an alignment higher than the base value implies dereferenceable at the load point:

"An alignment value higher than the size of the loaded type implies memory up to the alignment value bytes can be safely loaded without trapping in the default address space. "

define i16 @src(ptr align 4 %ptr) {
  %load = load i16, ptr %ptr, align 4
  ret i16 %load
}

define i16 @tgt(ptr align 4 %ptr) {
  %load = load i32, ptr %ptr, align 4
  %trunc = trunc i32 %load to i16
  ret i16 %trunc
}

This is incorrectly rejected:

----------------------------------------
define i16 @src(ptr align(4) %ptr) {
#0:
  %load = load i16, ptr align(4) %ptr, align 4
  ret i16 %load
}
=>
define i16 @tgt(ptr align(4) %ptr) {
#0:
  %load = load i32, ptr align(4) %ptr, align 4
  %trunc = trunc i32 %load to i16
  ret i16 %trunc
}
Transformation doesn't verify!

ERROR: Source is more defined than target

Example:
ptr align(4) %ptr = pointer(non-local, block_id=1, offset=0)

Source:
i16 %load = poison

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >   size: 0 align: 2    alloc type: 0   alive: false    address: 0
Block 1 >   size: 2 align: 1    alloc type: 0   alive: true address: 4

Target:
i32 %load = UB triggered!

nunoplopes commented 2 months ago

FWIW, the optimization is incorrect because widening may end up reading poison bytes. But the counterexample is wrong; it should complain about tgt being poison, not UB. We support this transformation in assembly mode; I didn't realize it was being done in an IR optimization already.

arsenm commented 2 months ago

We have some AMDGPU widening at CodeGenPrepare time already. I found this while trying to emulate atomicrmw expansion (also in IR)

AliveToolkit / alive2

Load widening for higher aligned pointer incorrectly reported as undefined #1028