Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Duplicate constants in stack lowering #36272

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR37299
Status NEW
Importance P enhancement
Reported by kripken (alonzakai@gmail.com)
Reported on 2018-04-30 20:01:11 -0700
Last modified on 2018-05-31 14:44:27 -0700
Version trunk
Hardware PC Linux
CC jgravelle@google.com, llvm-bugs@lists.llvm.org, llvm@sunfishcode.online
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
The Souper superoptimizer finds stuff like this in the wasm backend's output,

; start LHS (in $luaL_error)
%0:i32 = var
; (i32.sub(get_local $5)(i32.const 128))
%1 = sub %0, 128:i32
; (i32.add(get_local $3)(i32.const 128))
%2 = add %1, 128:i32
infer %2

which corresponds to wasm like this

 (func $luaL_error (; 41 ;) (type $3) (param $var$0 i32) (param $var$1 i32) (param $var$2 i32) (result i32)
  (local $var$3 i32)
  (local $var$4 i32)
  (i32.store
   (i32.const 1024)
   (tee_local $var$3
    (i32.sub
     (i32.load
      (i32.const 1024)
     )
     (i32.const 128)
    )
   )
  )
  [..]
  (i32.store
   (i32.const 1024)
   (i32.add
    (get_local $var$3)
    (i32.const 128)
   )
  )
  (get_local $var$0)
 )

So it subs 128 from the stack pointer, then adds 128 to restore it. An
alternative would be to use a local to save the original value instead, which
would save a few bytes (avoid the add + constant at the end), although I'm not
sure how that would affect the size of the uses of the stack pointer in the
middle there.
Quuxplusone commented 6 years ago

We could do that, but it would be a tradeoff of register pressure for code size. In general that's probably not worth doing?

It shouldn't have an effect on the uses of the stack pointer for the rest of the function though.

We have logic to save stack frame and base pointers into locals when we need them, which is in the presence of VLAs and over-aligned allocations. If we do emit a frame pointer, our epilog just resets the stack pointer to the value of the frame pointer. In theory we could add "optimize for code size" as an input into making that decision, though I'm not sure if that's plumbed through to the backend.