bytecodealliance / regalloc2

A new register allocator
Apache License 2.0

Looks like an alias bug! #46

Closed: yuyang-ok closed this issue 2 years ago

yuyang-ok commented 2 years ago

TRACE - Parsed run command: run: %atomic_cas_little_i16(305419896, 2, 4660, -21555) == -1412606344
TRACE - ABI: func signature Signature { params: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I64, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }], returns: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }], call_conv: WindowsFastcall }
TRACE - ABISig: sig Signature { params: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I64, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }, AbiParam { value_type: types::I16, purpose: Normal, extension: None, legalized_to_pointer: false }], returns: [AbiParam { value_type: types::I32, purpose: Normal, extension: None, legalized_to_pointer: false }], call_conv: WindowsFastcall } => args = [Slots { slots: [Reg { reg: p10i, ty: types::I32, extension: None }], purpose: Normal }, Slots { slots: [Reg { reg: p11i, ty: types::I64, extension: None }], purpose: Normal }, Slots { slots: [Reg { reg: p12i, ty: types::I16, extension: None }], purpose: Normal }, Slots { slots: [Reg { reg: p13i, ty: types::I16, extension: None }], purpose: Normal }] rets = [Slots { slots: [Reg { reg: p10i, ty: types::I32, extension: None }], purpose: Normal }] arg stack = 0 ret stack = 0 stack_ret_arg = None
TRACE - BlockLoweringOrder: function body function %atomic_cas_littl(i32, i64, i16, i16) -> i32 windows_fastcall {
    ss0 = explicit_slot 4

block0(v0: i32, v1: i64, v2: i16, v3: i16):
    v4 = stack_addr.i64 ss0
    store little v0, v4
    v5 = iadd v4, v1
    v6 = atomic_cas v5, v2, v3
    v7 = load.i32 little v4
    return v7
}

TRACE - BlockLoweringOrder: BlockLoweringOrder { lowered_order: [Orig { block: block0 }], lowered_succs: [], lowered_succ_indices: [], lowered_succ_ranges: [(0, 0)], orig_map: SecondaryMap { elems: [Some(Block(0))], default: None, unused: PhantomData }, cold_blocks: {} }
TRACE - bb block0 param v0: regs ValueRegs { parts: [v128, v2097151] }
TRACE - bb block0 param v1: regs ValueRegs { parts: [v129, v2097151] }
TRACE - bb block0 param v2: regs ValueRegs { parts: [v130, v2097151] }
TRACE - bb block0 param v3: regs ValueRegs { parts: [v131, v2097151] }
TRACE - bb block0 inst inst0 (StackLoad { opcode: StackAddr, stack_slot: ss0, offset: Offset32(0) }): result v4 regs ValueRegs { parts: [v132, v2097151] }
TRACE - bb block0 inst inst2 (Binary { opcode: Iadd, args: [v4, v1] }): result v5 regs ValueRegs { parts: [v133, v2097151] }
TRACE - bb block0 inst inst3 (AtomicCas { opcode: AtomicCas, args: [v5, v2, v3], flags: MemFlags { bits: 8 } }): result v6 regs ValueRegs { parts: [v134, v2097151] }
TRACE - bb block0 inst inst4 (Load { opcode: Load, arg: v4, flags: MemFlags { bits: 8 }, offset: Offset32(0) }): result v7 regs ValueRegs { parts: [v135, v2097151] }
TRACE - retval gets regs ValueRegs { parts: [v136, v2097151] }
TRACE - bb block0 inst inst0 has color 1
TRACE - bb block0 inst inst1 has color 1
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - bb block0 inst inst2 has color 2
TRACE - bb block0 inst inst3 has color 2
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - bb block0 inst inst4 has color 3
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - bb block0 inst inst5 has color 4
TRACE -  -> side-effecting; incrementing color for next inst
TRACE - arg v0 used, old state Unused, new Once
TRACE - arg v4 used, old state Unused, new Once
TRACE - arg v4 used, old state Once, new Multiple
TRACE -  -> pushing args for v4 onto stack
TRACE - arg v1 used, old state Unused, new Once
TRACE - arg v5 used, old state Unused, new Once
TRACE - arg v2 used, old state Unused, new Once
TRACE - arg v3 used, old state Unused, new Once
TRACE - arg v4 used, old state Multiple, new Multiple
TRACE - arg v7 used, old state Unused, new Once
DEBUG - timing: Starting VCode lowering, (during Processing test file)
TRACE - about to lower function: function %atomic_cas_littl(i32, i64, i16, i16) -> i32 windows_fastcall {
    ss0 = explicit_slot 4

block0(v0: i32, v1: i64, v2: i16, v3: i16):
    v4 = stack_addr.i64 ss0
    store little v0, v4
    v5 = iadd v4, v1
    v6 = atomic_cas v5, v2, v3
    v7 = load.i32 little v4
    return v7
}

TRACE - lower_clif_block: block block0 inst inst5 (MultiAry { opcode: Return, args: EntityList { index: 25, unused: PhantomData } }) is_branch false side_effect true value_needed false
TRACE - lowering: inst inst5: MultiAry { opcode: Return, args: EntityList { index: 25, unused: PhantomData } }
TRACE - get_input_for_val: val v7 at cur_inst Some(inst5) cur_scan_entry_color Some(InstColor(4))
TRACE -  -> src inst inst4
TRACE -  -> has lowering side effect: true
TRACE -  -> side-effecting op inst4 for val v7: use state Once
TRACE - put_value_in_regs: val v7
TRACE -  -> regs ValueRegs { parts: [v135, v2097151] }
TRACE - emit: Mov { rd: Writable { reg: v136 }, rm: v135, ty: types::I32 }
TRACE - emit: Mov { rd: Writable { reg: p10i }, rm: v136, ty: types::I32 }
TRACE - emit: Ret
TRACE - lower_clif_block: block block0 inst inst4 (Load { opcode: Load, arg: v4, flags: MemFlags { bits: 8 }, offset: Offset32(0) }) is_branch false side_effect true value_needed true
TRACE - lowering: inst inst4: Load { opcode: Load, arg: v4, flags: MemFlags { bits: 8 }, offset: Offset32(0) }
TRACE - get_input_for_val: val v4 at cur_inst Some(inst4) cur_scan_entry_color Some(InstColor(3))
TRACE -  -> src inst inst0
TRACE -  -> has lowering side effect: false
TRACE - put_value_in_regs: val v4
TRACE -  -> regs ValueRegs { parts: [v132, v2097151] }
TRACE - emit: Load { rd: Writable { reg: v135 }, op: Lw, flags: MemFlags { bits: 8 }, from: RegOffset(v132, 0, types::I64) }
TRACE - lower_clif_block: block block0 inst inst3 (AtomicCas { opcode: AtomicCas, args: [v5, v2, v3], flags: MemFlags { bits: 8 } }) is_branch false side_effect true value_needed false
TRACE - lowering: inst inst3: AtomicCas { opcode: AtomicCas, args: [v5, v2, v3], flags: MemFlags { bits: 8 } }
TRACE - put_value_in_regs: val v5
TRACE -  -> regs ValueRegs { parts: [v133, v2097151] }
TRACE - put_value_in_regs: val v2
TRACE -  -> regs ValueRegs { parts: [v130, v2097151] }
TRACE - put_value_in_regs: val v3
TRACE -  -> regs ValueRegs { parts: [v131, v2097151] }
TRACE - emit: AtomicCas { t0: Writable { reg: v137 }, dst: Writable { reg: v134 }, e: v130, addr: v133, v: v131, ty: types::I16 }
TRACE - lower_clif_block: block block0 inst inst2 (Binary { opcode: Iadd, args: [v4, v1] }) is_branch false side_effect false value_needed true
TRACE - lowering: inst inst2: Binary { opcode: Iadd, args: [v4, v1] }
TRACE - put_value_in_regs: val v4
TRACE -  -> regs ValueRegs { parts: [v132, v2097151] }
TRACE - put_value_in_regs: val v1
TRACE -  -> regs ValueRegs { parts: [v129, v2097151] }
TRACE - emit: AluRRR { alu_op: Add, rd: Writable { reg: v138 }, rs1: v132, rs2: v129 }
TRACE - set vreg alias: from v133 to v138
TRACE - lower_clif_block: block block0 inst inst1 (Store { opcode: Store, args: [v0, v4], flags: MemFlags { bits: 8 }, offset: Offset32(0) }) is_branch false side_effect true value_needed false
TRACE - lowering: inst inst1: Store { opcode: Store, args: [v0, v4], flags: MemFlags { bits: 8 }, offset: Offset32(0) }
TRACE - get_input_for_val: val v0 at cur_inst Some(inst1) cur_scan_entry_color Some(InstColor(1))
TRACE - put_value_in_regs: val v0
TRACE -  -> regs ValueRegs { parts: [v128, v2097151] }
TRACE - get_input_for_val: val v4 at cur_inst Some(inst1) cur_scan_entry_color Some(InstColor(1))
TRACE -  -> src inst inst0
TRACE -  -> has lowering side effect: false
TRACE - put_value_in_regs: val v4
TRACE -  -> regs ValueRegs { parts: [v132, v2097151] }
TRACE - emit: Store { to: RegOffset(v132, 0, types::I64), op: Sw, flags: MemFlags { bits: 8 }, src: v128 }
TRACE - lower_clif_block: block block0 inst inst0 (StackLoad { opcode: StackAddr, stack_slot: ss0, offset: Offset32(0) }) is_branch false side_effect false value_needed true
TRACE - lowering: inst inst0: StackLoad { opcode: StackAddr, stack_slot: ss0, offset: Offset32(0) }
TRACE - emit: LoadAddr { rd: Writable { reg: v132 }, mem: NominalSPOffset(0, types::I8) }
TRACE - gen_arg_setup: entry BB block0 args are:
[v0, v1, v2, v3]
TRACE - emit: Mov { rd: Writable { reg: v128 }, rm: p10i, ty: types::I32 }
TRACE - emit: Mov { rd: Writable { reg: v129 }, rm: p11i, ty: types::I64 }
TRACE - emit: Mov { rd: Writable { reg: v130 }, rm: p12i, ty: types::I16 }
TRACE - emit: Mov { rd: Writable { reg: v131 }, rm: p13i, ty: types::I16 }
TRACE - gen_retval_area_setup: not needed
TRACE - built vcode: VCode {
  Entry block: 0
  v133 := v138
Block 0:
    (original IR block: block0)
    (instruction range: 0 .. 12)
  Inst 0: mov v128,a0
  Inst 1: mov v129,a1
  Inst 2: mov v130,a2
  Inst 3: mov v131,a3
  Inst 4: load_addr v132,0(nominal_sp)
  Inst 5: sw v128,0(v132)
  Inst 6: add v138,v132,v129
  Inst 7: atomic_cas v134,v130,v131,(v133);; t0=v137
  Inst 8: lw v135,0(v132)
  Inst 9: mov v136,v135
  Inst 10: mov a0,v136
  Inst 11: ret
}

DEBUG - timing: Ending VCode lowering

DEBUG - timing: Starting Register allocation, (during Processing test file)
INFO - === REGALLOC RESULTS ===
INFO - block0: [succs [] preds []]
INFO -   inst0-pre:  <<< start v2 in p2i (range10) (bundle4294967295)
INFO -   inst0-pre:  <<< start v11 in p11i (range13) (bundle4294967295)
INFO -   inst0-pre:  <<< start v12 in p12i (range12) (bundle4294967295)
INFO -   inst0-pre:  <<< start v13 in p13i (range11) (bundle4294967295)
INFO -   inst0-pre:  <<< start v128 in p10i (range9) (bundle0)
INFO -   inst0-pre:  <<< start v129 in p5i (range14) (bundle13)
INFO -   inst0-pre:  <<< start v130 in p28i (range15) (bundle11)
INFO -   inst0-pre:  <<< start v131 in p7i (range16) (bundle15)
INFO -   inst0: op Def: v128i reg [none], Use: v10i reg [none]
INFO -   inst0-post:      end   v129 in p5i (range14) (bundle13) >>>
INFO -   inst0-post:      end   v130 in p28i (range15) (bundle11) >>>
INFO -   inst0-post:      end   v131 in p7i (range16) (bundle15) >>>
INFO -   inst1-pre:      end   v11 in p11i (range13) (bundle4294967295) >>>
INFO -   inst1: op Def: v129i reg [none], Use: v11i reg [none]
INFO -   inst1-post:  <<< start v129 in p11i (range8) (bundle12)
INFO -   inst2-pre:      end   v12 in p12i (range12) (bundle4294967295) >>>
INFO -   inst2: op Def: v130i reg [none], Use: v12i reg [none]
INFO -   inst2-post:  <<< start v130 in p12i (range5) (bundle10)
INFO -   inst3-pre:      end   v13 in p13i (range11) (bundle4294967295) >>>
INFO -   inst3: op Def: v131i reg [none], Use: v13i reg [none]
INFO -   inst3-post:  <<< start v131 in p13i (range7) (bundle14)
INFO -   inst4: op Def: v132i reg [p15i], Use: v2i reg [p2i]
INFO -   inst4-post:      end   v2 in p2i (range10) (bundle4294967295) >>>
INFO -   inst4-post:  <<< start v132 in p15i (range2) (bundle4)
INFO -   inst5: op Use: v132i reg [p15i], Use: v128i reg [p10i]
INFO -   inst5-post:      end   v128 in p10i (range9) (bundle0) >>>
INFO -   inst6: op Def: v138i reg [p7i], Use: v132i reg [p15i], Use: v129i reg [p11i]
INFO -   inst6-post:      end   v129 in p11i (range8) (bundle12) >>>
INFO -   inst6-post:  <<< start v138 in p7i (range6) (bundle9)
INFO -   inst7: op Def: v137i reg [p7i], Def: v134i reg [p29i], Use: v130i reg [p12i], Use: v138i reg [p7i], Use: v131i reg [p13i]
INFO -   inst7-post:      end   v130 in p12i (range5) (bundle10) >>>
INFO -   inst7-post:      end   v131 in p13i (range7) (bundle14) >>>
INFO -   inst7-post:  <<< start v134 in p29i (range4) (bundle5)
INFO -   inst7-post:  <<< start v137 in p7i (range3) (bundle8)
INFO -   inst7-post:      end   v138 in p7i (range6) (bundle9) >>>
INFO -   inst8-pre:      end   v134 in p29i (range4) (bundle5) >>>
INFO -   inst8-pre:      end   v137 in p7i (range3) (bundle8) >>>
INFO -   inst8: op Use: v132i reg [p15i], Def: v135i reg [p10i]
INFO -   inst8-post:      end   v132 in p15i (range2) (bundle4) >>>
INFO -   inst8-post:  <<< start v135 in p10i (range1) (bundle6)
INFO -   inst9: op Def: v136i reg [none], Use: v135i reg [none]
INFO -   inst9-post:  prog-move v135 (Any) -> v136 (Any)
INFO -   inst10-pre:      end   v135 in p10i (range1) (bundle6) >>>
INFO -   inst10-pre:  <<< start v136 in p10i (range0) (bundle6)
INFO -   inst10: op Def: v10i reg [none], Use: v136i reg [none]
INFO -   inst11-pre:      end   v136 in p10i (range0) (bundle6) >>>
INFO -   inst11: ret
DEBUG - timing: Ending Register allocation
DEBUG - timing: Starting VCode emission, (during Processing test file)
TRACE - MachBuffer: first 1 labels are for blocks
TRACE - MachBuffer: next 0 labels are for constants
TRACE - emitting block Block(0)
TRACE -  -> entry block
TRACE - MachBuffer: put 32-bit word @ 0: ff810113
TRACE - MachBuffer: put 32-bit word @ 4: 813023
TRACE - MachBuffer: put 32-bit word @ 8: 16413
TRACE - MachBuffer: put 32-bit word @ 12: ff010113
TRACE - MachBuffer: bind label MachLabel(0) at offset 16
TRACE - enter optimize_branches:
 b = []
 l = [MachLabel(0)]
 f = []
TRACE - leave optimize_branches:
 b = []
 l = [MachLabel(0)]
 f = []
TRACE - MachBuffer: put 32-bit word @ 16: 10793
TRACE - MachBuffer: put 32-bit word @ 20: a7a023
TRACE - MachBuffer: put 32-bit word @ 24: b783b3
TRACE - MachBuffer: new label -> MachLabel(1)
TRACE - MachBuffer: new label -> MachLabel(2)
TRACE - MachBuffer: bind label MachLabel(2) at offset 28
TRACE - enter optimize_branches:
 b = []
 l = [MachLabel(2)]
 f = []
TRACE - leave optimize_branches:
 b = []
 l = [MachLabel(2)]
 f = []
TRACE - MachBuffer: put 32-bit word @ 28: 1403a3af
TRACE - MachBuffer: put data @ 32: len 4
TRACE - MachBuffer: use_label_at_offset: offset 36 label MachLabel(1) kind Jal20
TRACE - MachBuffer: put 32-bit word @ 36: 6f
TRACE - MachBuffer: put 32-bit word @ 40: 1ad3aeaf
TRACE - MachBuffer: use_label_at_offset: offset 44 label MachLabel(2) kind B12
TRACE - MachBuffer: put 32-bit word @ 44: de9063
TRACE - MachBuffer: bind label MachLabel(1) at offset 48
TRACE - enter optimize_branches:
 b = [MachBranch { start: 36, end: 40, target: MachLabel(1), fixup: 0, inverted: None, labels_at_this_branch: [] }, MachBranch { start: 44, end: 48, target: MachLabel(2), fixup: 1, inverted: Some([99, 128, 222, 0]), labels_at_this_branch: [] }]
 l = [MachLabel(1)]
 f = [MachLabelFixup { label: MachLabel(1), offset: 36, kind: Jal20 }, MachLabelFixup { label: MachLabel(2), offset: 44, kind: B12 }]
TRACE - optimize_branches: last branch MachBranch { start: 44, end: 48, target: MachLabel(2), fixup: 1, inverted: Some([99, 128, 222, 0]), labels_at_this_branch: [] } at off 48
TRACE - leave optimize_branches:
 b = [MachBranch { start: 36, end: 40, target: MachLabel(1), fixup: 0, inverted: None, labels_at_this_branch: [] }, MachBranch { start: 44, end: 48, target: MachLabel(2), fixup: 1, inverted: Some([99, 128, 222, 0]), labels_at_this_branch: [] }]
 l = [MachLabel(1)]
 f = [MachLabelFixup { label: MachLabel(1), offset: 36, kind: Jal20 }, MachLabelFixup { label: MachLabel(2), offset: 44, kind: B12 }]
TRACE - MachBuffer: put 32-bit word @ 48: 7a503
TRACE - Epilogue: [AjustSp { amount: 16 }, Load { rd: Writable { reg: p8i }, op: Ld, flags: MemFlags { bits: 3 }, from: SPOffset(0, types::I64) }, AjustSp { amount: 8 }, Ret]
TRACE - MachBuffer: put 32-bit word @ 52: 1010113
TRACE - MachBuffer: put 32-bit word @ 56: 13403
TRACE - MachBuffer: put 32-bit word @ 60: 810113
TRACE - MachBuffer: put 32-bit word @ 64: 8067
DEBUG - timing: Ending VCode emission
DEBUG - timing: Starting VCode emission finalization, (during Processing test file)
TRACE - emit_island: fixup MachLabelFixup { label: MachLabel(1), offset: 36, kind: Jal20 }
TRACE -  -> label_offset = 48, known, required = false (pos 2097150 neg 2097152)
TRACE - patching in-range!
TRACE - emit_island: fixup MachLabelFixup { label: MachLabel(2), offset: 44, kind: B12 }
TRACE -  -> label_offset = 28, known, required = false (pos 8190 neg 8192)
TRACE - patching in-range!
DEBUG - timing: Ending VCode emission finalization
INFO - compiler code:  addi sp,-8
  sd fp,0(sp)
  mov fp,sp
  addi sp,-16
block0:
  load_addr a5,0(nominal_sp)
  sw a0,0(a5)
  add t2,a5,a1
  atomic_cas t4,a2,a3,(t2);; t0=t2
  lw a0,0(a5)
  addi sp,16
  ld fp,0(sp)
  addi sp,8
  ret

The alias v133 := v138 causes v133 to be defined but never used, so its live range is shortened to just its def. As a result, v133, v137, and v138 all end up in the same register, "t2".

cfallin commented 2 years ago

@yuyang-ok this is a bug in the metadata that your backend is providing to the register allocator, I think. Take a look at this line in the RA2 input dump:

inst7: op Def: v137i reg [p7i], Def: v134i reg [p29i], Use: v130i reg [p12i], Use: v138i reg [p7i], Use: v131i reg [p13i]

This shows that RA2 believes v137 is a def and v138 is a use. Defs ordinarily happen at the "late" point of an instruction, and uses happen at the "early" point. So the two can be assigned the same register, because for a normal single instruction, results are written to registers only after the inputs have been read.
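
To make the early/late distinction concrete, here is a minimal model, assuming a simplified numbering where instruction i has program point 2i (Early) and 2i+1 (Late), and live ranges are half-open intervals from the def point to just past the last use. The names `Pos`, `point`, and `Range` are illustrative only; they are not regalloc2's internal types. It replays v138 (defined by inst6, used as the address by inst7) against the temp v137 defined by inst7.

```rust
/// Simplified model (NOT regalloc2's real data structures): each instruction
/// has an Early point (inputs read) and a Late point (outputs written).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pos {
    Early,
    Late,
}

/// Map (instruction index, position) to a linear program point.
fn point(inst: u32, pos: Pos) -> u32 {
    inst * 2 + match pos {
        Pos::Early => 0,
        Pos::Late => 1,
    }
}

/// Half-open live range [from, to) over program points.
#[derive(Clone, Copy, Debug)]
struct Range {
    from: u32,
    to: u32,
}

impl Range {
    /// Live from the defining point through the last use point (inclusive).
    fn new(def: u32, last_use: u32) -> Range {
        Range { from: def, to: last_use + 1 }
    }

    /// Two ranges interfere if they share any program point.
    fn overlaps(&self, other: &Range) -> bool {
        self.from < other.to && other.from < self.to
    }
}

fn main() {
    // v138: defined by inst6 (Late), last used by inst7 as the address (Early).
    let v138 = Range::new(point(6, Pos::Late), point(7, Pos::Early));

    // v137 as a plain def of inst7: lives only at inst7's Late point.
    let v137_late = Range::new(point(7, Pos::Late), point(7, Pos::Late));
    // v137 as an early def of inst7: lives from inst7's Early point.
    let v137_early = Range::new(point(7, Pos::Early), point(7, Pos::Early));

    // Plain def: no overlap, so both may land in one register (t2 in the log).
    println!("late def conflicts:  {}", v138.overlaps(&v137_late));
    // Early def: overlaps the address's use, so they must get distinct registers.
    println!("early def conflicts: {}", v138.overlaps(&v137_early));
}
```

With the plain def, the two ranges touch but do not overlap, which is exactly why the allocator was free to put v137 and v138 in the same register.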

For pseudoinstructions that lower to multiple machine instructions and use temporaries internally, though, you usually want the temp not to conflict with any inputs or outputs. In this case you can define the temp as an "early def", which guarantees it will not be allocated into the same register as any input.
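
Why sharing a register breaks a multi-instruction sequence can be shown with a toy register-file simulation. The two-step `pseudo_op` below is made up for illustration (it is NOT Cranelift's actual atomic_cas lowering): it writes its internal temp first and reads the address input afterwards, roughly the way an LR/SC loop reloads into a temp and then reads the address again.

```rust
/// Toy register file: 32 integer registers, indexed by register number.
type Regs = [u64; 32];

/// Hypothetical two-step pseudo-instruction: step 1 writes the internal
/// temp, step 2 reads the address input. If `tmp == addr`, step 1 destroys
/// the input before step 2 gets to read it.
fn pseudo_op(regs: &mut Regs, addr: usize, tmp: usize) -> u64 {
    regs[tmp] = 0xdead; // step 1: internal temp is written
    regs[addr] // step 2: input is read after the temp write
}

fn main() {
    let mut regs: Regs = [0; 32];

    // x7 (t2) holds the address; the temp gets a different register.
    regs[7] = 0x1000;
    let ok = pseudo_op(&mut regs, 7, 5);

    // Same address, but the temp is allocated to x7 as well, which is what a
    // plain (late) def permits: the address is clobbered.
    regs[7] = 0x1000;
    let bad = pseudo_op(&mut regs, 7, 7);

    println!("{:#x} {:#x}", ok, bad); // prints "0x1000 0xdead"
}
```

Marking the temp as an early def rules out the second allocation, because the temp is then considered live at the same point where the address is read.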

Similarly, if your machine sequence reads an input after it starts writing any outputs, you will want to define that input as a "late use".
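
The two rules above boil down to a small table. The sketch below encodes it under a simplifying assumption: the input is not live after this instruction (otherwise it conflicts with every def regardless of position). `may_share` is a hypothetical helper, not part of any real API.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pos {
    Early,
    Late,
}

/// Within one instruction, may a def at `def` and a use at `use_pos` share a
/// physical register? Assuming the input dies at this instruction, they may
/// share only when the input is read (Early) before the output is written (Late).
fn may_share(def: Pos, use_pos: Pos) -> bool {
    matches!((def, use_pos), (Pos::Late, Pos::Early))
}

fn main() {
    for def in [Pos::Early, Pos::Late] {
        for use_pos in [Pos::Early, Pos::Late] {
            println!(
                "def {:?}, use {:?}: may share = {}",
                def,
                use_pos,
                may_share(def, use_pos)
            );
        }
    }
}
```

So an "early def" (Early, Early) and a "late use" (Late, Late) are two different ways of removing the one combination that allows sharing.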

The OperandCollector trait provides a method to define early defs. You can see an example of its use for a temporary register here.
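
As a rough sketch of what the backend-side change could look like, here is a hypothetical collector whose method names (`reg_use`, `reg_def`, `reg_early_def`, `reg_late_use`) are only loosely modeled on Cranelift's OperandCollector; this is not its real API or signature. It records operands for the `atomic_cas v134,v130,v131,(v133);; t0=v137` instruction from the log (with v133 resolved to v138), making the temp an early def, and lists which def/use pairs are then barred from sharing a register. If the real expansion also reads its inputs after writing `dst`, those inputs would additionally need `reg_late_use`.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Use,
    Def,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Pos {
    Early,
    Late,
}

#[derive(Clone, Copy, Debug)]
struct Operand {
    vreg: u32,
    kind: Kind,
    pos: Pos,
}

/// Hypothetical operand collector (illustration only).
#[derive(Default)]
struct Collector {
    ops: Vec<Operand>,
}

impl Collector {
    fn push(&mut self, vreg: u32, kind: Kind, pos: Pos) {
        self.ops.push(Operand { vreg, kind, pos });
    }
    fn reg_use(&mut self, v: u32) { self.push(v, Kind::Use, Pos::Early) }
    fn reg_late_use(&mut self, v: u32) { self.push(v, Kind::Use, Pos::Late) }
    fn reg_def(&mut self, v: u32) { self.push(v, Kind::Def, Pos::Late) }
    fn reg_early_def(&mut self, v: u32) { self.push(v, Kind::Def, Pos::Early) }
}

/// Operands for the atomic_cas in the log, with the temp as an early def.
fn collect_atomic_cas(c: &mut Collector) {
    c.reg_early_def(137); // t0 temp: must not share a register with any input
    c.reg_def(134); // dst
    c.reg_use(130); // expected value (e)
    c.reg_use(138); // address (v133 after alias resolution)
    c.reg_use(131); // new value (v)
}

/// (def, use) pairs that may NOT share a register under the simplified rule
/// that only a Late def paired with an Early use can share.
fn conflicting_pairs(ops: &[Operand]) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    for d in ops.iter().filter(|o| o.kind == Kind::Def) {
        for u in ops.iter().filter(|o| o.kind == Kind::Use) {
            if !(d.pos == Pos::Late && u.pos == Pos::Early) {
                out.push((d.vreg, u.vreg));
            }
        }
    }
    out
}

fn main() {
    let mut c = Collector::default();
    collect_atomic_cas(&mut c);
    // prints "[(137, 130), (137, 138), (137, 131)]"
    println!("{:?}", conflicting_pairs(&c.ops));
}
```

The pair (137, 138) is the one that went wrong in the allocation above: with the temp as an early def, v137 and v138 can no longer both be assigned t2.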

So this is not a bug in Cranelift or RA2, but the APIs are definitely subtle!

One final thing: please do not copy and paste the entire trace log into an issue, and please put a short summary of the problem before dumping a huge amount of detail. Doing these things makes it much, much easier to understand the issue and help you. I made the same request earlier in bytecodealliance/wasmtime#4110 and it really is important; thank you!

yuyang-ok commented 2 years ago

OK, I thought more information would be helpful.