Open rakudrama opened 2 years ago
I have noticed this before when looking at another issue. The actual fix is relatively simple (taken from another patch I made: https://gist.github.com/mraleph/30183ef67352c6d5aec604d82f34e1be)
@@ -7100,7 +7100,7 @@ LocationSummary* BinaryUint32OpInstr::MakeLocationSummary(Zone* zone,
LocationSummary* summary = new (zone)
LocationSummary(zone, kNumInputs, kNumTemps, LocationSummary::kNoCall);
summary->set_in(0, Location::RequiresRegister());
- summary->set_in(1, Location::RequiresRegister());
+ summary->set_in(1, LocationRegisterOrConstant(right()));
summary->set_out(0, Location::SameAsFirstInput());
return summary;
}
@@ -7136,6 +7136,15 @@ static void EmitIntegerArithmetic(FlowGraphCompiler* compiler,
void BinaryUint32OpInstr::EmitNativeCode(FlowGraphCompiler* compiler) {
Register left = locs()->in(0).reg();
+ if (locs()->in(1).IsConstant()) {
+ int64_t value;
+ const bool ok = compiler::HasIntegerValue(locs()->in(1).constant(), &value);
+ RELEASE_ASSERT(ok);
+ EmitIntegerArithmetic(compiler, op_kind(), left,
+ compiler::Immediate(value));
+ return;
+ }
+
Register right = locs()->in(1).reg();
Register out = locs()->out(0).reg();
ASSERT(out == left);
@mraleph If I try something like that, I get immediate operands, but there are still a lot of unnecessary moves:
Code for optimized function 'package:kernel/ast.dart_Procedure_get_isNonNullableByDefault' (GetterFunction) {
;; B0
;; B1
;; ParallelMove rcx <- S+1
0x7f7bb9e9ed30 488b4c2408 movq rcx,[rsp+0x8]
;; v3 <- LoadField(v2 . flags) [-9223372036854775808, 9223372036854775807] T{int}
;; NativeUnboxedLoadFieldInstr
0x7f7bb9e9ed35 488b515f movq rdx,[rcx+0x5f]
;; ParallelMove rdx <- rdx
;; v21 <- IntConverter(int64->uint32[tr], v3)
0x7f7bb9e9ed39 8bd2 movl rdx,rdx
;; ParallelMove rdx <- rdx
;; v6 <- BinaryUint32Op(& [tr], v21 T{int}, v25) [0, 64] T{_Smi}
0x7f7bb9e9ed3b 83e240 andl rdx,0x40
;; ParallelMove rdx <- rdx
;; v23 <- IntConverter(uint32->int64, v6)
0x7f7bb9e9ed3e 8bd2 movl rdx,rdx
;; v9 <- EqualityCompare(v23 T{_Smi} != v18) T{bool}
0x7f7bb9e9ed40 4883fa00 cmpq rdx,0
0x7f7bb9e9ed44 7509 jnz 0x00007f7bb9e9ed4f
0x7f7bb9e9ed46 498b86d8000000 movq rax,[thr+0xd8] false
0x7f7bb9e9ed4d eb07 jmp 0x00007f7bb9e9ed56
0x7f7bb9e9ed4f 498b86d0000000 movq rax,[thr+0xd0] true
;; ParallelMove rax <- rax
;; Return:22(v9)
0x7f7bb9e9ed56 c3 ret
What would it take to replace the movl/andl/movl/cmpq sequence with testb rdx,0x40?
I see there is code to generate bit-test patterns (RecognizeTestPattern in il.cc), but it is not being applied here.
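To make the question concrete, here is a minimal sketch that models the four-instruction sequence from the listing above in plain C++ and compares it with a single bit test on the original 64-bit value. The function names are made up for illustration and are not part of the VM; the point is only that truncating to 32 bits, masking with 0x40, widening, and comparing against zero gives the same answer as testing bit 6 directly.

```cpp
#include <cassert>
#include <cstdint>

// Models the emitted sequence: movl (truncate), andl with 0x40,
// movl (zero-extend), cmpq against 0. Hypothetical helper name.
constexpr bool FlagSetViaUint32(int64_t flags) {
  const uint32_t truncated = static_cast<uint32_t>(flags);  // int64->uint32[tr]
  const uint32_t masked = truncated & 0x40u;                // BinaryUint32Op(&)
  const int64_t widened = static_cast<int64_t>(masked);     // uint32->int64
  return widened != 0;                                      // EqualityCompare
}

// Models a single bit test on the untouched 64-bit value.
constexpr bool FlagSetViaTest(int64_t flags) {
  return (flags & 0x40) != 0;
}

// Checks that both formulations agree on a handful of edge cases.
inline void CheckSamples() {
  const int64_t samples[] = {0,         0x40,      0x3f,
                             -1,        INT64_MIN, INT64_MAX,
                             0x100000040LL,        ~int64_t{0x40}};
  for (int64_t value : samples) {
    assert(FlagSetViaUint32(value) == FlagSetViaTest(value));
  }
}
```

Because the mask 0x40 lies entirely in the low byte, the bits above it in the input never affect the result, which is why a byte-sized test would be enough here.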
We should be able to remove the redundant moves caused by IntConverter if we massage the register allocator a bit and permit it to coalesce input and output allocations for the converters that are no-ops.
We can also extend RecognizeTestPattern to handle more than just smi operations.
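As a rough illustration of what recognizing a test pattern means, the sketch below matches the shape (x & mask) != 0 on a toy expression tree. Everything in it (the Node struct, MatchBitTest) is hypothetical and unrelated to the actual RecognizeTestPattern code in il.cc; it ignores representations entirely, which is the property an extended recognizer would need in order to fire on non-smi operands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy expression node; this is not the Dart VM's IL.
struct Node {
  enum Kind { kValue, kConstant, kBitAnd, kNotEqual };
  Kind kind;
  int64_t constant;   // Meaningful only when kind == kConstant.
  const Node* left;   // Operands for kBitAnd and kNotEqual.
  const Node* right;
};

// Returns true when `node` has the shape (x & mask) != 0 with a constant
// mask on the right, and stores x and mask through the out-parameters.
inline bool MatchBitTest(const Node& node, const Node** operand,
                         int64_t* mask) {
  if (node.kind != Node::kNotEqual) return false;
  if (node.left == nullptr || node.right == nullptr) return false;
  const Node& lhs = *node.left;
  const Node& rhs = *node.right;
  // The comparison must be against the constant zero.
  if (rhs.kind != Node::kConstant || rhs.constant != 0) return false;
  // The other side must be a bitwise AND with a constant mask.
  if (lhs.kind != Node::kBitAnd) return false;
  if (lhs.left == nullptr || lhs.right == nullptr) return false;
  if (lhs.right->kind != Node::kConstant) return false;
  *operand = lhs.left;
  *mask = lhs.right->constant;
  return true;
}
```

A backend that gets a match could then emit a single test instruction against the operand instead of materializing the masked value first.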
I'm not sure why a Uint32 representation is chosen in examples like this. BinaryUint32Op is not as fleshed-out as BinaryInt64Op, and does not attempt to use immediate operands. If I grep the output of precompiler2 --disassemble for dart2js, there are 2169 occurrences of BinaryUint32Op(& vs 300 BinaryInt64Op(&, so quite a lot of masking instructions are using unnecessary temporaries.
The above two constants are used once each and could be immediate operands
here
and here
This seems unnecessary, as andl rsi, rcx clears the high part
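The remark about andl clearing the high part rests on an x64 rule: an instruction that writes a 32-bit register zero-extends its result into the full 64-bit register. The sketch below models that rule with plain integers. The helper names are invented for illustration; it is a toy model of the instruction semantics, not VM code.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 64-bit general-purpose register.
using Reg64 = uint64_t;

// andl dst, src: operates on the low 32 bits of both registers and
// zero-extends the 32-bit result into the full 64-bit destination.
constexpr Reg64 Andl(Reg64 dst, Reg64 src) {
  return static_cast<uint32_t>(dst) & static_cast<uint32_t>(src);
}

// movl reg, reg: keeps the low 32 bits and clears the upper 32 bits.
constexpr Reg64 MovlSelf(Reg64 reg) {
  return static_cast<uint32_t>(reg);
}

// A movl after an andl changes nothing: the upper half is already zero.
constexpr bool TrailingMovlIsNoOp(Reg64 a, Reg64 b) {
  return MovlSelf(Andl(a, b)) == Andl(a, b);
}

// A movl before an andl changes nothing either: andl ignores the upper half.
constexpr bool LeadingMovlIsNoOp(Reg64 a, Reg64 b) {
  return Andl(MovlSelf(a), b) == Andl(a, b);
}

static_assert(TrailingMovlIsNoOp(0xffffffffffffffffULL, 0x40), "");
static_assert(LeadingMovlIsNoOp(0xffffffffffffffffULL, 0x40), "");
```

Under this model, both the truncating movl in front of the mask and the widening movl behind it are redundant whenever the operation in between is itself a 32-bit instruction.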