dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.

[NativeAOT-LLVM] Codegen deficiencies in object allocator helpers #2353

Open SingleAccretion opened 1 year ago

SingleAccretion commented 1 year ago

I happened to look at the WASM generated for these helpers, and it's not ideal:

04d4a7 func[808] <RhpNewFast>:
 04d4a8: 03 7f                      | local[0..2] type=i32
 04d4aa: 02 40                      | block
 04d4ac: 41 e4 e9 c5 80 00          |   i32.const 1144036
 04d4b2: 10 d6 85 80 80 00          |   call 726 <Thread::GetAllocContext()>
 04d4b8: 22 02                      |   local.tee 2
 04d4ba: 28 02 04                   |   i32.load 2 4
 04d4bd: 20 02                      |   local.get 2
 04d4bf: 28 02 00                   |   i32.load 2 0
 04d4c2: 22 03                      |   local.tee 3
 04d4c4: 6b                         |   i32.sub
 04d4c5: 20 01                      |   local.get 1
 04d4c7: 28 02 04                   |   i32.load 2 4
 04d4ca: 22 04                      |   local.tee 4
 04d4cc: 49                         |   i32.lt_u
 04d4cd: 0d 00                      |   br_if 0
 04d4cf: 20 02                      |   local.get 2
 04d4d1: 20 03                      |   local.get 3
 04d4d3: 20 04                      |   local.get 4
 04d4d5: 6a                         |   i32.add
 04d4d6: 36 02 00                   |   i32.store 2 0
 04d4d9: 20 03                      |   local.get 3
 04d4db: 20 01                      |   local.get 1
 04d4dd: 36 02 00                   |   i32.store 2 0
 04d4e0: 20 03                      |   local.get 3
 04d4e2: 0f                         |   return
 04d4e3: 0b                         | end
 04d4e4: 20 00                      | local.get 0
 04d4e6: 10 a0 81 80 80 00          | call 160 <RhpSetShadowStackTop>
 04d4ec: 02 40                      | block
 04d4ee: 20 01                      |   local.get 1
 04d4f0: 41 00                      |   i32.const 0
 04d4f2: 41 00                      |   i32.const 0
 04d4f4: 41 00                      |   i32.const 0
 04d4f6: 10 d7 85 80 80 00          |   call 727 <RhpGcAlloc>
 04d4fc: 22 02                      |   local.tee 2
 04d4fe: 0d 00                      |   br_if 0
 04d500: 20 00                      |   local.get 0
 04d502: 20 01                      |   local.get 1
 04d504: 41 00                      |   i32.const 0
 04d506: 10 ae b3 80 80 00          |   call 6574 <S_P_CoreLib_System_Runtime_EH__FailedAllocation>
 04d50c: 41 00                      |   i32.const 0
 04d50e: 21 02                      |   local.set 2
 04d510: 0b                         | end
 04d511: 20 02                      | local.get 2
 04d513: 0b                         | end

1) Thread::GetAllocContext is not inlined. Could be fixable if we link against the runtime built with LTO, or simply by moving the definition to some header.
2) AllocateObject is inlined, which is unnecessary. Could probably save some bytes by marking it noinline.
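
A rough sketch of what the two suggestions could look like in C++. This is not the runtime's actual code: `AllocContext`, `Thread`, `NewFast`, `AllocateSlow`, `g_slow_path_calls` and the `SKETCH_NOINLINE` macro are hypothetical stand-ins, and the slow path just calls `malloc` instead of the GC. The accessor is defined in the class body (as it would be if moved to a header), so the compiler can inline it without LTO, and the slow path is kept out of line so the fast path stays small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical noinline macro; assumes a GCC/Clang- or MSVC-style compiler.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for the per-thread bump allocation context.
struct AllocContext {
    uint8_t* alloc_ptr;    // next free byte
    uint8_t* alloc_limit;  // end of the current allocation buffer
};

// Hypothetical stand-in for the runtime's thread object.
struct Thread {
    AllocContext ctx;

    // Defined in the class body, hence implicitly inline: every translation
    // unit that includes the header sees the body and can inline the call.
    AllocContext* GetAllocContext() { return &ctx; }
};

// Counts slow-path entries; exists only so the sketch can be checked.
static int g_slow_path_calls = 0;

// Slow path kept out of line. The real helper would call into the GC;
// malloc is used here purely as a placeholder.
SKETCH_NOINLINE void* AllocateSlow(Thread* /*thread*/, size_t size) {
    ++g_slow_path_calls;
    return std::malloc(size);
}

// Fast path: bump the pointer if the request fits, otherwise fall back.
void* NewFast(Thread* thread, size_t size) {
    AllocContext* ctx = thread->GetAllocContext();
    assert(ctx->alloc_ptr <= ctx->alloc_limit);

    uint8_t* result = ctx->alloc_ptr;
    if (static_cast<size_t>(ctx->alloc_limit - result) >= size) {
        ctx->alloc_ptr = result + size;
        return result;
    }
    return AllocateSlow(thread, size);
}
```

Whether the byte savings from the out-of-line slow path are worth it would need to be confirmed by looking at the resulting WASM again.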