dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[NativeAOT][Proposal] GC stress support for NativeAOT #107850

Open VSadov opened 6 days ago

VSadov commented 6 days ago

We need a GC-stress story for Native AOT. We rely on JIT-based GC stress in CoreCLR, but once in a while we hit a stress bug specific to NativeAOT that could have been found a lot earlier if we had GC stress infrastructure that could target NativeAOT directly.

There are some remnants of GC stress support in the NativeAOT codebase. The code appears to be old, predating the switch to RyuJIT, and targets a different style of suspension (i.e. completely synchronous, with polling and loop hijacking). For the current design that code is not very useful and could mostly be removed.

Experiments with simpler stress approaches, like an extra thread blasting GC.Collect() in a loop, did not yield convincing results. Varying the rate of collections, however smart, makes it not stressful enough, too expensive, or both. It may be an interesting approach for quick, ad hoc stressing of some scenarios, but as a general-purpose GC stress mechanism it appears to be a dead end. It is better than nothing, but that is not a very inspiring bar to clear.

An approach similar to CoreCLR's, while conceptually more complex, could be more promising. The idea is to instrument safepoints with illegal instructions and then stress and fix them up as the faults are encountered. This would be more thorough while also less redundant, as each location is tested roughly once.

Rough sketch of the idea:

There is a small caveat for ISAs with variable instruction size (e.g. x64). We may need to resort to partial disassembly so that, given a safepoint location, we can figure out the size of the safepoint instruction. That is also the case in the CoreCLR implementation, and the same disasm support could be used.
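The instrument-then-restore idea described above can be simulated with plain byte buffers. This is only a minimal sketch of the bookkeeping, not the proposed implementation: `StressInstrumenter`, `Instrument`, and `OnFault` are hypothetical names, the "fault" is an ordinary function call, and a counter stands in for triggering a GC.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; this simulates the patching bookkeeping only.
constexpr std::uint8_t kHlt = 0xF4; // x64 HLT opcode, which faults in user mode

struct StressInstrumenter {
    std::vector<std::uint8_t> code;                      // stand-in for a method body
    std::unordered_map<std::size_t, std::uint8_t> saved; // offset -> original byte
    int collections = 0;                                 // stand-in for "a GC ran here"

    // Overwrite the first byte at each safepoint offset, remembering the original.
    void Instrument(const std::vector<std::size_t>& safepoints) {
        for (std::size_t off : safepoints) {
            if (off >= code.size() || saved.count(off) != 0) continue;
            saved[off] = code[off];
            code[off] = kHlt;
        }
    }

    // Called from the (simulated) fault handler. Returns true if the fault came
    // from one of our probes, in which case execution can resume at `off`.
    bool OnFault(std::size_t off) {
        auto it = saved.find(off);
        if (it == saved.end()) return false; // a genuine fault, not a probe
        ++collections;                       // real code would trigger a GC here
        code[off] = it->second;              // restore, so the site fires only once
        saved.erase(it);
        return true;
    }
};
```

Because the original byte is restored on the first fault, each instrumented location costs one stress collection and then runs at full speed afterwards.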

dotnet-policy-service[bot] commented 6 days ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

SingleAccretion commented 6 days ago

Could this stress be driven by the Jit?

The idea would be to insert something like this in emit after all safe points (interruptible instructions).

If feasible, it would have the advantage of needing fewer modifications to the rest of the system (e.g. all the places that store native code pointers in runtime data structures).

filipnavara commented 6 days ago

> Could this stress be driven by the Jit?
>
> The idea would be to insert something like this in emit after all safe points (interruptible instructions).

That may be preferable for supporting targets like macOS, or even iOS. That said, I kinda like the idea of using an hlt-like instruction, which could be inserted by the JIT too. The reason is that it's actually closer to the signal processing done in an actual GC. Also, any additional complex code emitted by the JIT may break the scenarios that we would like to test. For example, frameless methods [on ARM64] cannot make calls, so they would not be testable by the approach in JIT_StressGC, which makes the GC call directly.

VSadov commented 6 days ago

> Could this stress be driven by the Jit?
>
> The idea would be to insert something like this in emit after all safe points (interruptible instructions).
>
> If feasible, it would have the advantage of needing fewer modifications to the rest of the system (e.g. all the places that store native code pointers in runtime data structures).

I think it is a very important advantage of HLT instrumentation that it does not require different emit. We would be testing exactly the same code with the same GC info shape. The HLT only ensures that every reachable location will be tested and generally only once.

JIT-inserted probes could work acceptably when the number of safe points is modest (e.g. call sites, loop back branches), but with fully interruptible code, where every instruction is interruptible, it could get awkward. Runtime will need to remember what locations were covered (a hash table I guess, which can get pretty large). That will prevent retesting, but the probes will still keep firing and check the table for roughly every instruction.
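To make the cost concrete, here is a minimal sketch of the hash-table bookkeeping, assuming each probe passes its own address to a runtime helper. `ProbeTable` and `OnProbe` are hypothetical names, and counters stand in for the actual GC.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical runtime-side bookkeeping for JIT-inserted probes.
struct ProbeTable {
    std::unordered_set<std::uintptr_t> covered; // one entry per tested location
    std::uint64_t lookups = 0;                  // every probe hit pays for a lookup
    std::uint64_t collections = 0;              // stand-in for "a GC ran here"

    // Called by every probe, passing the address of the probed location.
    void OnProbe(std::uintptr_t ip) {
        ++lookups;
        if (covered.insert(ip).second) {
            ++collections; // first visit only: real code would trigger a GC here
        }
    }
};
```

A loop that executes three probed instructions 1000 times triggers 3 collections but 3000 table lookups, and the table itself grows with the number of distinct locations.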

There are scenarios where the JIT needs to insert NOP/BRK in the instruction stream to make sure the same safepoint cannot be reached with different GC info or be in different EH regions. Will adding a bunch of interleaving probe calls make this scenario easier or harder? Are the probes themselves interruptible? (They will have to be, in fully interruptible code.) If so, will we have to extend interruptible ranges for the probes at the ends? I'd rather not answer these questions.

Inserting probes at JIT time can be made to work too, but instrumentation after loading feels closer to testing the original code.

Even for macOS, if there is a lot of desire, there could be a way. Debuggers manage to put breakpoints somehow, after all... These HLTs are basically a bunch of single-use breakpoints.

SingleAccretion commented 6 days ago

> I think it is a very important advantage of HLT instrumentation that it does not require different emit. We would be testing exactly the same code with the same GC info shape.

Agreed.

> Runtime will need to remember what locations were covered (a hash table I guess, which can get pretty large). That will prevent retesting, but the probes will still keep firing and check the table for roughly every instruction.

One doesn't need a hash table for this - the Jit can emit an inline check for each probe, or pass the flag address to the helper:

cmp [location_probed_flag], 1 # One byte per each probe site, generated at compile time
je SKIP
call RhpStressGc
SKIP:
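The same check can be written out in C++ to show what each site does at run time. This is a sketch under assumptions: `Probe` and `StressGc` are hypothetical stand-ins, and the flag is set inside the probe here, whereas the snippet above leaves open who sets it.

```cpp
#include <cstdint>

// Hypothetical stand-in for the stress helper; real code would trigger a GC.
int g_collections = 0;
void StressGc() { ++g_collections; }

// What the emitted sequence does at each site: test the per-site flag byte and
// skip the call once the location has been probed.
void Probe(std::uint8_t* locationProbedFlag) {
    if (*locationProbedFlag == 1) return; // cmp / je SKIP
    *locationProbedFlag = 1;              // assumption: the probe marks the site itself
    StressGc();                           // call the stress helper
}
```

No lookup structure is needed, at the price of one flag byte per probe site and a compare-and-branch on every execution of the site.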

I also agree it is not clear this would be acceptably fast for the fully interruptible case. It is known that it is acceptably fast for the partially interruptible one (we have such a scheme implemented in NativeAOT-LLVM).

I suppose the code pointers problem is not that hard to solve - you 'just' need to record all of the static relocations (including for code itself, to adjust RIP-relative addresses) and re-apply them as appropriate at runtime when the code is copied.
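A minimal sketch of that re-application step, simulated on byte buffers with integer addresses. `Image` and `Relocate` are hypothetical names, only two fixup kinds are modeled (32-bit RIP-relative displacements that target addresses outside the copied code, and absolute code pointers held in runtime data), and displacement overflow is not checked.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a code blob plus the relocations recorded for it.
struct Image {
    std::vector<std::uint8_t> code;
    std::vector<std::size_t> rel32Sites;     // offsets of rel32 fields targeting outside `code`
    std::vector<std::uint64_t> codePointers; // absolute pointers into `code` held in data
};

// Returns the image as it should look after moving the code from oldBase to newBase.
Image Relocate(Image img, std::uint64_t oldBase, std::uint64_t newBase) {
    const std::int64_t delta = static_cast<std::int64_t>(newBase - oldBase);

    // The target stays put while the instruction moves by delta, so the
    // displacement shrinks by delta.
    for (std::size_t site : img.rel32Sites) {
        std::int32_t disp;
        if (site + sizeof disp > img.code.size()) continue; // ignore bad records
        std::memcpy(&disp, img.code.data() + site, sizeof disp);
        disp = static_cast<std::int32_t>(disp - delta);
        std::memcpy(img.code.data() + site, &disp, sizeof disp);
    }

    // Absolute pointers into the code move together with it.
    for (std::uint64_t& p : img.codePointers) {
        p += static_cast<std::uint64_t>(delta);
    }
    return img;
}
```

References that stay within the copied code need no adjustment, since both ends move by the same amount.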

VSadov commented 6 days ago

CC: @janvorli @mangod9