Shopify / yjit

Optimizing JIT compiler built inside CRuby
650 stars 21 forks source link

Code pages refactoring #296

Closed maximecb closed 1 year ago

maximecb commented 2 years ago

Before we can proceed with the actual GC part of the code GC, we need to refactor our codegen so that code is allocated and generated into small regions (code pages)

TODO:

maximecb commented 2 years ago

@XrXr I've done some of the work on the code pages refactoring, including making sure that we store the current cb/ocb in the jitstate_t object and pass a cb to the codegen functions, so that we can eventually eliminate the global cb/ocb.

The idea is that each code page will have its own cb and ocb object: https://github.com/ruby/ruby/blob/9252830d7edb880b66902f3b075473dbc020f851/yjit_iface.c#L989

And for any pointer to a piece of machine code, you could get the code page, and so the associated cb: https://github.com/ruby/ruby/blob/9252830d7edb880b66902f3b075473dbc020f851/yjit_iface.c#L1080

This refactoring isn't complete though. Something we need to address is that the current mechanism with yjit_codepage_frozen_bytes seems incompatible with the idea of code pages. I think we need to migrate to either freezing entire code pages, or marking the code blocks as frozen, or marking individual blocks as being invalidated.

I was wondering if you have some thoughts on this, and whether you'd be up for tackling this? You definitely have the best in-depth knowledge of these parts of the codebase, so this is something you could work on while I'm away in December, but we can also pair on it when I'm back if you prefer, as you wish. Either way, I'd like your opinion on this 🙂

XrXr commented 2 years ago

We could give each code page their frozen_bytes. By the way, I think there is a problem with global invalidation and frozen_bytes and GC compaction. We don't patch after freezing, which might leave us with code that look like the following working with moved refernces:

....
call routine_that_triggers_global_invalidation_and_compaction
mov rax, heap_pointer
; work with rax 
; ---- YARV instruction boundary ---
; .. exit patched in by global invalidation

Not sure if we have instructions that follow this pattern, though, maybe defined.

Perhaps we should add a level of indirection so we don't need to patch code to support compaction. Whenever we need to work with heap pointers, we go through a pool. I think Dart does this, at least it used to. So the code could look like:

mov rax, [pool_reg + obj_idx]

It's slower but more robust.

maximecb commented 2 years ago

Ok so we could give each code page their frozen_bytes. That seems like it would work.

Why not just mark individual blocks as invalidated with a bit flag though? Could that work as well? It could have the benefit that we can mark a block as invalidated without freezing the code page (patching still possible?) or just have a more granular concept of invalidation. That makes intuitive sense to me, but maybe you've thought of a reason why that wouldn't work.

we go through a pool. I think Dart does this, at least it used to. So the code could look like [... ] It's slower but more robust.

AFAIK most JIT compilers do this, and I don't think it's slower. A common reason for doing it is that it gives you a more compact encoding than mov rax, 64_bit_value; mov dst, rax. I'm not opposed to the idea, though the pool of references becomes one more resource that we have to manage. The fact that so many JITs do this though, suggests it may actually be the way to go.

XrXr commented 2 years ago

Why not just mark individual blocks as invalidated with a bit flag though?

I think the problem is sometimes we do patching for branches and with blocks and branches changing sizes the mapping between code address and blocks is a bit hairy. The frozen bytes number made more sense to me as a direct path to preventing modifications.

maximecb commented 2 years ago

I think that having a per-page frozen bytes doesn't seem to accurately map what we want, because most likely, we're going to have a current page that we generate code into, and more or less sequentially fill up the page and allocate a new one (with some occasional patching of older pages).

The branches belong to blocks and they have a pointer to their parent block. We could also have a bit flag on the branch_t objects themselves for frozen/invalidated, if that's what we really care about?

The reason you implemented the frozen_bytes logic in the first place is to make sure that paused ractors don't end up patching jumps in blocks that have been invalidated, is that correct? If so, IMO that seems to map pretty well to the concept of having a bit flag for "this block has been invalidated", and it would also make the core easier to read (otherwise it's like, what is frozen bytes anyway, why is it there?).

maximecb commented 1 year ago

Completed by Kokubun & Alan