Building assembler code in reverse

ptersilie commented 4 years ago

For our experimental JIT compiler, which @vext01 mentioned in #47, we want to compile the instructions of a trace backwards as that seems to be what other JITs do. One of the reasons being that it simplifies register allocation (see [1]).

Is that something that would be possible using dynasm-rs? I've seen that there's a function alter which allows to patch the buffer, but I don't think that allows us to reverse the compiled instructions. One way I can imagine doing this would be to cache the result of each dynasm! macro call. Then, when we are done, we simply need to reverse the order of the cached results, before making them executable. I assume that this is not currently possible, but was wondering if there another way that dynasm-rs could be used to achieve this? If not, would you be happy to accept a change that makes this possible?

[1] https://dl.acm.org/doi/pdf/10.1145/3132190.3132209

CensoredUsername commented 4 years ago

Hey there. That definitely sounds like an interesting use case. I've thought about it for a bit and with some changes it should be possible to support this mode of assembling as well. One major limitation would be that an assembler could easily support either forwards or backwards assembly, but not both.

The problem can be split in two parts I think. First is the reversal of the emitted instruction bytestream. This should be fairly easy as internally the plugin compiles single dynasm invocations to a list of calls to be made to the runtime. By reversing this list of calls at the end the instructions in a single call will be emitted tail to head. A minor complication exists here around emitted relocations, as they might now be emitted before the actual bytes that they modify are emittted. This should be avoided, and either the plugin or runtime needs to correct for them being emitted at a different offset in the instruction. Generalizing the handling of these will be beneficial.

The second part would involve an assembler that just works in reverse, emitting starting at the end of a buffer and working backwards. That should be pretty simple to implement. A bit of care will be necessary as all offsets should now be respective to the end of the buffer.

I think this will definitely be a useful feature to support. I'll have to look at the code myself for a bit to draft up a good design for the generalisation of relocations that will be necessary for the fixup to happen at the last stage of assembly, but outside that I don't see any immediate blockers.

ptersilie commented 4 years ago

If I understand you correctly, then you are suggesting to reverse both the order of the assembler instructions as well as the bytecode they emit. For example, if I have

dynasm!{
    mov rax, 0x1
    mov dl, 0x2
}

this currently (in normal mode) becomes

ops.extend(b"H\xb8");
ops.push_i64(0x1);
ops.extend(b"1\xd2\xb2");
ops.push_i8(0x2);

But in reverse mode this would instead turn into

ops.push_i8(0x2);
ops.extend(b"\xb2\xd21");
ops.push_i64(0x1);
ops.extend(b"\xb8H");

So not only is the order of the pushes/extends to ops reversed, but also their contents (i.e. the bytecode). This should then allow us to simply reverse the [u8] buffer at the end to get our executable code.

This is actually a neat idea, and not something I had thought of. But it does look quite involved (from an implementation perspective). What I initially had in mind was to just store the result of each dynasm! in a separate [u8]. For example, if I have

dynasm!{
    mov rax, 0x1
}

dynasm!{
    mov dl, 0x2
}

This would become something like

// First dynasm!
ops.new_vec();  // Create a new subvector
ops.extend(b"H\xb8"); // extend the new subvector
ops.push_i64(0x1);

// Second dynasm!
ops.new_vec();
ops.extend(b"1\xd2\xb2");
ops.push_i8(0x2);

So at the end you have a Vec<Vec<u8>> and only need to reverse the outer Vec and then merge the subvectors to get the executable code. This is possibly less performant than your solution, but looks a bit easier to implement. But there may be other problems that I missed. Thoughts?

CensoredUsername commented 4 years ago

I don't think it would be necessary to reverse the order of the bytecode, just reversing the order of the calls would be enough. In the end the bytes should still arrive in the executable buffer in the same order, its just filled in chunks from the end to the start.

However, that is not the main issue. If it was just that the solution would be simple. The main thing throwing a wrench of the works is the handling of relocations. Jumps, rip-relative offsets etcetera. These need to be recorded and are often only resolved and fixed up at a later point, the point at which the actual label location is defined. Furthermore, to prevent unnecessary fixups later on it is not a good idea to emit invocations to separate buffers and reorder. Ideally you want to emit to a single buffer and grow it or the direction of emission.

If you do not need relocation support you can currently already just emit to a VecAssembler per call and reorder it yourself but that will basically mean no control flow between invocations.

vext01 commented 4 years ago

Without going too deep into the details: we are compiling traces (straight-line paths through our own IR). However, our code still has control flow, as we need to jump to cleanup code when certain assumptions don't hold true (when a guard fails).

This cleanup code is going to need to be tailored to each trace, and thus cannot be compiled ahead of time. I think that means that we will need to compile the cleanup code at runtime and have labels for the guard failure code.

I'm guessing that means we can't use the VecAssembler?

CensoredUsername commented 1 year ago

I'm not sure, sorry.

chc4 commented 1 year ago

I noticed that there is a dynasm_backwards macro that assembles the code in reverse now. It says it isn't supported by the runtime currently - would adding runtime support be something like adding a BackwardsAssembler that takes an end address instead of a start address like VecAssembler, and fixup relocations at each commit using end-current?

vext01 commented 3 months ago

Hi again,

I've been reviewing this thread because we will soon need to write a register allocator for our tracing JIT and this would be simplified if we could visit the IR instructions of an IR trace in reverse order.

To be clear about our requirements, we would still want to emit the individual asm instructions for any given IR instruction in forwards order, but we'd want to process the list of IR instructions to codegen in reverse order.

A potentially simpler solution could be to make the backing vector for an Assembler a deque, and then add a Assembler::prepend_uncommitted() which returns a new kind of prepending Modifier. When you assemble into this new modifier, maybe it could it emit a call to a function that adds an offset (the size of the prepended code) to all relocations past the current assembly offset.

Would that work?

CensoredUsername / dynasm-rs

Building assembler code in reverse #48