Question: dynamically replacing code while running it

I'm trying to use dynasm-rs for an experimental JIT compiler project and was wondering if you could help me with a couple of open questions I have. My goal is to compile functions the first time they are called, i.e. initially generate machine code that calls the compiler, then have the compiler replace that code with the actual implementation, and finally jump to that implementation.

My first observation was that I (understandably) need a trampoline, because alter can't extend the buffer. So the actual implementation must be placed in a separate buffer that is large enough. If I'm wrong in thinking that this can't easily be avoided, any pointers would be nice, but I can live with it for my purposes.

The step that I'm missing is having the compiler be the thing that is called by the trampoline before the actual implementation has been compiled. Right now, I've got all these steps working (source):

generate a trampoline code sequence that calls some stub code (i.e. not the compiler)
a call at this point successfully executes the stub
call the compiler manually
a call at this point successfully executes the actual code (it's the one from the tutorial)

Right now, because of how I implemented DynReplacement::call()/Uncompiled::call(), I'm holding a reference to those structs for the whole call's duration. However, I'd need mutable access to DynReplacement/take ownership of Uncompiled to do the compilation process as implemented.

I could create a getter for func so that I can call the function while not holding a reference - but the docs warn that "if this buffer is accessed through an Executor, these pointers will only be valid as long as its lock is held", which I would violate by doing this.

I think the key here is that to call the trampoline, I need to hold the lock/have a reference, but as soon as the trampoline has jumped to the compiler function, I can get rid of that lock - however I'm not sure where to go from there, and maybe you have faced this/thought about something like this before.

Thanks in advance and best regards!

Hmm, that does sound quite complicated. That lock is there for a good reason. When it's not locked, any assembler operations are free to move the buffer around in memory. If enough code is added to the assembled buffer it might have to reallocate the buffer to resize it. If that were to happen while a reference to the old buffer (like a return address on the call stack) is still alive, the moment you try to return into the assembled code you'd probably just immediately segfault as the code is simply no longer at that address. You cannot conserve pointers into the assembled code (and that includes return addresses on the call stack) when the assembler is active, unless you would have allocated a fixed-size buffer in advance.

Evidently allocating a fixed-size buffer in advance is then one possible solution. Drawbacks are of course that it can be either too small or have significant overhead in memory.
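To make the trade-off concrete, here is a minimal sketch in plain Rust that uses a `Vec<u8>` as a stand-in for the code buffer. `FixedBuffer` and its methods are hypothetical names for illustration and are not part of dynasm-rs. The idea is simply that a buffer which refuses to grow never moves, at the cost of possibly being too small.

```rust
/// Stand-in for a code buffer whose capacity is chosen up front.
/// Hypothetical type for illustration; not part of dynasm-rs.
pub struct FixedBuffer {
    bytes: Vec<u8>,
}

impl FixedBuffer {
    pub fn with_capacity(capacity: usize) -> Self {
        FixedBuffer { bytes: Vec::with_capacity(capacity) }
    }

    /// Address of the first byte. It stays the same as long as
    /// `append` never has to reallocate.
    pub fn base(&self) -> usize {
        self.bytes.as_ptr() as usize
    }

    /// Append code and return its offset, or return the number of free
    /// bytes as an error instead of reallocating when it does not fit.
    pub fn append(&mut self, code: &[u8]) -> Result<usize, usize> {
        let free = self.bytes.capacity() - self.bytes.len();
        if code.len() > free {
            return Err(free); // buffer too small: the drawback of this approach
        }
        let offset = self.bytes.len();
        // Stays within the reserved capacity, so the Vec does not move.
        self.bytes.extend_from_slice(code);
        Ok(offset)
    }
}
```

Pointers computed as `base() + offset` remain valid across successful `append` calls, which is the property the lock otherwise has to protect.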

Another solution, which is much more fun technically, involves some complex things that are definitely rather unsafe. That lock has a dual function: it prevents you from doing really dumb stuff like calling the assembler in one thread while code is executing in the other thread, and it prevents you from recursively calling the compiler while in assembled code. Both these things can still be safe though, provided you take the appropriate precautions.

To make calling into the assembler safe inside of assembled code, you need to ensure that any return addresses on the call stack are corrected if the assembler buffer is moved in memory before returning into the assembled code. This requires your emitted code to emit some basic stack frame information next to the return addresses. Then, ensure you also correct the return address of the Rust function that was called from the assembled code.

If you want to be absolutely correct w.r.t. possible multithreaded assembler use, then you can acquire the lock before calling into the assembled code. When the assembled code calls you again, first store the current address of the assembled buffer, then release the lock. Do your assembling, and reacquire the lock. Compare the possibly changed address of the assembled buffer, and if it's different, do the stack walk to correct the return addresses. Then you can safely return back into it.
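The "compare the base address, then correct the return addresses" step could look roughly like the sketch below. It assumes the stack walk has already collected the saved return addresses into a slice; the walk itself, and how frame information is emitted, are not shown. `rebase` and `fix_return_addresses` are hypothetical helper names.

```rust
/// Rebase one saved address if it points into the old buffer.
/// Addresses outside `[old_base, old_base + len)` are left alone,
/// e.g. return addresses into ordinary Rust code.
pub fn rebase(addr: usize, old_base: usize, new_base: usize, len: usize) -> usize {
    if addr >= old_base && addr - old_base < len {
        new_base + (addr - old_base)
    } else {
        addr
    }
}

/// Correct every saved return address after the buffer may have moved.
/// Returns how many entries were changed.
pub fn fix_return_addresses(
    saved: &mut [usize],
    old_base: usize,
    new_base: usize,
    len: usize,
) -> usize {
    if old_base == new_base {
        return 0; // buffer did not move, nothing to do
    }
    let mut changed = 0;
    for slot in saved.iter_mut() {
        let fixed = rebase(*slot, old_base, new_base, len);
        if fixed != *slot {
            *slot = fixed;
            changed += 1;
        }
    }
    changed
}
```

In a real JIT the entries would be stack slots that are written back in place, which is where the unsafety comes in.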

I hope this helps you somewhat, best regards!

Thank you for your response! Just to make sure I don't misunderstand you: in the end I will be calling an extern "win64" fn that points to the following code:

; .arch x64
; mov rax, QWORD func as _  // where func is the compiler initially, then later the final code
; jmp rax

I think we can even assume that func will simply be one of two plain Rust functions. When you say

You cannot conserve pointers into the assembled code (and that includes return addresses on the call stack) when the assembler is active, unless you would have allocated a fixed-size buffer in advance.

That lock has a dual function: it prevents you from doing really dumb stuff like calling the assembler in one thread while code is executing in the other thread, and it prevents you from recursively calling the compiler while in assembled code.

am I correct in saying

since there's no call here, I don't have to worry about return addresses
when rewriting that code snippet (changing func), the number of bytes necessary is the same, so the buffer shouldn't move?
once the jmp has executed, I'm no longer "in assembled code"
so the only thing left to keep in mind when thinking about this lock is that other threads could be executing the code at the same time.

Does that sound plausible?

I will think more about this, thanks for your help!

since there's no call here, I don't have to worry about return addresses

Even if you're tail-calling into a function (that's what the trampoline is doing effectively, except for not conserving the rax register) that function will still return at some point no? (I presume the location of what called the trampoline).

when rewriting that code snippet (changing func), the number of bytes necessary is the same, so the buffer shouldn't move?

Correct. Altering the buffer should conserve addresses. However, adding extra assembled code at the end (to fit a new JIT-compiled function) would not.
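To illustrate why the rewrite is size-preserving, here is a byte-level sketch of the trampoline using the standard x86-64 encodings of `mov rax, imm64` (`48 B8` plus an 8-byte little-endian immediate) and `jmp rax` (`FF E0`). That dynasm-rs emits exactly these bytes for the snippet above is an assumption, and `encode_trampoline` / `retarget` are hypothetical helpers, not dynasm-rs API.

```rust
/// `mov rax, imm64` is 10 bytes and `jmp rax` is 2 bytes.
pub const TRAMPOLINE_LEN: usize = 12;
/// Offset of the 8-byte immediate inside the trampoline.
const IMM_OFFSET: usize = 2;

/// Build the trampoline bytes for a given target address.
pub fn encode_trampoline(target: u64) -> [u8; TRAMPOLINE_LEN] {
    let mut code = [0u8; TRAMPOLINE_LEN];
    code[0] = 0x48; // REX.W prefix
    code[1] = 0xB8; // mov rax, imm64
    code[IMM_OFFSET..IMM_OFFSET + 8].copy_from_slice(&target.to_le_bytes());
    code[10] = 0xFF; // jmp rax
    code[11] = 0xE0;
    code
}

/// Point an existing trampoline at a new target by overwriting only the
/// immediate. The length, and therefore every address around it, is unchanged.
pub fn retarget(code: &mut [u8; TRAMPOLINE_LEN], new_target: u64) {
    code[IMM_OFFSET..IMM_OFFSET + 8].copy_from_slice(&new_target.to_le_bytes());
}
```

Since only eight bytes in the middle change, nothing after the trampoline shifts. Note that the sketch says nothing about other threads executing the trampoline while it is being patched.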

once the jmp has executed, I'm no longer "in assembled code"

Yep

so the only thing left to keep in mind when thinking about this lock is that other threads could be executing the code at the same time.

I think so, based on my preliminary understanding of your source code. I assumed in my first reply that you'd assemble the functions and the trampolines into the same assembler as that's generally more efficient (otherwise you end up claiming a 4K page for each individual function), but it looks to me like you're creating a new assembler for each compiled function, so the address of each actual function is stable.

So for now I'd say you're probably safe, but if you ever want to move from the somewhat inefficient double indirect calls to direct calls to compiled functions you'll likely have to do some rearchitecting.
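The double-indirect arrangement can be modelled in plain Rust to show the locking pattern: hold the lock only long enough to copy the current target out, release it, then call. The sketch assumes, as suggested earlier in the thread, that the target is one of two plain Rust functions whose addresses never move. `Slot`, `compile_then_run` and `compiled` are hypothetical names, and the "compilation" is just a pointer swap.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Signature shared by the compiler stub and the compiled code.
type Func = fn(&Slot, i64) -> i64;

/// Plays the role of the trampoline: one level of indirection per function.
pub struct Slot {
    target: Mutex<Func>,
    compiles: AtomicUsize,
}

impl Slot {
    /// A new slot initially points at the compiler stub.
    pub fn new() -> Self {
        Slot {
            target: Mutex::new(compile_then_run as Func),
            compiles: AtomicUsize::new(0),
        }
    }

    pub fn call(&self, arg: i64) -> i64 {
        // The guard is a temporary and is dropped at the end of this
        // statement, so the lock is NOT held while the target runs.
        let f: Func = *self.target.lock().unwrap();
        f(self, arg)
    }

    pub fn compile_count(&self) -> usize {
        self.compiles.load(Ordering::SeqCst)
    }
}

/// Stand-in for the compiler: installs the real code, then runs it.
fn compile_then_run(slot: &Slot, arg: i64) -> i64 {
    slot.compiles.fetch_add(1, Ordering::SeqCst);
    *slot.target.lock().unwrap() = compiled as Func;
    compiled(slot, arg)
}

/// Stand-in for the JIT-compiled function.
fn compiled(_slot: &Slot, arg: i64) -> i64 {
    arg * 2
}
```

The first `call` goes through the stub and swaps the target; later calls reach `compiled` directly. This only works because the copied pointer stays valid after the lock is released, which would not hold for pointers into a buffer that can be reallocated.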

that function will still return at some point no? (I presume the location of what called the trampoline)

yes, and I think in my vision I can depend on the caller to not have moved - at least as long as the double indirections stay...

you end up claiming a 4K page for each individual function

ah, I had not considered that. That makes this approach even less efficient than I thought.

So for now I'd say you're probably safe, but if you ever want to move from the somewhat inefficient double indirect calls to direct calls to compiled functions you'll likely have to do some rearchitecting.

Sounds like it. I think that answers everything I wanted to learn about, thank you for your time!

CensoredUsername / dynasm-rs

Question: dynamically replacing code while running it #77