dibyendumajumdar / nanojit

NanoJIT is a small, cross-platform C++ library that emits machine code.
Mozilla Public License 2.0
151 stars 16 forks source link

Stack push/pop #16

Open remi6397 opened 1 year ago

remi6397 commented 1 year ago

Kind of a rookie question, but I have a use case that makes nanojit seem like a perfect fit, but am struggling a bit understanding how does one implement a stack in this, one that could be push'd to and pop'd from. This is a pretty easy task in say sljit (see digitalgust's minijvm), but here, I don't know where to begin. One idea I have right now is to alloca a variable for the SP and then just alloca objects as I go? Then, if I want to pop, I just decrement the SP and store and conversely, if I want to push, then I store and increment the SP, right? I assume there won't be gaps and that the stack grows in the normal direction?

I'd very much appreciate if you pointed me in the right direction. :)

dibyendumajumdar commented 1 year ago

hi, in Nanojit you don't explicitly push/pop to the stack.

playXE commented 1 year ago

If you're compiling stack bytecode to some kind of IR you need stack to SSA conversion. I have an example of it here: https://github.com/playXE/stack2ssa/blob/main/src/lower.rs (using Cranelift, not NanoJIT). It is based on code from JikesRVM here :https://github.com/JikesRVM/JikesRVM/blob/master/rvm/src/org/jikesrvm/compilers/opt/bc2ir/BC2IR.java

remi6397 commented 1 year ago

hi, in Nanojit you don't explicitly push/pop to the stack.

Ah, okay, so it's like a space for function-local variables, right? In that case, should I do exactly what digitalgust does, i.e. allocate and manage a custom stack structure on the heap?


If you're compiling stack bytecode to some kind of IR you need stack to SSA conversion. I have an example of it here: https://github.com/playXE/stack2ssa/blob/main/src/lower.rs (using Cranelift, not NanoJIT). It is based on code from JikesRVM here :https://github.com/JikesRVM/JikesRVM/blob/master/rvm/src/org/jikesrvm/compilers/opt/bc2ir/BC2IR.java

Yes indeed, a stack machine is exactly what I'm trying to JIT here. Thank you for pointing that out!

So basically I do something like that, right?:

typedef struct {
  gm_context pass;
  NJXContextRef ctx;
  NJXFunctionBuilderRef fb;
  size_t operand_sp_off;
  size_t operand_sp_max;
  NJXLInsRef operand_stack_base;
} gm_context_impl;

// Push(Pop() + Pop())
static inline void GM_emit_Add(gm_context_impl *gm) {
  NJXLInsRef lhs = NJX_load_i(gm->fb, gm->operand_stack_base, (--gm->operand_sp_off) * sizeof(gm_stack_obj) + offsetof(gm_stack_obj, int32_v));
  NJXLInsRef rhs = NJX_load_i(gm->fb, gm->operand_stack_base, (--gm->operand_sp_off) * sizeof(gm_stack_obj) + offsetof(gm_stack_obj, int32_v));

  NJX_store_i(gm->fb, NJX_addi(gm->fb, lhs, rhs), gm->operand_stack_base, (gm->operand_sp_off++) * sizeof(gm_stack_obj) + offsetof(gm_stack_obj, int32_v));
}

Or could that fail due to alignment or otherwise? Perhaps I should have my own "stack" of NJXLInsRef's instead of relying on contiguity of the built-in one?


And also thank y'all for responding so soon. :)

dibyendumajumdar commented 1 year ago

hi, btw I recommend using https://github.com/vnmakarov/mir over Nanojit. It is generates much better code.

Re stack slots in Nanojit - my approach was to alloc all stack slots at the beginnning of a function, put live instructions at the end. And inside the function, store/load from it. But I haven't coded in this for some time so speaking from memory.

You can see examples at https://github.com/dibyendumajumdar/dmr_c/tree/master/tests/nano

remi6397 commented 1 year ago

I had already heard about MIR, it's great and I would've used it if only it supported 32-bit platforms. Though I'm still only prototyping/researching and haven't done any benchmarks yet, there's no way code generated by nanojit would be slower than completely unoptimized code made using (still awesome) sljit. It may not be perfect, but so far it seems like the best option.

dibyendumajumdar commented 1 year ago

Originally NanoJIT was designed and used as a trace compiler, so for linear code without branching its probably fine, as that was the original use case. Unfortunately the best way to use it was never documented by the original authors. At some point I tried checking how it was used here https://github.com/adobe/avmplus/blob/master/core/CodegenLIR.cpp but I didn't get very far.

playXE commented 1 year ago

Yes indeed, a stack machine is exactly what I'm trying to JIT here. Thank you for pointing that out!

So basically I do something like that, right?:

Not really. You should do smth like this instead:

struct GmContext {
    operand_stack: Vec<NJXLInsRef>
}

fn emit_add(cx: &mut GmContext) {
    let rhs = cx.operand_stack.pop();
    let lhs = cx.operand_stack.pop();
    let res = NJX_addi(..., lhs, rhs);
    cx.operand_stack.push(res);
}

Special care must be taken in case of branches and loops. I recommend looking at original BC2IR.java or to my port of it. You can see that in original code and in my version of it there is no any memory loads unless explicitly required (LGet,LSet opcodes in my version for local variables) P.S: Obviously the way you did it can work but it would not be effective as you still do lots of loads and stores to memory. The way to do it correctly is to actually lower all stack operations to SSA IR which means most of operations would be performed in registers without doing lots of loads or stores

UPD: You can also join r/ProgrammingLanguages Discord server to get help there, I am also there, you can @ me if you need more help. :)