Open remi6397 opened 1 year ago
hi, in Nanojit you don't explicitly push/pop to the stack.
If you're compiling stack bytecode to some kind of IR you need stack to SSA conversion. I have an example of it here: https://github.com/playXE/stack2ssa/blob/main/src/lower.rs (using Cranelift, not NanoJIT). It is based on code from JikesRVM here :https://github.com/JikesRVM/JikesRVM/blob/master/rvm/src/org/jikesrvm/compilers/opt/bc2ir/BC2IR.java
hi, in Nanojit you don't explicitly push/pop to the stack.
Ah, okay, so it's like a space for function-local variables, right? In that case, should I do exactly what digitalgust does, i.e. allocate and manage a custom stack structure on the heap?
If you're compiling stack bytecode to some kind of IR you need stack to SSA conversion. I have an example of it here: https://github.com/playXE/stack2ssa/blob/main/src/lower.rs (using Cranelift, not NanoJIT). It is based on code from JikesRVM here :https://github.com/JikesRVM/JikesRVM/blob/master/rvm/src/org/jikesrvm/compilers/opt/bc2ir/BC2IR.java
Yes indeed, a stack machine is exactly what I'm trying to JIT here. Thank you for pointing that out!
So basically I do something like that, right?:
typedef struct {
gm_context pass;
NJXContextRef ctx;
NJXFunctionBuilderRef fb;
size_t operand_sp_off;
size_t operand_sp_max;
NJXLInsRef operand_stack_base;
} gm_context_impl;
// Push(Pop() + Pop())
static inline void GM_emit_Add(gm_context_impl *gm) {
NJXLInsRef lhs = NJX_load_i(gm->fb, gm->operand_stack_base, (--gm->operand_sp_off) * sizeof(gm_stack_obj) + offsetof(gm_stack_obj, int32_v));
NJXLInsRef rhs = NJX_load_i(gm->fb, gm->operand_stack_base, (--gm->operand_sp_off) * sizeof(gm_stack_obj) + offsetof(gm_stack_obj, int32_v));
NJX_store_i(gm->fb, NJX_addi(gm->fb, lhs, rhs), gm->operand_stack_base, (gm->operand_sp_off++) * sizeof(gm_stack_obj) + offsetof(gm_stack_obj, int32_v));
}
Or could that fail due to alignment or otherwise? Perhaps I should have my own "stack" of NJXLInsRef
's instead of relying on contiguity of the built-in one?
And also thank y'all for responding so soon. :)
hi, btw I recommend using https://github.com/vnmakarov/mir over Nanojit. It is generates much better code.
Re stack slots in Nanojit - my approach was to alloc all stack slots at the beginnning of a function, put live instructions at the end. And inside the function, store/load from it. But I haven't coded in this for some time so speaking from memory.
You can see examples at https://github.com/dibyendumajumdar/dmr_c/tree/master/tests/nano
I had already heard about MIR, it's great and I would've used it if only it supported 32-bit platforms. Though I'm still only prototyping/researching and haven't done any benchmarks yet, there's no way code generated by nanojit would be slower than completely unoptimized code made using (still awesome) sljit. It may not be perfect, but so far it seems like the best option.
Originally NanoJIT was designed and used as a trace compiler, so for linear code without branching its probably fine, as that was the original use case. Unfortunately the best way to use it was never documented by the original authors. At some point I tried checking how it was used here https://github.com/adobe/avmplus/blob/master/core/CodegenLIR.cpp but I didn't get very far.
Yes indeed, a stack machine is exactly what I'm trying to JIT here. Thank you for pointing that out!
So basically I do something like that, right?:
Not really. You should do smth like this instead:
struct GmContext {
operand_stack: Vec<NJXLInsRef>
}
fn emit_add(cx: &mut GmContext) {
let rhs = cx.operand_stack.pop();
let lhs = cx.operand_stack.pop();
let res = NJX_addi(..., lhs, rhs);
cx.operand_stack.push(res);
}
Special care must be taken in case of branches and loops. I recommend looking at original BC2IR.java or to my port of it. You can see that in original code and in my version of it there is no any memory loads unless explicitly required (LGet,LSet opcodes in my version for local variables) P.S: Obviously the way you did it can work but it would not be effective as you still do lots of loads and stores to memory. The way to do it correctly is to actually lower all stack operations to SSA IR which means most of operations would be performed in registers without doing lots of loads or stores
UPD: You can also join r/ProgrammingLanguages Discord server to get help there, I am also there, you can @ me if you need more help. :)
Kind of a rookie question, but I have a use case that makes nanojit seem like a perfect fit, but am struggling a bit understanding how does one implement a stack in this, one that could be
push
'd to andpop
'd from. This is a pretty easy task in say sljit (see digitalgust's minijvm), but here, I don't know where to begin. One idea I have right now is toalloca
a variable for the SP and then justalloca
objects as I go? Then, if I want topop
, I just decrement the SP and store and conversely, if I want topush
, then I store and increment the SP, right? I assume there won't be gaps and that the stack grows in the normal direction?I'd very much appreciate if you pointed me in the right direction. :)