Closed m4b closed 7 years ago
Something like that would most definitely be possible; I just didn't particularly bother with it, as the transpiling stage is generally very fast compared to the actual execution.
The biggest complication is that the JIT needs the ability to call back into Rust functions. The way this is currently implemented, the address of the function is encoded directly into the instruction stream using a `mov rax, QWORD func_addr` instruction. This could of course be solved with a relocation section that keeps track of where all these calls are and patches in the correct addresses. It does, however, mean that a significantly sized relocation section is needed.
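For illustration, here's a minimal sketch (with made-up names — `Reloc` and `emit_call_to` are not the actual emitter API) of what encoding the address inline and recording a relocation for it could look like:

```rust
// Hypothetical emitter sketch: encode `mov rax, imm64` with the callback's
// address, then `call rax`, and record a relocation entry so the address
// can be patched at load time.

struct Reloc {
    /// Offset into the code buffer where the 8-byte address lives.
    offset: usize,
    /// Which runtime function this slot should point at after patching.
    func_id: u32,
}

fn emit_call_to(code: &mut Vec<u8>, relocs: &mut Vec<Reloc>, func_id: u32, addr: u64) {
    // 48 B8: REX.W + mov rax, imm64
    code.extend_from_slice(&[0x48, 0xB8]);
    relocs.push(Reloc { offset: code.len(), func_id });
    code.extend_from_slice(&addr.to_le_bytes());
    // FF D0: call rax
    code.extend_from_slice(&[0xFF, 0xD0]);
}

fn main() {
    let mut code = Vec::new();
    let mut relocs = Vec::new();
    emit_call_to(&mut code, &mut relocs, 0, 0xDEAD_BEEF);
    // Each callback site costs 12 bytes of code plus one relocation entry.
    assert_eq!(code.len(), 12);
    assert_eq!(relocs[0].offset, 2);
    println!("{} bytes, {} relocs", code.len(), relocs.len());
}
```

With one relocation entry per call site, the relocation section grows linearly with the number of callbacks, which is where the size concern comes from.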
Another way to approach this would be to store the addresses of all functions at the start of the instruction stream instead of encoding them directly into it, and then call each function using RIP-relative addressing, like `call [->func_addr]`.
While this does add a layer of indirection, it is significantly faster to load, only requires a few bytes of overhead compared to a giant reloc section, and will actually make the executable smaller. Besides, the call target would be extremely easy to speculate, so the overhead should be low. It's roughly equivalent to how DLL loading is done on x64.
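A sketch of the table-based variant (again with illustrative names, not the real emitter): the function addresses live in a table at the start of the buffer, patched once at load time, and each call site uses `call [rip+disp32]` (opcode `FF 15`). Note that in x86-64 encoding, RIP points at the byte after the current instruction:

```rust
// Emit a `call [rip+disp32]` that reads a function pointer from an
// address table at `table_start` within the same buffer.
fn emit_indirect_call(code: &mut Vec<u8>, table_start: usize, slot: usize) {
    code.extend_from_slice(&[0xFF, 0x15]); // call [rip+disp32]
    let next_ip = code.len() + 4;          // RIP = offset just past the disp32
    let target = table_start + slot * 8;   // 8-byte address slot in the table
    let disp = target as i64 - next_ip as i64;
    code.extend_from_slice(&(disp as i32).to_le_bytes());
}

fn main() {
    // Reserve a table with room for two function pointers at the start.
    let mut code = vec![0u8; 16];
    emit_indirect_call(&mut code, 0, 1);
    // Call site starts at offset 16, so next_ip = 22; slot 1 sits at
    // offset 8, giving a displacement of 8 - 22 = -14.
    let disp = i32::from_le_bytes(code[18..22].try_into().unwrap());
    assert_eq!(disp, -14);
    println!("disp = {}", disp);
}
```

Each call site is then 6 bytes and needs no per-site relocation; only the table slots have to be patched when the code is loaded.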
Of course a data section with the serialized whitespace ops and op -> basic block mapping is still necessary for fallback reasons.
(For some reason your description of the text and data section are not showing so I just wrote something general on it).
Edit: at this point it'd also be an interesting idea to then create a small library with only the necessary runtime, concatenate this data to it with a small header at the end and create a sort-of standalone executable.
I believe github optimized away the sections named " " and " ", respectively :grin:
Re the runtime: maybe you could compile a whitespace runtime library in a native format and have any whitespace binary link against it, calling a `__libc_init_main`-esque function which takes an offset into the binary's text/whitespace section as the start of the instructions to execute.
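As a rough sketch of what such an entry point's shape could be (all names here are hypothetical — nothing like this exists in the codebase yet):

```rust
// Hypothetical runtime entry in the spirit of __libc_init_main: the loader
// hands it the mapped whitespace/text section plus an entry offset.
fn ws_runtime_main(section: &[u8], entry_offset: usize) -> Result<(), String> {
    let program = section
        .get(entry_offset..)
        .ok_or_else(|| "entry offset outside section".to_string())?;
    // A real runtime would hand `program` to the interpreter/JIT here;
    // this stub just reports where execution would begin.
    println!(
        "would execute {} bytes starting at offset {}",
        program.len(),
        entry_offset
    );
    Ok(())
}

fn main() {
    let section = b"  \t\n  \t \n"; // stand-in whitespace section
    assert!(ws_runtime_main(section, 4).is_ok());
    assert!(ws_runtime_main(section, 100).is_err());
}
```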
But this is all theoretical, of course. You'd need an ELF writer (assuming we start with that). I've been meaning to start work on one but have been too busy. I'm also not sure what the API would look like there, but that's most likely something best fleshed out while developing in tandem with an actual client of the API.
I've made a proof of concept of this, although I took a slightly different approach which allowed me to avoid executable shenanigans.
Basically, I added functionality to serialize the compiled data structures (bytecode and offset array). This data is then concatenated to the engine executable with a small footer.
The runtime binary then opens itself, deserializes the data, and runs it.
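The footer layout below is an assumption for illustration (the actual proof of concept may use a different format); the sketch uses an in-memory buffer where the real runtime would open `std::env::current_exe()`:

```rust
// Append a payload plus a footer (payload length + magic) to an executable
// image, and recover it by parsing the footer from the end of the file.

const MAGIC: &[u8; 4] = b"WSPC"; // made-up magic value

fn append_payload(exe: &mut Vec<u8>, payload: &[u8]) {
    exe.extend_from_slice(payload);
    exe.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    exe.extend_from_slice(MAGIC);
}

fn read_payload(image: &[u8]) -> Option<&[u8]> {
    // Footer is the last 12 bytes: 8-byte length, then 4-byte magic.
    let end = image.len().checked_sub(12)?;
    if &image[end + 8..] != MAGIC {
        return None; // plain executable, nothing appended
    }
    let len = u64::from_le_bytes(image[end..end + 8].try_into().ok()?) as usize;
    image.get(end.checked_sub(len)?..end)
}

fn main() {
    let mut exe = b"\x7fELF...engine code...".to_vec();
    append_payload(&mut exe, b"serialized bytecode + offsets");
    assert_eq!(read_payload(&exe), Some(&b"serialized bytecode + offsets"[..]));
    println!("payload recovered");
}
```

The nice property of parsing from the end is that the loader never needs to know how long the original executable was, so the same engine binary works with any payload appended.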
While it's not truly native (it still has to mmap the memory as executable, and still requires the bytecode due to the interpreter fallback), it does produce a standalone binary. Unfortunately, the JIT really needs the interpreter to function properly, so I couldn't cut it out completely (although if any errors were ignored and wrapping arithmetic were used, a completely native binary would be possible).
Uploaded the proof of concept here. It's very preliminary / rough though.
I can't seem to get this to run, though I don't know what the output is/should be or if I'm even running it correctly.
As I said, it's fairly preliminary and I didn't get around to writing a proper build script yet. It's also currently hardcoded for Windows. build.sh should mostly explain the process (and you probably need to change the `include_bytes!` invocation in main.rs to include the proper file instead of runtime.exe).
Anyway, after getting it all to build, you can run `cargo run --bin wsc -- program.ws --serialize program.exe` to generate a standalone executable.
I've done some extra experimentation, and it is possible to create a fully native binary. It however comes with several limitations:
This code is still safe though. Unfortunately full correctness / error reporting comes at a significant price in complexity.
Closing this for now, as it unfortunately comes at quite a complexity cost while the JIT startup cost is practically negligible (I've yet to see a program that takes more than 5 ms from the start of the interpreter to actually executing the program).
@m4b FYI, my Nebula compiler (written in Go) statically compiles Whitespace source to a native binary via LLVM. It lowers Whitespace's stack-oriented instructions to an intermediate language in SSA form (a use/definition-based format that enables further analysis), runs some optimization passes, then emits LLVM IR, which is handed to LLVM for native codegen.
I'm wondering: is it possible to assemble whitespace instructions and output a native binary?
Could we then feed this binary to the whitespace interpreter (outfitted with a suitable loader) and execute the whitespace instructions?
I would propose two new sections for the text and data:
and

respectively.