CensoredUsername / dynasm-rs

A dynasm-like tool for rust.
https://censoredusername.github.io/dynasm-rs/language/index.html
Mozilla Public License 2.0

Partial `mprotect` on executable memory #51

Closed (losfair closed 3 years ago)

losfair commented 3 years ago

I'm trying to alter a small part (~60 bytes) of code in a ~10MB `Assembler`. This is pretty slow on AArch64, taking around 200 microseconds.

After digging into the implementation a bit, I found that the underlying `make_exec` method always calls `mprotect` on the entire executable memory. I haven't looked into the kernel's `mprotect` implementation, but I assume it also invalidates instruction caches & TLBs for the entire region, since this is required to enforce memory protections.

Is it possible to track modifications dynamically inside `alter` and apply `mprotect` changes on demand, only to the pages that changed?
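For reference, a minimal sketch of the address arithmetic a partial `mprotect` would need: rounding the modified byte range outward to page boundaries. The name `page_span` and the 4096-byte page size used in the note below are assumptions for illustration, not part of dynasm-rs.

```rust
/// Round a modified byte range outward to page boundaries.
///
/// Returns `(start_offset, length)` of the smallest page-aligned span that
/// covers `offset..offset + len`. `page_span` is a hypothetical helper, and
/// `page_size` is assumed to be a power of two.
pub fn page_span(offset: usize, len: usize, page_size: usize) -> (usize, usize) {
    assert!(page_size.is_power_of_two());
    let mask = page_size - 1;
    // Round the start down to the containing page.
    let start = offset & !mask;
    if len == 0 {
        return (start, 0);
    }
    // Round the exclusive end up to the next page boundary.
    let end = (offset + len + mask) & !mask;
    (start, end - start)
}
```

With 4 KiB pages, a 60-byte patch at offset 5000 maps to `(4096, 4096)`, a single page, while the same patch at offset 4090 straddles a page boundary and maps to `(0, 8192)`.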

CensoredUsername commented 3 years ago

It stands to reason that it is possible, but it'd require a complete redo of the assembler architecture to support it, as this is handled in a dependency :| . It would probably need to keep a shadow copy of the map around in RW memory, track any changes made, and then only edit the affected pages to get some semblance of efficiency.
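A rough sketch of what that bookkeeping could look like, assuming a plain `Vec<u8>` as the RW shadow copy and a sorted set of dirty page indices. `ShadowBuffer` and its methods are hypothetical names, not dynasm-rs API, and the step that would actually copy each run into the executable mapping and re-protect it is left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical RW shadow of an executable buffer that records which pages
/// have been written since the last sync.
pub struct ShadowBuffer {
    shadow: Vec<u8>,
    dirty: BTreeSet<usize>,
    page_size: usize,
}

impl ShadowBuffer {
    pub fn new(len: usize, page_size: usize) -> Self {
        assert!(page_size > 0);
        ShadowBuffer { shadow: vec![0; len], dirty: BTreeSet::new(), page_size }
    }

    /// Write into the shadow copy and mark every touched page as dirty.
    pub fn write(&mut self, offset: usize, bytes: &[u8]) {
        if bytes.is_empty() {
            return;
        }
        self.shadow[offset..offset + bytes.len()].copy_from_slice(bytes);
        let first = offset / self.page_size;
        let last = (offset + bytes.len() - 1) / self.page_size;
        self.dirty.extend(first..=last);
    }

    /// Drain the dirty set as coalesced runs of `(first_page, page_count)`,
    /// so that each run could be handled with a single protection change.
    pub fn take_dirty_runs(&mut self) -> Vec<(usize, usize)> {
        let mut runs: Vec<(usize, usize)> = Vec::new();
        // BTreeSet iterates in ascending order, so adjacent pages are neighbours.
        for page in std::mem::take(&mut self.dirty) {
            if let Some(last) = runs.last_mut() {
                if last.0 + last.1 == page {
                    last.1 += 1;
                    continue;
                }
            }
            runs.push((page, 1));
        }
        runs
    }
}
```

Coalescing adjacent pages into runs keeps the number of protection changes proportional to the number of distinct patched regions rather than to the buffer size.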

But well, I can tell you `mprotect` doesn't bother flushing the I$, otherwise #50 wouldn't be a thing. That was my assumption at first as well, but it was proven wrong. Then the only logical thing it could be doing is just sequentially editing the TLB entries of the entire buffer. 200us is still a lot for that though: 10MB should be only two thousand and something pages, and ARM's loose ordering shenanigans should allow it to just queue up all the edits and only then throw in a barrier. So why is it taking 0.1 microseconds per page?
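The back-of-the-envelope numbers can be checked with a couple of lines. This is only arithmetic under assumptions: "10MB" is read as 10 MiB and the page size is taken to be 4 KiB, neither of which is stated in the thread, and both function names are made up for the example.

```rust
/// Number of pages needed to cover `bytes`, rounding up.
pub fn page_count(bytes: usize, page_size: usize) -> usize {
    (bytes + page_size - 1) / page_size
}

/// Average cost per page in nanoseconds for a call that took `total_ns`.
pub fn per_page_ns(total_ns: f64, pages: usize) -> f64 {
    total_ns / pages as f64
}
```

For 10 MiB and 4 KiB pages this gives 2560 pages and roughly 78 ns per page for a 200us call. With 16 KiB or 64 KiB pages (both possible on AArch64) the page count drops to 640 or 160, so the per-page figure would be correspondingly higher.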

losfair commented 3 years ago

Thanks for the answer!

Yeah, 0.1us per page is a lot and I'm not sure why either. Maybe it depends on the microarchitecture of the AArch64 implementation?

My use case is an RV64 -> AArch64 dynamic binary translator, and I'm patching the code to edit inline caches. I guess I should just put the caches outside the executable region for now :)
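A minimal sketch of that workaround, assuming the cached values fit in 64 bits: the values live in an ordinary RW table, and the generated code would load from a slot's address instead of having the value patched into the instruction stream. `CacheTable` and its methods are hypothetical, and the code generation side is not shown.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical table of inline-cache slots kept in plain RW memory.
pub struct CacheTable {
    slots: Box<[AtomicU64]>,
}

impl CacheTable {
    pub fn new(n: usize) -> Self {
        CacheTable { slots: (0..n).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Address that generated code could embed and load from. It stays valid
    /// for the lifetime of the table because the boxed slice never reallocates.
    pub fn slot_addr(&self, index: usize) -> *const AtomicU64 {
        &self.slots[index]
    }

    /// Update a cache entry without touching executable memory.
    pub fn update(&self, index: usize, value: u64) {
        self.slots[index].store(value, Ordering::Release);
    }

    pub fn get(&self, index: usize) -> u64 {
        self.slots[index].load(Ordering::Acquire)
    }
}
```

The trade-off is an extra data load on each cache hit in exchange for updates that need neither `mprotect` nor instruction cache maintenance.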