Open cfallin opened 2 years ago
cc @peterhuene, you probably know the reason for separate mmaps here.
One idea is that we could possibly have less fragmentation by putting everything in one since we don't technically need guards between the instance and the tables (in theory at least), although we do for sure needs guards on stacks and on memories.
The primary reason for the disjoint instance resources is that it greatly simplified the uffd
implementation as we could just monitor a single range for all linear memories rather than multiple discrete parts of a range representing an instance's linear memory pages (or, alternatively, monitor pages meant for tables or stacks).
Indeed, the cost was additional madvise
calls to make it simpler. Lucet has the design proposed here where all instance resources remain in a continuous slot (and thus a single madvise
call), but its uffd handler has to handle faults in pages not meant for linear memory too.
A single madvise
call is definitely preferable and if we're replacing the uffd implementation, I think this makes sense.
Using separate pools would also make it harder to leak pointers (or even cause code to interpret attacker controlled bytes as a pointer) in case of bugs I would guess.
A bit belated but a month or so I did some investigation into this to see what would happen. My findings at the time was that the cost for an madvise
call was proportional to the size of the memory that we're madvise
-ing, even if that memory wasn't present. My conclusion at the time was that it was better to do the little madvise
calls we do today rather than lumping everything into one giant allocation where one madvise
call is all that's necessary to blow away.
I wasn't super scientific in my findings, though. Additionally it was also awhile ago I collected data for this and I probably don't have the branch around any more. Wanted to note though that I don't think this is necessarily a slam dunk and before trying to proceed with this (if ever), we'll want to carefully evaluate performance.
After some discussion with @alexcrichton just now, the question of why wasmtime's pooling allocator maintains separate pools for memories, tables, and instance structs (and stacks, but those are special on Windows at least) arose.
Especially if we're considering using
madvise
to flash-clear state and make instantiation faster, it might make sense to investigate whether we can "transpose" the arrangement and put all of an instance's resources in one contiguous region (with appropriate guards of course).For example, one instance slot could contain all of that instance's tables and memories, and the
Instance
struct, and perhaps the stack on non-Windows platforms.The major advantage of this is that we can decommit (flash-clear) the whole region with one
madvise
syscall. In contrast, today, we need a separatemadvise
for each resource. This cost is somewhat hidden today because we write ~every byte of theInstance
and the tables (with eager init) but if we lazily initialize state in tables and/orInstance
s (e.g. lazy anyfunc init) this could matter more.