Open thegreatfatzby opened 2 months ago
@itaysharfi hoping to ping Peter or Brian but I can't detect their handles, hoping you can "forward".
"@"peter/"@"brian, I think you mentioned that for now the mmap'd region will be file backed...I guess in the current setup it doesn't matter too much, but would you see any issues with an anonymous map being used in the future?
I was hoping to have something better written up by now, and will still try to find time, but don't want to make perfect the enemy of the good, so I'm putting this in to see if we can have another gVisor conversation tomorrow (8/14/24 BA meeting), this time w/r/t what it might open up for KV lookups.
I'd like to dive in more on how a hook like `getValue`, that is reading data that the ad tech loaded via the data loader, ~will~ might work in a Roma gVisor world. From our conversation last time, I believe we're still looking at a similar mode of IPC as we were in a WASM setup, which is one of the performance challenges we've seen.
So, first, hoping we can dive more into what IPC mechanism(s) is(are) currently being considered for an ~eventual~ hopeful implementation.
Second, I'd also like to discuss if the closer-to-native computing model gVisor gives us might allow us to do some high-utility things, given we can potentially now leverage Linux APIs for resource management, isolation, and protection more directly. For instance, I'd like to get thoughts on using memory maps and protections to allow the ad tech to have its own writer process, which can allow a restricted reader to read directly from memory, which for KV would go something like this:
1. [...] `loadData` function, and the reader still exposes `handleRequest`.
2. [...] `mmap`s a region of memory with the `MAP_SHARED` and `MAP_ANONYMOUS` flags. The size can be configurable by the ad tech, and the supervising code will share this region between itself, the writer, and the readers.
   2b. Creates the writer process via `clone`; in particular it does not use `CLONE_VM` in the flags, and in that process it `mprotect`s the shared region to read/write (`PROT_READ | PROT_WRITE`). It then loads the ad tech writer shared object and hands off the `shared_memory` for writing in the hook call.
   2c. Creates the read processes via `clone` (the number can be configurable), again in a different virtual memory space (no `CLONE_VM`), but here it `mprotect`s the shared region to read only (`PROT_READ`). We allow the read process to use the system calls underneath `malloc` in its own space for its own operations, but if it tries to write to the `shared_region` to persist values between requests it will crash, so it's strongly incented not to do that.
3. [...] `handleRequest` as it does today, using one of the readers. To stay completely side effect free we could "just" spawn new readers for each request...obvs this has its own issues, maybe could do some clever things but let's start here.
I was playing around with a toy version of this process setup, and think I verified the following things:
This could be extended with different namespaces, precise usage of seccomp for the writer vs the reader, etc...but given I'm not an expert at this kind of stuff (intermediate? blue square) I wanted to see what kind of reception the basic idea gets.
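For anyone who wants to poke at the basic idea, here is a minimal sketch of the writer/reader setup described in the steps above. It is not the toy version mentioned earlier and not anything from Roma: it assumes Linux, uses `fork` instead of `clone` (which likewise gives each child its own address space, i.e. the "no `CLONE_VM`" case), and the names `load_data` and `run_demo` are hypothetical stand-ins rather than real hooks.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the ad tech's loadData hook. */
static void load_data(char *region) {
    strcpy(region, "key=value");
}

/* Returns 0 when the writer populated the region, the reader saw the data,
 * and the reader's attempted write was killed by SIGSEGV. */
int run_demo(void) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t size = (size_t)page; /* assumption: one page; would be configurable */

    /* Supervisor: shared anonymous mapping, inherited by forked children. */
    char *region = mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return 1;

    int status = 0;

    /* Writer: separate address space, keeps read/write access. */
    pid_t writer = fork();
    if (writer < 0) return 2;
    if (writer == 0) {
        load_data(region);
        _exit(0);
    }
    if (waitpid(writer, &status, 0) < 0) return 3;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return 4;

    /* Reader: downgrades its own view of the region to read-only. */
    pid_t reader = fork();
    if (reader < 0) return 5;
    if (reader == 0) {
        if (mprotect(region, size, PROT_READ) != 0) _exit(10);
        if (strcmp(region, "key=value") != 0) _exit(11);
        *(volatile char *)region = 'X'; /* should fault with SIGSEGV */
        _exit(12);                      /* only reached if the write succeeded */
    }
    if (waitpid(reader, &status, 0) < 0) return 6;
    if (!(WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)) return 7;

    /* The reader's failed write must not have changed the shared data. */
    int intact = strcmp(region, "key=value") == 0;
    munmap(region, size);
    return intact ? 0 : 8;
}
```

Calling `run_demo()` from a `main` and checking for a zero return exercises all three roles. One caveat worth discussing: `mprotect` here only changes the reader's own page protections, so nothing in this sketch stops reader code from calling `mprotect` again to get write access back. That is where a seccomp filter on the reader would have to come in.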