Roma gVisor + KV Lookups: How ~Will~ Might Local KV Lookups Happen from an IPC Perspective, Possible Usage of Mapped Memory

I was hoping to have something better written up by now, and will still try to find time, but don't want to make perfect the enemy of the good, so I'm putting this in to see if we can have another gVisor conversation tomorrow (8/14/24 BA meeting), this time w/r/t what it might open up for KV lookups.

I'd like to dig into how a hook like getValue, which reads data that the ad tech loaded via the data loader, ~will~ might work in a Roma gVisor world. From our conversation last time, I believe we're still looking at a mode of IPC similar to the one in the WASM setup, which has been one of the performance challenges we've seen.

So, first, I'm hoping we can dig into which IPC mechanism(s) are currently being considered for an ~eventual~ hopeful implementation.

Second, I'd also like to discuss whether the closer-to-native computing model gVisor gives us might let us do some high-utility things, given that we can now potentially leverage Linux APIs for resource management, isolation, and protection more directly. For instance, I'd like to get thoughts on using memory maps and memory protections to let the ad tech have its own writer process, with restricted readers that read directly from the shared memory. For KV, it would go something like this:

1. The KV TEE accepts two binaries from the ad tech, both compiled to be dynamically loaded: the writer exposes a loadData function, and the reader still exposes handleRequest.
2. The main process, the attested KV, starts up in the enclave and does the following:
    - 2a. mmaps a region of memory with the MAP_SHARED and MAP_ANONYMOUS flags. The size can be configurable by the ad tech, and the supervising code will share this region between itself, the writer, and the readers.
    - 2b. Creates the writer process via clone; in particular, it does not pass CLONE_VM in the flags. It mprotects the shared region in that process to read/write (PROT_READ | PROT_WRITE), then loads the ad tech writer shared object and hands off the shared_memory for writing in the hook call.
    - 2c. Creates the reader processes via clone (the number can be configurable), again each with its own virtual memory space (no CLONE_VM), but here it mprotects the shared region to read-only (PROT_READ). We allow the reader process to use the system calls underneath malloc in its own space for its own operations, but if it tries to write to the shared_region to persist values between requests, it will crash, so it's strongly incentivized not to do that.
3. The KV code does its normal request handling and invokes handleRequest as it does today, using one of the readers. To stay completely side-effect free, we could "just" spawn new readers for each request... obviously this has its own issues, and maybe we could do some clever things there, but let's start here.
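
A minimal C sketch of steps 2a to 2c, including the "fresh reader per request" variant from step 3, assuming plain Linux semantics (which gVisor implements for these syscalls, though this has not been tried under gVisor here). Everything named below is hypothetical: the hook signatures (`load_data_fn`, `handle_request_fn`), the `toy_*` stand-ins, and `run_supervisor` are illustrative, not Roma or KV server APIs. A real supervisor would resolve loadData and handleRequest with `dlopen()`/`dlsym()` from the ad tech's shared objects instead of calling local functions. Children are created with glibc's `clone()` wrapper passing only `SIGCHLD`, i.e. without `CLONE_VM`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* HYPOTHETICAL hook signatures; real ones would come from dlsym(). */
typedef void (*load_data_fn)(char *shared_memory, size_t size);
typedef int (*handle_request_fn)(const char *shared_region);

/* Toy stand-in for the ad tech writer's loadData hook. */
static void toy_load_data(char *shared_memory, size_t size) {
    strncpy(shared_memory, "example-value", size - 1);
}

/* Toy stand-in for handleRequest: returns 0 if the expected value is there. */
static int toy_handle_request(const char *shared_region) {
    return strcmp(shared_region, "example-value") == 0 ? 0 : 1;
}

struct role {
    char *region;
    size_t size;
    load_data_fn load;
    handle_request_fn handle;
};

/* 2b: the writer child keeps PROT_READ | PROT_WRITE and calls loadData. */
static int writer_entry(void *arg) {
    struct role *r = arg;
    if (mprotect(r->region, r->size, PROT_READ | PROT_WRITE) != 0) return 1;
    r->load(r->region, r->size);
    return 0;
}

/* 2c: the reader child drops to PROT_READ, then calls handleRequest. */
static int reader_entry(void *arg) {
    struct role *r = arg;
    if (mprotect(r->region, r->size, PROT_READ) != 0) return 1;
    return r->handle(r->region);
}

/* clone() without CLONE_VM: the child gets its own copy-on-write address
   space, so only the MAP_SHARED region is shared with the parent.
   Returns the child's exit code, or -1 if it failed or was killed. */
static int spawn_and_wait(int (*entry)(void *), struct role *r) {
    char *stack = malloc(STACK_SIZE);
    int status, code = -1;
    if (stack == NULL) return -1;
    pid_t pid = clone(entry, stack + STACK_SIZE, SIGCHLD, r);
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        code = WEXITSTATUS(status);
    free(stack);
    return code;
}

/* Supervisor: 2a maps the region, 2b runs the writer once, then one fresh
   reader per request (step 3). Returns how many requests were served. */
static int run_supervisor(size_t size, int n_requests) {
    assert(size > 1 && n_requests >= 0);
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    struct role r = { mem, size, toy_load_data, toy_handle_request };
    int served = 0;
    if (spawn_and_wait(writer_entry, &r) == 0)
        for (int i = 0; i < n_requests; i++)
            served += (spawn_and_wait(reader_entry, &r) == 0);
    munmap(mem, size);
    return served;
}
```

With this sketch, `run_supervisor(4096, 3)` should return 3 when every reader sees the writer's value. One caveat on the design: because the region was originally mapped with `PROT_WRITE`, a reader can call `mprotect()` itself to regain write access, so the read-only guarantee only holds if something like the reader-specific seccomp policy mentioned below stops it from doing so.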

I was playing around with a toy version of this process setup, and I think I verified the following:

- The shared memory region is ultimately backed by the same physical memory in each process.
- Despite having the same virtual memory addresses, the processes do indeed have different physical addresses for their heaps.
- The reader does die if it tries to write to the shared memory region.
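
A self-checking C sketch of those three observations, assuming Linux and not necessarily how the original toy was built. Unprivileged processes usually can't read physical frame numbers (e.g. from `/proc/self/pagemap`), so this infers them from behavior: a child's write to the `MAP_SHARED` region is visible to the parent (same physical pages), a child's write to a heap variable at the same virtual address is not (separate pages after copy-on-write), and a child that makes the region `PROT_READ` and then writes to it is killed with `SIGSEGV`. The names `probe`, `run_child`, and `run_probes` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

struct probe {
    volatile char *shared; /* MAP_SHARED | MAP_ANONYMOUS region */
    size_t size;
    volatile int *heap;    /* private malloc'd int, same virtual address in children */
};

struct probe_result { int shared_visible, heap_isolated, reader_segfaulted; };

/* Writer-style child: writes to the shared region and to its copy of the heap. */
static int write_probe(void *arg) {
    struct probe *p = arg;
    p->shared[0] = 'W';
    *p->heap = 42;
    return 0;
}

/* Reader-style child: makes the shared region read-only, then writes anyway. */
static int read_only_probe(void *arg) {
    struct probe *p = arg;
    if (mprotect((void *)p->shared, p->size, PROT_READ) != 0) return 1;
    p->shared[0] = 'R'; /* expected to fault with SIGSEGV */
    return 0;
}

/* Run fn in a clone() child without CLONE_VM; return its wait status or -1. */
static int run_child(int (*fn)(void *), struct probe *p) {
    char *stack = malloc(STACK_SIZE);
    int status = -1;
    if (stack == NULL) return -1;
    pid_t pid = clone(fn, stack + STACK_SIZE, SIGCHLD, p);
    if (pid == -1 || waitpid(pid, &status, 0) != pid) status = -1;
    free(stack);
    return status;
}

static struct probe_result run_probes(size_t size) {
    struct probe_result res = {0, 0, 0};
    assert(size > 0);
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return res;
    int *heap = malloc(sizeof *heap);
    if (heap == NULL) { munmap(mem, size); return res; }
    *heap = 7;
    struct probe p = { mem, size, heap };

    int ws = run_child(write_probe, &p);
    if (ws != -1 && WIFEXITED(ws) && WEXITSTATUS(ws) == 0) {
        /* The child's write shows up here: same physical memory. */
        res.shared_visible = (p.shared[0] == 'W');
        /* Same virtual address, but the child's write did not land here. */
        res.heap_isolated = (*p.heap == 7);
    }

    int rs = run_child(read_only_probe, &p);
    /* Killed by SIGSEGV, and the faulting write never reached the region. */
    res.reader_segfaulted = rs != -1 && WIFSIGNALED(rs) &&
                            WTERMSIG(rs) == SIGSEGV && p.shared[0] == 'W';
    free(heap);
    munmap(mem, size);
    return res;
}
```

Calling `run_probes(4096)` on Linux should set all three fields to 1; whether gVisor's Sentry reports the reader's fault identically is an assumption worth checking in the actual sandbox.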

This could be extended with different namespaces, more precise use of seccomp for the writer vs. the reader, etc., but given I'm not an expert at this kind of stuff (intermediate? blue square), I wanted to see what kind of reception the basic idea gets.

WICG / protected-auction-services-discussion, issue #83