FEX-Emu / FEX

A fast usermode x86 and x86-64 emulator for Arm64 Linux
https://fex-emu.com
MIT License

SMC/Mtrack: Add mirrored memory support #1639

Open skmp opened 2 years ago

skmp commented 2 years ago

Can be both in process and out of process, and both for shm and files

In process test case: https://github.com/FEX-Emu/fex-assorted-tests-bins/blob/main/src/smc-shared.cpp

Out of process test case: https://github.com/FEX-Emu/fex-assorted-tests-bins/blob/main/src/smc-shared-2.cpp

skmp commented 2 years ago

Spent a considerable time today investigating the implications of this.

So far

Shared mappings can be created by mmap (with MAP_SHARED) and by shmat.

These work across unrelated processes as well, assuming same file (mmap) or shmid (shmat)

Handling single-process

mmap

The safe bet is to track files via the fd's st_dev/st_ino. Together those uniquely identify the file, across processes as well, and are guaranteed not to be reused as long as the file is open or has a reference. For MAP_ANON | MAP_SHARED, we can use a special device number.
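A minimal sketch of the file-identity lookup described above, assuming fstat is used to read the two fields. FileIdentity, IdentityOfFd and PathsShareIdentity are hypothetical names for illustration, not FEX code.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical identity of the file behind a mapping: the (st_dev, st_ino) pair.
struct FileIdentity {
  uint64_t Dev;
  uint64_t Ino;

  bool operator==(const FileIdentity& Other) const {
    return Dev == Other.Dev && Ino == Other.Ino;
  }
};

// Queries the identity of an open fd; returns nullopt if fstat fails.
inline std::optional<FileIdentity> IdentityOfFd(int Fd) {
  struct stat Buf {};
  if (fstat(Fd, &Buf) != 0) {
    return std::nullopt;
  }
  return FileIdentity{static_cast<uint64_t>(Buf.st_dev),
                      static_cast<uint64_t>(Buf.st_ino)};
}

// Opens both paths independently and reports whether they resolve to one file.
inline bool PathsShareIdentity(const char* PathA, const char* PathB) {
  assert(PathA != nullptr && PathB != nullptr);
  const int FdA = open(PathA, O_RDONLY);
  const int FdB = open(PathB, O_RDONLY);
  const auto IdA = IdentityOfFd(FdA);
  const auto IdB = IdentityOfFd(FdB);
  if (FdA >= 0) close(FdA);
  if (FdB >= 0) close(FdB);
  return IdA && IdB && *IdA == *IdB;
}
```

Two unrelated opens of the same path compare equal here even though the fd numbers differ, which is the property the tracking relies on.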

shmat

Looks like shmids are system-wide (per namespace?), and that the same key_t always maps to the same shmid, if one exists, except IPC_PRIVATE (key_t 0) always returns a new shmid.

notes

dev_t is a u32; st_ino is ino_t -> __kernel_ulong_t (u64 for amd64/aarch64).

shmid is int (s32 for amd64/aarch64).

These could be combined into a 96/128-bit mem group id.
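One possible 128-bit layout for such an id, as a sketch only. It swaps the "special dev number" idea for an explicit kind tag so that file, shmid and anonymous-shared ids can never collide; MemGroupKind, MemGroupId and the factory names are hypothetical. The 32-bit device width follows the note above.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Which namespace the identifier comes from (hypothetical tagging scheme).
enum class MemGroupKind : uint32_t { File = 0, SysVShm = 1, AnonShared = 2 };

// 128-bit group id: 32-bit kind, 32-bit device, 64-bit inode / shmid / counter.
struct MemGroupId {
  MemGroupKind Kind;
  uint32_t Dev;
  uint64_t Id;

  static MemGroupId FromFile(uint32_t Dev, uint64_t Ino) {
    return {MemGroupKind::File, Dev, Ino};
  }
  static MemGroupId FromShmid(int Shmid) {
    assert(Shmid >= 0);  // shmget returns -1 on failure, never a valid id
    return {MemGroupKind::SysVShm, 0, static_cast<uint64_t>(Shmid)};
  }
  // Counter is an internally allocated id for MAP_ANON | MAP_SHARED mappings.
  static MemGroupId FromAnonShared(uint64_t Counter) {
    return {MemGroupKind::AnonShared, 0, Counter};
  }

  bool operator==(const MemGroupId& O) const {
    return std::tie(Kind, Dev, Id) == std::tie(O.Kind, O.Dev, O.Id);
  }
  // Ordering so the id can be used as a std::map key.
  bool operator<(const MemGroupId& O) const {
    return std::tie(Kind, Dev, Id) < std::tie(O.Kind, O.Dev, O.Id);
  }
};

static_assert(sizeof(MemGroupId) == 16, "id should pack into 128 bits");
```

With the tag in place, a file with inode 5 and a segment with shmid 5 produce different ids even though the numeric payload is the same.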

For every mapped range, we need to keep track of the mem group it belongs to and its offset within that group, both to traverse the group on invalidation and for mremap to look it up.

For every mem group we need to keep a list of all the mappings and the mapping offsets.
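The two indexes described above could look roughly like this. It is a sketch under simplifying assumptions: mappings never overlap, a write stays inside one mapping, and a plain 64-bit GroupId stands in for the wider mem group id. MirrorTracker and its methods are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

using GroupId = uint64_t;  // stand-in for the wider mem group id

struct Mapping {
  uint64_t Base;    // guest virtual address of the mapping
  uint64_t Size;    // length in bytes
  GroupId Group;    // mem group the mapping belongs to
  uint64_t Offset;  // offset of Base within the group
};

class MirrorTracker {
 public:
  void Add(const Mapping& M) {
    assert(M.Size != 0);
    ByBase_[M.Base] = M;
    ByGroup_[M.Group].push_back(M.Base);
  }

  void Remove(uint64_t Base) {
    auto It = ByBase_.find(Base);
    if (It == ByBase_.end()) return;
    auto& Bases = ByGroup_[It->second.Group];
    Bases.erase(std::remove(Bases.begin(), Bases.end(), Base), Bases.end());
    ByBase_.erase(It);
  }

  // Returns {address, length} for every mapped range that aliases
  // [Addr, Addr + Len), including the range in the written mapping itself.
  std::vector<std::pair<uint64_t, uint64_t>> Aliases(uint64_t Addr,
                                                     uint64_t Len) const {
    std::vector<std::pair<uint64_t, uint64_t>> Result;
    auto It = ByBase_.upper_bound(Addr);
    if (It == ByBase_.begin()) return Result;
    const Mapping& Src = std::prev(It)->second;
    if (Addr >= Src.Base + Src.Size) return Result;

    // Translate to group offsets, clamped to the end of the source mapping.
    const uint64_t Lo = Src.Offset + (Addr - Src.Base);
    const uint64_t Hi = std::min(Lo + Len, Src.Offset + Src.Size);

    for (uint64_t Base : ByGroup_.at(Src.Group)) {
      const Mapping& M = ByBase_.at(Base);
      const uint64_t OvLo = std::max(Lo, M.Offset);
      const uint64_t OvHi = std::min(Hi, M.Offset + M.Size);
      if (OvLo < OvHi) {
        Result.emplace_back(M.Base + (OvLo - M.Offset), OvHi - OvLo);
      }
    }
    return Result;
  }

 private:
  std::map<uint64_t, Mapping> ByBase_;                // range -> group, offset
  std::map<GroupId, std::vector<uint64_t>> ByGroup_;  // group -> mapping bases
};
```

An invalidation would call Aliases with the written range and drop compiled code at every returned address. The per-group list is walked linearly, so the traversal cost grows with the number of mirrors.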

When a new shared mapping is created, the read-only pages of the existing mirrors need to be mirrored into it. When a page is compiled in a shared mapping, its read-only protection needs to be propagated across all mirrors.

Handling cross process

As the mem group id would be system-wide, the invalidation would need to either notify all FEX processes with {mem group, offset, size}, or have some daemon keep track of which processes use which mem groups and notify only the relevant ones. Synchronising the read-only locks across processes will be tricky.

skmp commented 2 years ago

#1558 now supports partial in-process mirror tracking, using st_dev/st_ino, shmid, or an internally tracked id for MAP_ANON | MAP_SHARED.

It doesn't propagate previous locks on new mmaps. Also, traversing mirrored mappings is not very efficient.

Cross process handling is still TBD.

skmp commented 2 years ago

Another idea from earlier research is to reverse the protection / tracking logic and make executable memory read-only by default. This would work by never giving PROT_WRITE to a PROT_EXEC mapping: on a write, invalidate the code and set PROT_WRITE; on MarkGuestAsExecutable, clear PROT_WRITE again.

This way, for the common case of non-modified executable code, the kernel can keep the mapping in a single PROT_EXEC | PROT_READ (no PROT_WRITE) VMA, instead of fragmenting the VMA as blocks get compiled and have their PROT_WRITE flag removed.

This wouldn't affect code blocks mapped without PROT_WRITE, which in either scheme could be further optimised to skip the mprotect.

This would make it possible to map new regions of an MRID without needing to propagate locks, by mapping them without PROT_WRITE. This can be further restricted to MRIDs that contain code, or that have been mapped as PROT_EXEC in the past, to avoid overhead for data mappings.

This will still require an mprotect for each MarkGuestAsExecutable call, which could be avoided by keeping track of "modified PROT_EXEC VMAs".
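The reversed scheme, including the "modified" tracking, can be modelled as plain bookkeeping. This is a sketch only: it tracks individual pages rather than VMAs, performs no real mprotect or fault handling, and counts the calls that would be issued. Apart from MarkGuestAsExecutable, which is named in the text, every name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

class ReversedProtectionModel {
 public:
  // Guest maps a page with PROT_EXEC: the host mapping starts without
  // PROT_WRITE, so nothing needs to change when code is compiled from it.
  void MapExecutable(uint64_t Page) {
    Executable_.insert(Page);
    Modified_.erase(Page);
  }

  // A guest write hit a non-writable executable page: drop the compiled code
  // for the page and grant PROT_WRITE so the write can be retried.
  void OnWriteFault(uint64_t Page) {
    assert(Executable_.count(Page) != 0);
    if (Modified_.insert(Page).second) {
      ++Invalidations;
      ++MprotectCalls;  // stands in for mprotect(..., READ | WRITE | EXEC)
    }
  }

  // Called before compiling code from the page. Only pages written since the
  // last call are in Modified_, so the common case issues no mprotect.
  void MarkGuestAsExecutable(uint64_t Page) {
    assert(Executable_.count(Page) != 0);
    if (Modified_.erase(Page) != 0) {
      ++MprotectCalls;  // stands in for mprotect(..., READ | EXEC)
    }
  }

  bool IsWritable(uint64_t Page) const { return Modified_.count(Page) != 0; }

  unsigned MprotectCalls = 0;
  unsigned Invalidations = 0;

 private:
  std::set<uint64_t> Executable_;  // pages the guest mapped with PROT_EXEC
  std::set<uint64_t> Modified_;    // executable pages currently writable
};
```

For code that is never modified, repeated MarkGuestAsExecutable calls leave MprotectCalls at zero, which is the case the idea above is meant to optimise.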