Use Wayland instead of X11 to increase performance

artemist commented 6 years ago

Although this is not a security issue due to the guid security model, there are several advantages to using Wayland instead of X11:

Advantages

Higher performance

If allocations are on page boundaries, then we can use xc_map_foreign_range (or the equivalent in the HAL) to map framebuffer pages directly from the client in the VM to the compositor in the guivm.
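As a rough illustration of the page-boundary condition, the sketch below checks alignment and counts the pages a framebuffer would occupy. It assumes 4 KiB pages, and the helper names are hypothetical; it does not call xc_map_foreign_range itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; real code would query it
 * (for example via sysconf(_SC_PAGESIZE)). */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical helper: true if a buffer address starts on a page boundary. */
static int is_page_aligned(uintptr_t addr)
{
    return (addr % PAGE_SIZE_BYTES) == 0;
}

/* Hypothetical helper: number of whole pages needed to cover a
 * framebuffer of width x height pixels. */
static size_t framebuffer_pages(size_t width, size_t height,
                                size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    size_t bytes = width * height * bytes_per_pixel;
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

Only a buffer that starts on a page boundary can be handed over page by page without exposing unrelated data that happens to share its first or last page.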

Lower memory usage

Since framebuffers are mapped instead of copied, the proxy Wayland compositor should use less memory than Xorg. (On a VM which currently has 800M of RAM and two windows, Xorg is using 1/6th of the physical memory.)

Easier GPU acceleration support

AFAIR, a lot of OpenGL operations are performed within the X server through the X OpenGL extensions. Simply forwarding these commands to the guivm would be dangerous, so we would need to process them within the Xorg server and then send the display list sometime before the end of processing and rendering. With Wayland, graphics processing happens within the context of the application, and only a framebuffer is shared with the compositor. This means that we can simply attach GVT-g or comparable hardware graphics virtualization to VMs without complex modifications to guid.

Multiple DPI support

Wayland allows one to attach multiple displays with different densities, which is important for people with HiDPI laptops who want to use external displays. We can simply forward events for screen update to the client, although we have to deal with anonymity for anon-whonix, where position of multiple displays could be very revealing.

Method

Wayland has two communication methods: commands over a Unix socket, and shared memory buffers through a file descriptor with mmap. Commands, including shared memory setup and keyboard input, should be proxied through a client in the guivm and a stub compositor in the appvm. However, wl_shm::create_pool and wl_shm events should be intercepted so that the stub compositor and guivm Wayland client both create file descriptors in their VMs, and the guivm maps a foreign range (or asks dom0 to do so, I'm not sure quite how that would work) to link together the contents of those two memory ranges.
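To make the interception step more concrete, here is a minimal sketch of how a proxy might recognise a wl_shm create_pool request on the socket. It relies on the Wayland wire format (a 32-bit object ID followed by a word holding the message size in the upper 16 bits and the opcode in the lower 16 bits, in host byte order) and on create_pool being the first wl_shm request, so opcode 0. The struct and function names are hypothetical, and the proxy is assumed to have recorded which object ID the client bound to wl_shm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded Wayland message header (hypothetical struct for this sketch). */
struct wl_msg_header {
    uint32_t object_id; /* object the request is addressed to */
    uint16_t opcode;    /* low 16 bits of the second word */
    uint16_t size;      /* high 16 bits: total message size in bytes */
};

/* create_pool is the first request in the wl_shm interface. */
#define WL_SHM_CREATE_POOL_OPCODE 0u

/* Parse a header from buf. Returns 0 on success, -1 if the data is
 * shorter than a header or the encoded size is smaller than a header. */
static int parse_header(const uint8_t *buf, size_t len,
                        struct wl_msg_header *out)
{
    uint32_t words[2];

    assert(buf != NULL && out != NULL);
    if (len < sizeof words)
        return -1;
    memcpy(words, buf, sizeof words);
    out->object_id = words[0];
    out->opcode = (uint16_t)(words[1] & 0xffffu);
    out->size = (uint16_t)(words[1] >> 16);
    if (out->size < sizeof words)
        return -1;
    return 0;
}

/* Hypothetical proxy decision: intercept only create_pool requests sent
 * to the object ID recorded as the client's wl_shm binding. */
static int should_intercept(const struct wl_msg_header *h, uint32_t wl_shm_id)
{
    return h->object_id == wl_shm_id &&
           h->opcode == WL_SHM_CREATE_POOL_OPCODE;
}
```

Everything that does not match would be forwarded unchanged; the file descriptor itself travels as ancillary data on the socket and is not part of the message body parsed here.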

Doing this

I am starting work on forwarding Wayland between VMs. I would be interested in working on this for Google Summer of Code if the Qubes project decides to join.

jpouellet commented 6 years ago

Not to rain on the wayland parade, but I'm not convinced the potential benefit over the current system is as large as you portray.

If allocations are on page boundaries, then we can use xc_map_foreign_range (or the equivalent in the HAL) to map framebuffer pages directly from the client in the VM to the compositor in the guivm

The current gui protocol/implementation already does have guests blit directly to a shared-memory framebuffer not requiring any copying between VMs. What exactly would Wayland improve about this?

This means that we can simply attach GVT-g or comparable hardware graphics virtualization to VMs without complex modifications to guid.

I believe this is highly unlikely to happen. The security risk is just too high IMO.

All rendering in the guests happens in software, and IMO that's very unlikely to change unless GPUs get proper memory protection so e.g. shaders can be mutually isolated in different address spaces, enforced in hardware.

The GVT-g approach of "just try to arbitrate everything in software" strongly reminds one of Xen paravirtualization, which we've moved away from in R4 because it's proven too hard to get right and became a liability.
Other approaches which somehow result in at least some kind of indirect hw acceleration, like Virgil 3d (translate/emulate shader IL), are a graphics analog of QEMU (in full instruction emulation mode no less!), which Qubes has explicitly architected around not trusting.

IMO it's way too complex to be even worth considering from a security standpoint.

Even just yesterday's OS X security advisory had 3 new CVEs for their intel graphics driver interface, allowing sandbox escapes & privilege escalation. I haven't seen any technical write-ups yet, but I'm willing to bet there are still plenty more holes in that interface.

I would be interested in working on this for Google Summer of Code if the Qubes project decides to join.

And I am interested in being a GSoC mentor for Qubes again. I'm definitely in no position to make any promises about this project, but I look forward to seeing a proposal and your patches in general :)

marmarek commented 6 years ago

As @jpouellet said, benefits may not be that large. But this could still be a useful thing to do. Xorg and the X11 protocol in general are quite complex, and from time to time we hit some strange interactions between different toolkits and our GUI. Wayland could make things easier here. So, 👍 from me, including GSoC 2018 (we will apply this year too).

artemist commented 6 years ago

Thanks! Even with the problems @jpouellet mentioned, I think that there still could be some advantages.

A few thoughts I wanted to write down so I don't forget:

The main reason I wanted to start this in the first place was multiple DPI support, and that could be useful, although we have to deal with privacy concerns.

I think we could still reduce RAM usage by sharing the same memory for the framebuffer in the client in the AppVM, the stub compositor in the AppVM, the stub client in the GuiVM, and the real compositor in the GuiVM. It may also be possible to do this in X11 with proper proxying of MIT-SHM, but I can't find any code doing it, and doing so may increase complexity significantly. (I may also just be misunderstanding X Display Lists though). Shared memory does open us up to easy cache attacks, but I can't think of any that one could mount based on a framebuffer, especially since one does not generally draw directly onto it because of double buffering, IIRC. Nevertheless, I will have to look into how much the GuiVM is trusted, and if cache attacks originating from it would be a concern.
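The single-machine version of this sharing can be sketched as follows: two mappings of one file descriptor see the same bytes without any copy. This only demonstrates the fd plus mmap mechanism inside one VM, using an unlinked temporary file under /tmp as a stand-in for the pool; the cross-VM link (foreign mapping or grants) is not shown, and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temporary file of the given size; returns fd or -1. */
static int create_shared_fd(size_t size)
{
    char path[] = "/tmp/fb-sketch-XXXXXX";
    int fd;

    assert(size > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the fd keeps the file alive; no name remains */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the whole file read-write and shared; returns NULL on failure. */
static void *map_shared(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write through one mapping and read through a second one.
 * Returns 1 if the second mapping observes the data, 0 otherwise. */
static int shared_write_is_visible(void)
{
    const size_t size = 4096;
    int fd = create_shared_fd(size);
    unsigned char *producer, *consumer;
    int ok = 0;

    if (fd < 0)
        return 0;
    producer = map_shared(fd, size);
    consumer = map_shared(fd, size);
    if (producer != NULL && consumer != NULL) {
        memset(producer, 0xAB, size);
        ok = consumer[0] == 0xAB && consumer[size - 1] == 0xAB;
    }
    if (producer != NULL)
        munmap(producer, size);
    if (consumer != NULL)
        munmap(consumer, size);
    close(fd);
    return ok;
}
```

In the proposed design each of the four parties would hold one such mapping of the same pages, so the framebuffer would exist once rather than once per process.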

We can remove GVT-g from the picture: I thought it used newer isolation features since my laptop didn't support it, but I guess not. Further research does show it is basically PV. However, it still may make graphics acceleration with GPU passthrough easier, as there is no need to mess with X11 graphics extensions, only OpenGL/CL libraries. It looks like NVIDIA and AMD also have some interesting (SR-IOV for AMD) isolation features for fancier GPUs, although those seem really expensive and only easily available on certain servers.

jpouellet commented 6 years ago

It may also be possible to do this in X11 with proper proxying of MIT-SHM

It is my understanding that that is already how things are done. I refer you to https://www.qubes-os.org/doc/gui/#window-content-updates-implementation

but I can't find any code doing it

Some pointers:

Nevertheless, I will have to look into how much the GuiVM is trusted

IIUC it is ultimately trusted by necessity

jpouellet commented 6 years ago

Nevertheless, I will have to look into how much the GuiVM is trusted

IIUC it is ultimately trusted by necessity

That is to say, the GuiVM is obviously necessarily in the TCB of any VM which it controls input to / sees output from. Currently we only have one GuiVM (dom0) which must already be ultimately trusted and already has full access to everything anyway. However, down the road it is desirable to move the window manager out of dom0 and remove its ability to control dom0 (and in certain use cases perhaps also remove its ability to control some other VMs managed by an external admin).

ghost commented 6 years ago

Wouldn't using wayland increase the security of xscreensaver too?

artemist commented 6 years ago

@blacklight447 Yes, screen lockers are harder to crash in Wayland.

However, that reminds me of another problem: Screen lockers, like the rest of the compositor, are all part of the same window manager process. This means that we may have to make significant changes to each desktop environment. At minimum, it would just be to have coloured decorations. I think KDE, GNOME, and Sway (i3 clone) support server-side decorations, so it shouldn't be too bad.

marmarek commented 6 years ago

I think KDE, GNOME, and Sway (i3 clone) support server-side decorations, so it shouldn't be too bad.

I hope it is true. But at least for GNOME, there is a big push to client-side decorations, so I'm not so sure about it.

That is to say, the GuiVM is obviously necessarily in the TCB of any VM which it controls input to / sees output from.

Clarification: theoretically GuiVM may not have full control over input. It may be reduced to only controlling input focus. But in the first version it probably will have full control.

DemiMarie commented 6 years ago

As far as graphics acceleration goes, modern GPUs do have an MMU that can enforce page protection. The problem is arbitrating access to it between VMs. I can think of a few solutions:

1. Do not expose the MMU to VMs: attempts to modify the MMU from a VM are trapped and ignored.
2. Trap-and-emulate (shadow page tables). Too complex? Seems to me to be similar to virtualizing a CPU without SLAT.
3. Paravirtualization. We only need to handle rendering commands (nothing else makes sense for a VM to do). My understanding is that that is just buffer management; everything else is handled in hardware.

   This seems simple: not more complicated than Xen’s own management of CPU memory, or a kernel’s management of mmap’d buffers. Linux has had many vulnerabilities, but none in the mmap code, if I understand correctly.
4. On twin-GPU systems, where one GPU is not connected to any display, we can give that GPU to a VM entirely, relying on the IOMMU to prevent access to GPU-internal registers and firmware. This presumes that those are not in the GPU’s address space.

   While obviously suboptimal, this approach works fantastically in one (very important, IMO) use case: gaming.

Of these, 3 and 4 seem the most promising to me. The API for 3 sounds (deceptively?) small:

#include <stdint.h>

// A handle to a GPU buffer
typedef int gpu_buffer_t;

// Get a buffer, or -1 on error
gpu_buffer_t gpu_mmap(uint64_t size);

// The mapping mode: read-only, read-write, or write-only
typedef enum {
    RO, RW, WO,
} gpu_mode_t;

// Map the buffer, returning its GPU address in *addr
int gpu_map(gpu_mode_t mode, gpu_buffer_t handle, uint64_t *addr);

// Unmap the buffer
int gpu_unmap(gpu_buffer_t handle);

// Destroy the buffer
int gpu_free(gpu_buffer_t handle);
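To show where the validation burden in such an API would sit, here is a toy in-memory mock of these five calls. It is only a sketch: host memory from calloc stands in for GPU memory, the table size and per-buffer cap are made-up limits, and a real backend would program GPU page tables instead of returning a host pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BUFFERS 16              /* hypothetical table size */
#define MAX_BUFFER_BYTES (1u << 20) /* hypothetical per-buffer cap */

typedef int gpu_buffer_t;
typedef enum { RO, RW, WO } gpu_mode_t;

struct slot {
    void *mem;  /* stand-in for GPU memory */
    int in_use;
    int mapped;
};

static struct slot table[MAX_BUFFERS];

/* Every entry point validates the guest-supplied handle here. */
static struct slot *lookup(gpu_buffer_t handle)
{
    if (handle < 0 || handle >= MAX_BUFFERS || !table[handle].in_use)
        return NULL;
    return &table[handle];
}

gpu_buffer_t gpu_mmap(uint64_t size)
{
    if (size == 0 || size > MAX_BUFFER_BYTES)
        return -1;
    for (int i = 0; i < MAX_BUFFERS; i++) {
        if (!table[i].in_use) {
            void *mem = calloc(1, (size_t)size);
            if (mem == NULL)
                return -1;
            table[i].mem = mem;
            table[i].in_use = 1;
            table[i].mapped = 0;
            return i;
        }
    }
    return -1; /* table full */
}

int gpu_map(gpu_mode_t mode, gpu_buffer_t handle, uint64_t *addr)
{
    struct slot *s = lookup(handle);

    if (s == NULL || addr == NULL || s->mapped)
        return -1;
    if (mode != RO && mode != RW && mode != WO)
        return -1;
    assert(s->mem != NULL);
    s->mapped = 1;
    *addr = (uint64_t)(uintptr_t)s->mem;
    return 0;
}

int gpu_unmap(gpu_buffer_t handle)
{
    struct slot *s = lookup(handle);

    if (s == NULL || !s->mapped)
        return -1;
    s->mapped = 0;
    return 0;
}

int gpu_free(gpu_buffer_t handle)
{
    struct slot *s = lookup(handle);

    if (s == NULL || s->mapped)
        return -1; /* refuse to free a buffer that is still mapped */
    free(s->mem);
    s->mem = NULL;
    s->in_use = 0;
    return 0;
}
```

Even in this toy form, the interesting cases are the rejected ones: unknown handles, double frees, and freeing a buffer that is still mapped.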

Of course, these are just ideas, and I could be completely and utterly wrong.

Hello71 commented 6 years ago

Screen lockers, like the rest of the compositor, are all part of the same window manager process.

From what I understand, this is true in "standard" Wayland, but there is a wlroots protocol extension, "input inhibitor", that allows the screen locker to operate as a separate process. On sway, swaylock is a completely separate program from the main compositor.

The API for 3 sounds (deceptively?) small:

I believe this API already exists, it is called "DMA-BUF".

http://phd.mupuf.org/files/fosdem2013_drinext_drm2.pdf specifically references Qubes, so I would hope that security has been a legitimate consideration in the new API development.

DemiMarie commented 6 years ago

Also, it seems that modern drivers already virtualize the GPU, with isolation enforced either in hardware or software. Modern GPUs support both, so one could use hardware isolation between VMs, and software isolation within a VM.

thearthur commented 4 years ago

I'm waiting for this one to try out Qubes OS. I understand this will be a long wait. Just wanted to say hi. :wave:

edrex commented 4 years ago

https://spectrum-os.org/ is a project to build a compartmentalized OS on crosvm, nixos, and wayland, still early days but really exciting.

DemiMarie commented 3 years ago

@marmarek: How much will the GUI protocol need to change? Can XWayland be used as a transitional option, if shmoverride is applied to the Wayland compositor too?

DemiMarie commented 3 years ago

One major advantage of Wayland is that Wayland subsurfaces can be mapped by the GUIVM and composited on the GPU. This should be much more efficient (both in CPU usage and power consumption) than CPU-side compositing by the X server, but requires caution to ensure that a client cannot draw outside of what Qubes OS considers the borders of its window.
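The border check could look roughly like the sketch below, which intersects a subsurface rectangle with the window rectangle before anything is composited. The struct and function names are hypothetical, and the coordinates are assumed to be in one shared coordinate space.

```c
#include <assert.h>
#include <stdint.h>

/* Axis-aligned rectangle (hypothetical type for this sketch). */
struct rect {
    int32_t x, y, width, height;
};

static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Intersect a subsurface with the window it belongs to.
 * Returns 1 and writes the visible part to *out if they overlap,
 * or 0 if the subsurface lies entirely outside the window. */
static int clip_to_window(struct rect window, struct rect sub,
                          struct rect *out)
{
    assert(out != 0);
    /* 64-bit arithmetic so that x + width cannot overflow. */
    int64_t left = max64(window.x, sub.x);
    int64_t top = max64(window.y, sub.y);
    int64_t right = min64((int64_t)window.x + window.width,
                          (int64_t)sub.x + sub.width);
    int64_t bottom = min64((int64_t)window.y + window.height,
                           (int64_t)sub.y + sub.height);

    if (right <= left || bottom <= top)
        return 0;
    out->x = (int32_t)left;
    out->y = (int32_t)top;
    out->width = (int32_t)(right - left);
    out->height = (int32_t)(bottom - top);
    return 1;
}
```

The GUIVM would composite only the clipped rectangle, so a client that positions a subsurface partly or wholly outside its window cannot paint over the border or over other windows.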

marmarek commented 3 years ago

On Wed, Jun 16, 2021 at 08:34:59AM -0700, Demi Marie Obenour wrote:

One major advantage of Wayland is that Wayland subsurfaces can be mapped by the GUIVM and composited on the GPU. This should be much more efficient (both in CPU usage and power consumption) than CPU-side compositing by the X server, but requires caution to ensure that a client cannot draw outside of what Qubes OS considers the borders of its window.

X server can do that too if you enable compositing in window manager settings (I think it's enabled by default). We use MIT-SHM extension specifically to map the composition buffers directly into X server, without copying inside gui-daemon.

Geblaat commented 2 years ago

Wayland functionality for Spectrum OS will be integrated into upstream Wayland, which might be interesting for Qubes OS: https://spectrum-os.org/lists/hyperkitty/list/discuss@spectrum-os.org/thread/3VYGG3QLV37IJDQL3SZZMTOTJ5ZZKZFL/

hexagonrecursion commented 2 years ago

There is now a bounty for this issue https://app.bountysource.com/issues/52352776-use-wayland-instead-of-x11-to-increase-performance

iacore commented 2 years ago

I found this Wayland/X11 nested compositor from ChromiumOS: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/

X11 Sommelier An X11 sommelier instance provides X11 forwarding. Xwayland is used to accomplish this. A single X11 sommelier instance is typically shared across all X11 clients as they often expect that they can use a shared X server for communication. If the X11 sommelier instance crashes in this setup, it takes all running X11 programs down with it. Multiple X11 sommelier instances can be used for improved isolation or when per-client configuration is needed, but it will be at the cost of losing the ability for programs to use the X server for communication between each other.

Seems like it can be used as an X11 compositor as well, and can replace the current qubes-gui and qubes-guid. It seems to also support different seats (for gaming/game controllers).

DemiMarie commented 2 years ago

I found this Wayland/X11 nested compositor from ChromiumOS: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/

X11 Sommelier An X11 sommelier instance provides X11 forwarding. Xwayland is used to accomplish this. A single X11 sommelier instance is typically shared across all X11 clients as they often expect that they can use a shared X server for communication. If the X11 sommelier instance crashes in this setup, it takes all running X11 programs down with it. Multiple X11 sommelier instances can be used for improved isolation or when per-client configuration is needed, but it will be at the cost of losing the ability for programs to use the X server for communication between each other.

Seems like it can be used as an X11 compositor as well, and can replace the current qubes-gui and qubes-guid. It seems to also support different seats (for gaming/game controllers).

I recommend against Sommelier. It is written in C++ and Thomas Leonard found that it kept crashing for him. His own proxy (written in OCaml) is probably a better choice.

bi0shacker001 commented 4 months ago

Do we have any updates on the status of this? It looks like Hardware Acceleration is blocked by it, and it will also enable autorotate on convertibles, which would be very useful for my use-case.

DemiMarie commented 4 months ago

The current plan is to replace the GUI agent with wayland-proxy-virtwl, which will be connected via a Rust program to an instance of crosvm running on the host. crosvm will then proxy this via another instance of wayland-proxy-virtwl to the host compositor.

kravemir commented 3 months ago

The move to Wayland would be great, because Wayland has an official fractional scale protocol.

Nowadays, monitors come with quite odd PPIs, and integer HiDPI scaling isn't particularly usable anymore. With 200% scaling everything is too big, but at 100% everything is too small.

Besides the odd PPI of modern displays, fractional scaling helps with accessibility - some people scale normal DPI displays to 125%.

I want to try and use QubesOS, as I want to separate personal stuff, hobby work and client work, and QubesOS is a perfect fit. However, both my laptop's screen and my desktop monitor need 150~160% scaling, and 1x or 2x scaling makes it practically unusable (ergonomically, long-term). So, this is quite a no-go blocker for me at the moment.
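For reference, the arithmetic behind fractional scaling is small. The fractional scale protocol sends the preferred scale as the numerator of a fraction with denominator 120, so 150% is sent as 180. The sketch below converts a percentage to that form and computes a buffer dimension, rounding halves away from zero; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Denominator used by the fractional scale protocol. */
#define FRACTIONAL_SCALE_DENOMINATOR 120u

/* Hypothetical helper: convert a percentage (150 for 150%) into a
 * scale numerator over 120. */
static uint32_t scale_from_percent(uint32_t percent)
{
    assert(percent > 0);
    return percent * FRACTIONAL_SCALE_DENOMINATOR / 100u;
}

/* Hypothetical helper: buffer dimension for a logical dimension at the
 * given scale, with halves rounded up (away from zero for sizes). */
static uint32_t scaled_size(uint32_t logical, uint32_t scale_120)
{
    uint64_t numerator = (uint64_t)logical * scale_120;
    return (uint32_t)((numerator + FRACTIONAL_SCALE_DENOMINATOR / 2u)
                      / FRACTIONAL_SCALE_DENOMINATOR);
}
```

With these helpers a 1920x1080 logical surface at 150% would be rendered into a 2880x1620 buffer, which sits between the 1x and 2x options that integer scaling offers.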

QubesOS / qubes-issues