QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Document requirements for secure GPU-accelerated rendering #4234

Closed DemiMarie closed 6 months ago

DemiMarie commented 6 years ago

Qubes OS version:

R4.0

Affected component(s):

Documentation


Steps to reproduce the behavior:

Search for documentation on what exactly would be necessary for secure, hardware-accelerated rendering in untrusted AppVMs.

Expected behavior:

Some documentation as to why it is not possible currently, and what would be required to change that.

Actual behavior:

No such documentation.

General notes:

One obvious use of a system like Qubes is running untrusted Steam games. Because Qubes doesn’t support hardware-accelerated rendering, this doesn’t work. It would be nice to have documentation as to why this is not possible, and what would be necessary to change this.


Related issues:

andrewdavidwong commented 6 years ago

The documentation is a community effort. Please help us improve it by submitting a pull request:

https://www.qubes-os.org/doc/doc-guidelines/

DemiMarie commented 6 years ago

@andrewdavidwong I would be happy to, but to do so I need to know what the QubesOS maintainers would require of such a solution.

andrewdavidwong commented 6 years ago

@DemiMarie:

I would be happy to, but to do so I need to know what the QubesOS maintainers would require of such a solution.

I suggest searching qubes-devel to see whether the information you need is already there and, if not, starting a thread asking specific questions.

teoman002 commented 5 years ago

A known reason why GPU acceleration in VMs is a security risk: VRAM leaks in graphics cards. Read this report, with many pictures as proof: https://hsmr.cc/palinopsia/ (cited from the Whonix footnotes: https://www.whonix.org/wiki/Virtualization_Platform_Security#cite_note-3).

Description of the problem: VRAM leakage enables guest operating systems to access VRAM content from the Qubes dom0 area, which is a security risk if dom0 is spied on at the right time or all the time. This can be done by allocating VRAM without initializing its contents. The reason GPU manufacturers don't initialize VRAM might be that doing so would shorten the memory's lifespan or slow down performance, although I can't cite any sources on this. It's also possible that they simply had no reason to do so until now.

Possible approach: To change the situation, one would have to tell the graphics card to initialize memory after allocation. AMD has open-source Linux drivers, so perhaps a driver developer could investigate the altered behavior if such a function were implemented. But changing such fundamental behavior could have devastating effects on the API, because everyone expects memory to behave the old way.
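To make the mechanism concrete, here is a minimal sketch (my own illustration, not the Palinopsia authors' proof-of-concept) of the uninitialized-allocation readback described above. It assumes a Linux system with GLFW and a desktop OpenGL driver; whether anything interesting comes back depends entirely on whether the driver clears VRAM on allocation.

```c
/* Sketch of a Palinopsia-style uninitialized-VRAM readback.
 * Build (example): cc vram_peek.c -lglfw -lGL -o vram_peek */
#include <GLFW/glfw3.h>
#include <GL/gl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (!glfwInit())
        return 1;
    GLFWwindow *win = glfwCreateWindow(64, 64, "vram-peek", NULL, NULL);
    if (!win)
        return 1;
    glfwMakeContextCurrent(win);

    /* Allocate a large texture but pass NULL as the pixel data: the driver
     * only reserves VRAM for it and is not obliged to clear it. */
    const int w = 1024, h = 1024;
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    /* Read the texture back without ever having written to it. On drivers
     * affected by the Palinopsia issue, these bytes can contain whatever a
     * previous owner of that VRAM (another VM, the host compositor) left
     * behind. */
    unsigned char *pixels = malloc((size_t)w * h * 4);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    fwrite(pixels, 1, (size_t)w * h * 4, stdout); /* pipe into an image viewer */

    free(pixels);
    glfwTerminate();
    return 0;
}
```

The mitigation described above would amount to the driver (or kernel) zero-filling such allocations before handing them out, which is exactly the behavior change the comment worries about.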

sajkbflksadbfkasdjf commented 2 years ago

A known reason why GPU acceleration in VMs is a security risk: VRAM leaks in graphics cards. Read this report, with many pictures as proof: https://hsmr.cc/palinopsia/ (cited from the Whonix footnotes: https://www.whonix.org/wiki/Virtualization_Platform_Security#cite_note-3).

Description of the problem: VRAM leakage enables guest operating systems to access VRAM content from the Qubes dom0 area, which is a security risk if dom0 is spied on at the right time or all the time. This can be done by allocating VRAM without initializing its contents. The reason GPU manufacturers don't initialize VRAM might be that doing so would shorten the memory's lifespan or slow down performance, although I can't cite any sources on this. It's also possible that they simply had no reason to do so until now.

Possible approach: To change the situation, one would have to tell the graphics card to initialize memory after allocation. AMD has open-source Linux drivers, so perhaps a driver developer could investigate the altered behavior if such a function were implemented. But changing such fundamental behavior could have devastating effects on the API, because everyone expects memory to behave the old way.

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed? It should definitely be doable, and leaving one of the two most powerful chips in your computer unused because the virtualisation is too hard to get right seems wasteful to me.

thw0rted commented 2 years ago

If more use cases help at all: I'm a web developer writing a WebGL-based application. Some of my application's users are on a Qubes-based desktop environment. Their performance is currently terrible, and I think this issue is the reason.

Users on a physical desktop with a mid-grade dedicated GPU get smooth 60 FPS performance with minimal load on the GPU. Even those with Intel integrated graphics from several years ago can manage a pretty stable 30 FPS, as long as they're using the right drivers. Users on the Qubes platform are stuck with Chrome's "SwiftShader" pure-software renderer, which takes more than one second per frame (!).

I think this problem is only likely to become more prevalent in the future, as more applications move to the browser, and browsers become more reliant on hardware-accelerated compositing and rendering.

DemiMarie commented 2 years ago

If more use cases help at all: I'm a web developer writing a WebGL-based application. Some of my application's users are on a Qubes-based desktop environment. Their performance is currently terrible, and I think this issue is the reason.

First, thanks for letting us know. The current situation sucks, but it is the best we can do right now. Sadly, supporting it securely on cards that people can actually afford (as opposed to super-expensive enterprise cards) is still an open problem. The good news is that people are working on it; the bad news is that these efforts could take quite a while to come to fruition. If Qubes can ever ship hardware-accelerated rendering on by default without violating our users' security expectations, we will.

Are you able to link to the web application, by any chance?

Users on the Qubes platform are stuck with Chrome's "SwiftShader" pure-software renderer, which takes more than one second per frame (!).

Yeah, this sucks. Is your application particularly heavy on textures? Texture sampling is notoriously slow in software renderers; there is a reason GPUs have dedicated hardware for it. 3D transformations are also quite expensive. Simple pixel shaders should (at least in theory) be handled pretty efficiently using SIMD instructions on the CPU, though Qubes disabling SMT (simultaneous multithreading, AKA hyper-threading) probably doesn't help there.
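For intuition, here is a rough sketch (my own example, not SwiftShader code) of why a simple per-pixel operation vectorizes well on the CPU, in contrast to texture sampling. It assumes an x86-64 build with SSE2 enabled (e.g. cc -O2).

```c
/* A "simple pixel shader" in software: halve the brightness of an RGBA8
 * image. One SSE2 instruction handles 16 colour channels (4 pixels) per
 * iteration, which is why such shaders stay fairly cheap without a GPU. */
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

void darken_half(uint8_t *pixels, size_t nbytes) {
    size_t i = 0;
    for (; i + 16 <= nbytes; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(pixels + i));
        /* Per-byte shift right by 1: do a 16-bit shift, then mask off the
         * bit that leaked in from the neighbouring byte. */
        v = _mm_and_si128(_mm_srli_epi16(v, 1), _mm_set1_epi8(0x7F));
        _mm_storeu_si128((__m128i *)(pixels + i), v);
    }
    for (; i < nbytes; i++) /* scalar tail for the last few bytes */
        pixels[i] >>= 1;
}

/* Bilinear texture sampling, by contrast, needs per-pixel coordinate math,
 * four scattered memory reads, and a weighted blend for every output pixel,
 * which is exactly the work GPUs have dedicated texture units for. */
```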

I also suggest filing a ticket with SwiftShader; you could be hitting a pathological case there. At the very least, they might have suggestions as to which part of the code is particularly expensive.

thw0rted commented 2 years ago

I don't have the Qubes-based desktop environment available right now, but I think you could get a good idea of the problem using this demo page. (Our application is proprietary, but built on CesiumJS.)

If you paste in viewer.scene.debugShowFramesPerSecond = true; just after the first line of code, then click Run (F8), it will reload the demo with an FPS counter in the corner. With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

I will look at filing a ticket with Chromium and/or Cesium, but I kind of expect each to point the finger at the other...

marmarek commented 2 years ago

With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

FWIW I get ~9FPS in Firefox and ~4FPS in Chrome. Both are bad, but one is clearly worse...

thw0rted commented 2 years ago

I'm curious, if you visit the regular public-facing Google Maps from your Qubes environment, do you feel like you have a subjectively bad experience? I think they're using similar technologies. When I force software rendering on the same laptop ("hardware acceleration" option off in Chrome settings), zooming and panning "feel" bad. Maybe that could be a more broadly applicable use case for you?

marmarek commented 2 years ago

Google Maps works reasonably fine. It isn't super smooth, I see some jumps when panning too fast, but nothing major.

tzwcfq commented 2 years ago

I've tested as well: Intel i9-12900K (8 P-cores and 8 E-cores), Firefox 91.10.0esr on debian-11 with 16 vCPUs. Open earth.google.com, press F12, open the Performance tab, and start recording performance. When I don't move the screen it stays at 60 fps; when I move the screen around and zoom in/out, the average is 45 fps and the minimum is 18 fps. When I try the Cesium example I get 8.5 fps on average, 13 fps maximum, and 4.5 fps minimum.

thw0rted commented 2 years ago

Just as a data point, I'm interested to see how that compares with Chrome / Chromium. It sounds like several others have answered "worse" but it also sounds like they may have been using older (or at least less powerful) hardware.

tzwcfq commented 2 years ago

Google Chrome Version 102.0.5005.61 (Official Build) (64-bit) on debian-11 with 16 vCPUs. To enable the FPS meter, open the Dev Tools console with Ctrl+Shift+I, then with focus on Dev Tools press Ctrl+Shift+P, type "FPS", and press Enter. It shows only the average fps (or something like that; I'm not sure how it's counted). With earth.google.com I get 45 fps average if I don't move the screen, and around 1 fps less if I move it around and zoom. With Cesium I get 2.8 fps if I don't move the screen and 2.3 fps if I zoom in/out.

DemiMarie commented 2 years ago

With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

FWIW I get ~9FPS in Firefox and ~4FPS in Chrome. Both are bad, but one is clearly worse...

I can reproduce with a 1-vCPU qube. Chrome is at less than 0.5FPS while using 90+% CPU. Definitely time for a bug report.

thw0rted commented 2 years ago

So, just to be clear, @DemiMarie , where do you think the report should go, and who do you think should file it?

DemiMarie commented 2 years ago

@thw0rted I think you should file the report against Chromium and SwiftShader. I know basically zilch about SwiftShader internals, and it has been years since I did anything interesting with web APIs.

thw0rted commented 2 years ago

I just filed https://bugs.chromium.org/p/chromium/issues/detail?id=1348913 , in case anybody here would like to follow along. Thanks for the feedback!

marmarek commented 2 years ago

I just filed https://bugs.chromium.org/p/chromium/issues/detail?id=1348913

"Windows"?

thw0rted commented 2 years ago

Hah! Oops, sorry. It auto-populated the form based on the computer I was using, and I didn't think to update it. It does actually apply on Windows as well, inasmuch as Firefox with hardware rendering forced off here still outperforms Chrome forced to use SwiftShader. Really, the OS tag should say "any". I'll leave a comment to that effect.

JonasVautherin commented 2 years ago

Just discovered this issue, and I am curious about this comment that seems to have been ignored:

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed?

Is that obviously not an option?

DemiMarie commented 2 years ago

Just discovered this issue, and I am curious about this comment that seems to have been ignored:

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed?

Is that obviously not an option?

That would require an extremely complex translation layer.

rwiesbach commented 1 year ago

There was a talk at Qubes OS Summit 2022: https://www.youtube.com/watch?v=YllX-ud70Nk - what does it mean for secure GPU-accelerated rendering? Does it sacrifice security? And if so, to what extent does it sacrifice security?

DemiMarie commented 1 year ago

It doesn’t intend to, but it isn’t ready for use in Qubes yet, in part because of Xen limitations.

AlxHnr commented 1 year ago

How feasible would it be to add an unfiltered, less secure way to expose the GPU to selected VMs, similar to how KVM/virt-manager can do it?

Why?

I trust certain VMs enough to take a risk here. Having a dedicated YouTube qube or video-editing qube could save a lot of power and time. It would still provide much better isolation and sandboxing than raw Linux, like a form of Qubes Lite that sits between the current Qubes OS and conventional desktop systems. The only alternative here is building my own "Qubes" on top of Linux + KVM, with a lot of dev effort for proper clipboard handling, auto-updates, microphone support, and more.

DemiMarie commented 1 year ago

@marmarek thoughts?

covert8 commented 1 year ago

Has anyone considered using virtio-gpu or VirGL from a security standpoint? It could also allow using the GPU across multiple qubes.

DemiMarie commented 1 year ago

@covert8 It's funny you asked, because I was about to file an issue for this!

VirGL and Venus run the userspace driver (OpenGL and Vulkan, respectively) on the host. This means that they provide a hardware-independent API to the guest, but it also means that the entire userspace stack becomes guest-accessible attack surface. This attack surface is very large, and Linux graphics developers have stated on IRC that it is not a security boundary. Therefore, @marmarek has decided that Qubes OS will never use VirGL or Venus, and I agree with his decision.

virtGPU native contexts, on the other hand, expose the kernel ioctl API to the guest. This API is accessible to unprivileged userspace processes, which means it is a supported security boundary. It is also much smaller than the OpenGL or Vulkan APIs, which means that its attack surface is vastly smaller. As a bonus, native contexts offer near-native performance, which should be enough even for games and other demanding tasks.

The kernel ioctl API (also known as the userspace API or uAPI) is hardware-dependent, so virtGPU native contexts are only supported on a subset of hardware. Currently, Intel, AMD, and Adreno GPUs are supported using the upstream i915, amdgpu, and freedreno drivers.
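For a sense of what that boundary looks like in practice, here is a minimal sketch (mine, not Qubes code) that talks to the same render-node ioctl surface a native context would forward to the host kernel. It assumes libdrm is installed and that a render node exists at /dev/dri/renderD128 (the exact node name varies); build with something like cc -I/usr/include/libdrm peek.c -ldrm.

```c
/* Query the DRM driver behind a render node via its uAPI. This is the kind
 * of hardware-specific, unprivileged-userspace-facing interface that
 * virtio-GPU native contexts expose to the guest, as opposed to the whole
 * OpenGL/Vulkan userspace stack that VirGL/Venus expose. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void) {
    int fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open render node");
        return 1;
    }

    /* drmGetVersion() wraps the DRM_IOCTL_VERSION ioctl. */
    drmVersionPtr v = drmGetVersion(fd);
    if (v) {
        printf("driver: %.*s (%d.%d.%d)\n", v->name_len, v->name,
               v->version_major, v->version_minor, v->version_patchlevel);
        drmFreeVersion(v);
    }
    close(fd);
    return 0;
}
```

Everything above this boundary (Mesa, the OpenGL and Vulkan state trackers) stays inside the guest, which is the key difference from VirGL and Venus.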

Xen supports grant-based virtio, so virtio-GPU should not be incompatible with running QEMU in a stubdomain. The virtio-GPU emulator will need to run in dom0, but its job is much simpler than that of QEMU (it is only emulating a single device) and so the attack surface should (hopefully!) be acceptable.

andrewdavidwong commented 1 year ago

@DemiMarie, documentation issues like this go on the Non-release milestone, since they are independent of the Qubes OS release cycle.

ddevz commented 8 months ago

I wanted to make people aware of this related thread:

https://forum.qubes-os.org/t/seamless-gpu-passthrough-on-qubes-os-with-virtualgl/20265/18

Which talks about someone using VirtualGL (which I believe is different from VirGL).

(I'm assuming that the intention was to use VirtualGL to communicate from a "sys-gpu"-type qube to other qubes that should be able to make GL calls; however, the initial post is incomplete and no one has been able to replicate it, so the full intention is fuzzy.)

DemiMarie commented 7 months ago

Current dependency tree:

  1. Intel virtio-GPU native contexts (under development at Intel)
  2. AMD virtio-GPU native contexts (under development at AMD)
  3. virtio-GPU with Xen on not-QEMU
    1. virtio-GPU on Xen + QEMU (under development at AMD)
    2. virtio-GPU on KVM + not-QEMU (shipping in Chromebooks)
    3. Wayland everywhere.
      1. Port of virtio-GPU to not-QEMU + Xen (protocol only, no hardware acceleration required)
      2. Something to draw the borders
      3. Port various GUI stuff to use wlr-layer-shell instead of X11 override-redirect windows (@marmarta maintains this code, but I’ve agreed to provide any help needed).
      4. Central notification daemon, since VMs no longer have override-redirect windows (I’m working on this).
      5. StatusNotifierItem instead of XEmbed (I was working on this somewhat, now stalled temporarily due to Rust dependency hell).
      6. D-Bus menu implementation (probably rendered in the VM and drawn on the host via another layer-shell surface).

DemiMarie commented 7 months ago

@thw0rted One idea I just now had was to see about optimizations on the CesiumJS side. Even when a GPU is available and in use, forcing it to 100% usage cannot be good for battery life on mobile.

thw0rted commented 7 months ago

I was just a Cesium user, never on their dev team, and I've since moved on to another project.

That said, I think they already did what they could for optimization. It's a graphically-intensive 3D application, so battery drain should be treated more like running a mobile game than a regular web page. They did include an option for the application developer to trigger rendering manually, so that the render process would idle otherwise, which I think would help a lot in a mobile context.

The problem on Qubes was that even in short bursts (like scrolling a map), the software WebGL implementation was so slow that you could easily drop down to seconds-per-frame rather than frames-per-second. With the manual rendering option, you might only kick up to high power drain for 5 seconds of pan-and-zoom, then go back to idle, but with hardware acceleration those 5 seconds felt nice and smooth.

DemiMarie commented 7 months ago

Thanks for the explanation @thw0rted!

DemiMarie commented 6 months ago

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

rwiesbach commented 6 months ago

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

Has it? Awesome. Is there another GitHub issue for that? (Which one?!)

DemiMarie commented 6 months ago

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

Has it? Awesome. Is there another GitHub issue for that? (Which one?!)

#8552 and https://github.com/orgs/QubesOS/projects/17

github-actions[bot] commented 6 months ago

This issue has been closed as "not applicable." Here are some common examples of cases in which issues are closed as not applicable:

We respect the time and effort you have taken to file this issue, and we understand that this outcome may be unsatisfying. Please accept our sincere apologies and know that we greatly value your participation and membership in the Qubes community.

Regarding help and support requests, please note that this issue tracker (qubes-issues) is not intended to serve as a help desk or tech support center. Instead, we've set up other venues where you can ask for help and support, ask questions, and have discussions. By contrast, the issue tracker is more of a technical tool intended to support our developers in their work. We thank you for your understanding.

If anyone reading this believes that this issue was closed in error or that the resolution of "not applicable" is not accurate, please leave a comment below saying so, and we will review this issue again. For more information, see How issues get closed.