LeftoverLocals mitigation

DemiMarie commented 3 months ago

I’m not sure if this is the right repository or if this is even appropriate to file an issue for, but I noticed that Apple GPUs are vulnerable to LeftoverLocals. Apple seems to not be fixing this vulnerability, but hopefully Asahi can do better.

jannau commented 3 months ago

Their OpenCL test application only reports zeroes. Test needs RUSTICL_ENABLE=asahi to run. Vulkan driver is not ready yet.

DemiMarie commented 3 months ago

Does the OpenCL compiler zero-initialize local variables? My understanding is that one cannot test this in OpenCL or Vulkan without relying on undefined behavior, but one can test this with hand-written GPU assembly code.

alyssarosenzweig commented 3 months ago

I recommend you don't run LLMs. I don't se a reasonable driver fix for this.

DemiMarie commented 3 months ago

I recommend you don't run LLMs. I don't se a reasonable driver fix for this.

What would an “unreasonable” one be? Serialize all access to the GPU so that only one program at a time can use it, so that the kernel can zero local variables between contexts? I think this is what AMD is working on right now. ChromeOS currently ships a simpler mitigation for AMD GPUs: they zero out all local variables after each graphics shader, and those are non-preemptable so there is no race condition.

The reason I am interested is that this is a blocker for virtio-GPU native context support, which is already being worked on IIUC.

DemiMarie commented 3 months ago

I will also add that I’m not at all interested in LLMs. It’s graphics workloads (such as a sandboxed app taking a screenshot without authorization) that I am concerned about. If OpenGL/Vulkan aren’t affected then that makes this much less important.

DemiMarie commented 3 months ago

Also, “we don’t have the resources to fix this, Apple will need to fix it in firmware” is a reasonable response here!

alyssarosenzweig commented 3 months ago

I recommend you don't run LLMs. I don't se a reasonable driver fix for this.

What would an “unreasonable” one be? Serialize all access to the GPU so that only one program at a time can use it, so that the kernel can zero local variables between contexts? I think this is what AMD is working on right now. ChromeOS currently ships a simpler mitigation for AMD GPUs: they zero out all local variables after each graphics shader, and those are non-preemptable so there is no race condition.

The reason I am interested is that this is a blocker for virtio-GPU native context support, which is already being worked on IIUC.

We don't have hardware docs. Who knows? The simpler mitigation could maybe be made to work but it would take nontrivial engineering resources.

I will also add that I’m not at all interested in LLMs. It’s graphics workloads (such as a sandboxed app taking a screenshot without authorization) that I am concerned about. If OpenGL/Vulkan aren’t affected then that makes this much less important.

I don't know if they're affected. Apple knows these things, we don't. I suspect LeftoverLocals could be used for exactly that (cross processing screen grabs) on AGX specifically. Researchers would've missed this because it can't happen with GL or Metal, but if you can hit the kernel api directly (you can, userspace graphics libs aren't a security boundary), you can probably make that issue visible.

Also, “we don’t have the resources to fix this, Apple will need to fix it in firmware” is a reasonable response here!

That'd be my response then ^^

DemiMarie commented 3 months ago

I recommend you don't run LLMs. I don't se a reasonable driver fix for this.

What would an “unreasonable” one be? Serialize all access to the GPU so that only one program at a time can use it, so that the kernel can zero local variables between contexts? I think this is what AMD is working on right now. ChromeOS currently ships a simpler mitigation for AMD GPUs: they zero out all local variables after each graphics shader, and those are non-preemptable so there is no race condition. The reason I am interested is that this is a blocker for virtio-GPU native context support, which is already being worked on IIUC.

We don't have hardware docs. Who knows? The simpler mitigation could maybe be made to work but it would take nontrivial engineering resources.

Interesting. Are graphics (as opposed to compute) non-preemptable?

The reason I mentioned virtio-GPU native contexts is that that is something people will usually consider a security boundary. If a VM can grab the host’s screen people could be surprised quite badly. “Don’t give untrusted guests GPU acceleration” is standard advice pre-native contexts, but native contexts are supposed to be the more secure alternative.

I will also add that I’m not at all interested in LLMs. It’s graphics workloads (such as a sandboxed app taking a screenshot without authorization) that I am concerned about. If OpenGL/Vulkan aren’t affected then that makes this much less important.

I don't know if they're affected. Apple knows these things, we don't. I suspect LeftoverLocals could be used for exactly that (cross processing screen grabs) on AGX specifically. Researchers would've missed this because it can't happen with GL or Metal, but if you can hit the kernel api directly (you can, userspace graphics libs aren't a security boundary), you can probably make that issue visible.

I wonder if Apple would issue a bounty for a successful exploit. If not, it might make them fix the problem if they haven’t done so already.

Also, “we don’t have the resources to fix this, Apple will need to fix it in firmware” is a reasonable response here!

That'd be my response then ^^

That’s fair! I hope that this is fixed in firmware, as opposed to a driver fix that Asahi users won’t get.

That said, this is fixed in M3 (as per the Trail of Bits report), and IIUC you plan to support that eventually, which is good.

(I hope that this doesn’t come across as “Demi wants you all to do stuff for free” and instead as “Demi wants to make sure your users are protected.” Placing demands on open source maintainers is a serious problem and is not my intent.)

AsahiLinux / linux

LeftoverLocals mitigation #278