QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Ease debugging Xen issues #6834

Open marmarek opened 3 years ago

marmarek commented 3 years ago

On laptops lacking a physical serial console, collecting logs of a crashed Xen (or dom0 kernel) is hard. One method is to use kexec to extract them from RAM. Currently this requires several non-trivial changes to the system. This issue is about making it easier to prepare a system for such debugging.

Details from the original issue:

Aren't there tweaks that can be added on the boot command line to make logs be saved synchronously, so that everyone can troubleshoot this without additional external devices, @marmarek?

Sadly, not. When Xen panics, dom0 has no chance to execute anything anymore. And it's dom0 that writes logs to the disk (or anything else for that matter).

@marmarek @fepitre From this comment:

As for collecting logs, I do have a method for that, but it's quite adventurous - make Xen kexec Linux on panic to dump the memory content, and then extract logs from there. It requires a custom rebuild of Xen (because we have kexec disabled by default) and of kexec-tools (because the one in Fedora isn't built with Xen support), building an initramfs that dumps the memory (the kdump package helps with that, but requires a few tweaks to make it work), and finally having some place to save the dump too - if using the main disk, you need to configure LUKS with a key file + allocate quite a lot of RAM for the crashkernel (because of memory-hard Argon2i in LUKS2). Once you have the dump, extracting messages is relatively easy - I use strings for that. Theoretically the crash utility can do that more conveniently, but it didn't work for me.
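
A minimal sketch of the strings-based extraction described above, written in Python. It assumes the dump fits in memory and that Xen console lines carry the usual "(XEN)" prefix; the function names are illustrative and not part of any Qubes tooling.

```python
import re
import sys


def printable_runs(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII (plus tab), roughly like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def xen_log_lines(data: bytes, marker: str = "(XEN)"):
    """Return only the printable runs that contain the Xen log marker."""
    return [run for run in printable_runs(data) if marker in run]


if __name__ == "__main__":
    # Usage: python3 extract_xen.py <path-to-memory-dump>
    with open(sys.argv[1], "rb") as dump:
        for line in xen_log_lines(dump.read()):
            print(line)
```

For a multi-gigabyte dump, reading the file in chunks (or memory-mapping it) would be kinder to RAM; the sketch reads it in one go only to stay short.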

Wouldn't this exact Qubes OS 4.1 ISO deployment (a debug ISO with all of the above dependencies preconfigured) help debug many corner cases for the Qubes OS project, including this one? A willing tester could be instructed to install Qubes OS on a spare drive and report results, without the burden of customizing such a testbed. Should that be discussed in a separate issue (with more details on doing this manually first and then automating the process, maybe)?

if using the main disk, you need to configure LUKS with a key file + allocate quite a lot of RAM for the crashkernel (because of memory-hard Argon2i in LUKS2)

@marmarek / @fepitre: Random thought, but if enough space is allocated to /boot in the default partition scheme of that debug ISO, having the kexec'ed Linux kernel dump memory under /boot would work around a lot of the complexity of dumping the needed logs. We would get the logs we need more quickly and easily, from an additional boot of said debug ISO in read-only rescue mode from the boot options.

Originally posted by @tlaurion in https://github.com/QubesOS/qubes-issues/issues/6066#issuecomment-895447248

tlaurion commented 3 years ago

It was discussed briefly off-issue that a debug ISO would miss the point: having a separate xen-debug source tree, clearly distinguishable from the standard Xen target, would be easier to maintain and to deploy from the testing repository when needed.

That Xen + Linux emergency kernel to be kexec'ed + kexec dependencies for dom0 should make their way into a clearly separated GRUB entry for users to select. The desired end result is being able to extract the needed information from the memory dump. (Saved under /boot? That is why I talked about a debug ISO in the first place, so that /boot is big enough to hold the memory content. Saved under swap? As Marek stated, having the emergency kernel dump into root would require the LUKS key to be passed, which in most user cases is not possible.)

Thoughts on better approaches?

fepitre commented 3 years ago

I'm inclined to help with this, and maybe a specific ISO_FLAVOR could be built weekly.

DemiMarie commented 3 years ago

Dumping into /boot has the obvious problem that it is unencrypted. Could some form of asymmetric encryption be used?

marmarek commented 3 years ago

I'd rather embed LUKS key into the initramfs that is loaded to be kexec'ed. Such initramfs would be sensitive then, so storing it in /boot is out of the question. Not sure what's better:

Since Argon2, used by LUKS2, needs more RAM than it's useful to assign to the crash kernel (*), I consider injecting the actual LUKS master key. This may need some initramfs adjustments.

An alternative option is to dump to a USB stick. But that's way less reliable (especially if the kexec'ed Linux won't undo the IOMMU setup done by Xen).

(*) This memory needs to be reserved at boot time and cannot be used for anything else - if you need 1.5GB for the crash kernel, you'd have that much less for the actual system.
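
To make the memory trade-off concrete, here is a rough Python sketch that reads the KDF memory cost out of LUKS2 JSON metadata and compares it with a crash kernel reservation. It assumes the metadata layout of the LUKS2 format (a "keyslots" object whose entries carry a "kdf" object with "memory" in KiB), as printed by cryptsetup luksDump --dump-json-metadata on versions that support that option. The 256 MiB overhead for the crash kernel and initramfs is a made-up placeholder, not a measured value.

```python
import json


def max_kdf_memory_kib(luks2_json: str) -> int:
    """Largest KDF memory cost (KiB) over all keyslots; 0 if none declare one."""
    metadata = json.loads(luks2_json)
    costs = [
        slot.get("kdf", {}).get("memory", 0)  # pbkdf2 slots have no "memory"
        for slot in metadata.get("keyslots", {}).values()
    ]
    return max(costs, default=0)


def crashkernel_too_small(luks2_json: str, crashkernel_mib: int,
                          overhead_mib: int = 256) -> bool:
    """True if unlocking via the KDF would not fit in the reservation.

    overhead_mib is a hypothetical allowance for the kernel and initramfs.
    """
    needed_mib = max_kdf_memory_kib(luks2_json) / 1024 + overhead_mib
    return needed_mib > crashkernel_mib
```

With a 1 GiB Argon2 memory cost, a 512 MiB reservation would be reported as too small, which is the situation that motivates passing the master key instead of running the KDF in the crash kernel.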

DemiMarie commented 3 years ago
  • inject LUKS key at the load time

This definitely seems like the better option. Having the key be in a file on disk risks it leaking, which would be bad. I would much prefer for it to remain entirely in-memory.

marmarek commented 3 years ago

If neither Thunderbolt nor ExpressCard is available and the machine is recent enough to have one or more M.2 slots, and you don't mind sessions with the laptop case off (or potentially ribbon cables dangling out of the case), there's this clever solution. After all, an M.2 device is just a PCIe device:

https://www.ebay.com/itm/4-Port-RS-232-DB9-Serial-M-2-B-M-Key-2280-Controller-Card-Asix99100-Chipset-/274780227172?mkcid=16&mkevt=1&_trksid=p2349624.m46890.l6249&mkrid=711-127632-2357-0

Excellent idea @brendanhoar! After a few tweaks and hitting an unrelated bug, I got this working :) Debugging suspend issues should be much easier (for me, at least) now.

marmarek commented 3 years ago

I consider injecting the actual LUKS master key.

I wanted to extract the master key from the running kernel, but this is actually impossible on R4.1, because cryptsetup loads it into the kernel keyring as a "logon" key - which can be read only by the kernel. I mean, technically there surely is some hack to extract it, but not a proper API. This means we either need to store a keyfile somewhere on the disk, or ask the user for the disk passphrase a second time during boot (or more precisely: when loading the crash kernel).

DemiMarie commented 3 years ago

I consider injecting the actual LUKS master key.

I wanted to extract the master key from the running kernel, but this is actually impossible on R4.1, because cryptsetup loads it into the kernel keyring as a "logon" key - which can be read only by the kernel. I mean, technically there surely is some hack to extract it, but not a proper API. This means we either need to store a keyfile somewhere on the disk, or ask the user for the disk passphrase a second time during boot (or more precisely: when loading the crash kernel).

Patch cryptsetup?

marmarek commented 3 years ago

Patch cryptsetup?

No, I don't want to weaken a security feature for everyone just to make debugging easier for a few. I'd rather consider saving a keyfile somewhere on the root fs - which would affect only those who opt in to the debug setup.

DemiMarie commented 3 years ago

Patch cryptsetup?

No, I don't want to weaken a security feature for everyone just to make debugging easier for a few. I'd rather consider saving a keyfile somewhere on the root fs - which would affect only those who opt in to the debug setup.

Serious question: how is this security feature useful in Qubes OS? Once someone has code exec in dom0 it is game over anyway.

marmarek commented 1 year ago

Since https://github.com/QubesOS/qubes-vmm-xen/pull/138 is merged, here is how to use it to get Xen's console:

  1. Your system needs to support "xHCI Debug Capability". The easiest way to check is to look for a /sys/bus/pci/devices/*/dbc file in sys-usb; if it's there and says "disabled", it's okay.
  2. You need a second computer with a USB3 controller. This second computer doesn't need the debug capability. A Raspberry Pi 4 is enough (and probably clones that have USB3 too).
  3. Get a USB3 cable with "A" plug on both ends - for example https://www.datapro.net/products/usb-3-0-super-speed-a-a-debugging-cable.html or https://www.amazon.com/EXLUWOR-Windbg-Super-Speed-Debugging-windbg/dp/B08XYXZ4PG
  4. If that's a generic A-A cable (not specifically a "debug" cable), tape over the VBUS, D+ and D- contacts on one end (see https://pinoutguide.com/Slots/usb_3_0_connector_pinout.shtml for example).
  5. Plug the cable into both computers.
  6. Add dbgp=xhci,share=yes console=vga,xhci to Xen's cmdline (either directly in the grub menu - the line with multiboot2 /xen-*.gz - or in /etc/default/grub in the GRUB_CMDLINE_XEN_DEFAULT option). If your system has multiple USB3 controllers, you may need to specify one explicitly - for example dbgp=xhci@pci00:14.0,share=yes console=vga,xhci - see the lspci command.
  7. Restart the system.
  8. Look for /dev/ttyUSB0 on the second computer - it should appear when Xen starts.
  9. Get the console output with picocom -b 115200 /dev/ttyUSB0 (or a similar program - minicom, cutecom or even cat works).
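
The check in step 1 can be scripted. The following is a small Python sketch, meant to be run inside sys-usb, that lists every PCI device exposing a dbc attribute together with its current state. The function name and the "unreadable" fallback are illustrative choices, not an existing tool.

```python
from pathlib import Path


def find_dbc_controllers(sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> contents of its dbc attribute, for devices that have one."""
    results = {}
    for dbc_file in sorted(Path(sysfs_root).glob("*/dbc")):
        try:
            state = dbc_file.read_text().strip()
        except OSError:
            state = "unreadable"  # hypothetical fallback label
        results[dbc_file.parent.name] = state
    return results


if __name__ == "__main__":
    controllers = find_dbc_controllers()
    if not controllers:
        print("No dbc attribute found - no xHCI Debug Capability visible here")
    for address, state in controllers.items():
        print(f"{address}: {state}")
```

A controller reported as "disabled" is the expected, usable case described in step 1. The PCI address printed here is also what would go after xhci@pci in step 6 when more than one controller is present (without the leading domain part, going by the example there).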

3hhh commented 1 year ago

So if I understand share=yes correctly, the device can even be assigned to sys-usb at the same time? That's awesome!

Too bad that my T530 apparently has no dbc/debug port... -_-

DemiMarie commented 1 year ago

So if I understand share=yes correctly, the device can even be assigned to sys-usb at the same time? That's awesome!

It can be, but doing so allows sys-usb to take control of the system.

3hhh commented 1 year ago

On 10/31/22 09:07, Demi Marie Obenour wrote:

So if I understand share=yes correctly, the device can even be assigned to sys-usb at the same time? That's awesome!

It can be, but doing so allows sys-usb to take control of the system.

Ok, so a dedicated dom0 USB device/port is more optimal after all...

3hhh commented 1 year ago

Btw is serial port debugging still supported and will it remain supported?

I'm asking because I don't see CONFIG_SERIAL_8250_CONSOLE here or anywhere else and those Express Card UART adapters for Thinkpads aren't too cheap to try either... (40-60 EUR)

marmarek commented 1 year ago

So if I understand share=yes correctly, the device can even be assigned to sys-usb at the same time? That's awesome!

As Demi noted, this has some risks associated. If you want, you can use share=no, in which case only Xen will have access to the controller (and sys-usb will likely fail to start). Or you can use share=hwdom (or no share option at all), in which case dom0 will have access to the controller and not sys-usb (but if you use sys-usb, then dom0 avoids touching the controller anyway).
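
As a quick reference for the variants just described, here is a hedged Python sketch that assembles the Xen command line fragment for each share mode. It only encodes the option strings quoted in this thread (share=yes, share=no, share=hwdom, or no share option); the helper name is made up for illustration.

```python
VALID_SHARE_MODES = ("yes", "no", "hwdom")


def xen_dbgp_options(share="yes", controller=None):
    """Build the dbgp/console fragment for Xen's cmdline.

    share: "yes", "no", "hwdom", or None to omit the share option entirely.
    controller: optional PCI address such as "00:14.0" to pick one xHCI.
    """
    if share is not None and share not in VALID_SHARE_MODES:
        raise ValueError(f"unsupported share mode: {share!r}")
    device = "xhci" if controller is None else f"xhci@pci{controller}"
    dbgp = f"dbgp={device}" if share is None else f"dbgp={device},share={share}"
    return f"{dbgp} console=vga,xhci"


if __name__ == "__main__":
    for mode in (*VALID_SHARE_MODES, None):
        print(xen_dbgp_options(share=mode, controller="00:14.0"))
```

The resulting string is what would be appended to GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub, followed by regenerating the grub configuration.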

Btw is serial port debugging still supported and will it remain supported?

Yes. But it's increasingly hard to have an actual serial port in a laptop.

I'm asking because I don't see CONFIG_SERIAL_8250_CONSOLE here or anywhere else and those Express Card UART adapters for Thinkpads aren't too cheap to try either... (40-60 EUR)

I think you're confusing Linux config with Xen config.

3hhh commented 1 year ago

I got dom0 kernel logs via UART working by adding GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX console=ttyUSB0,115200" to /etc/default/grub.

Do these include the Xen logs?

The Xen instructions didn't work for me (I tried GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all"). I fear it's because the ExpressCard apparently is a USB hub that just exposes the RS232 port as a USB client, which Xen doesn't support but the Linux kernel does. Of course such details aren't documented by the ExpressCard manufacturer...

brendanhoar commented 1 year ago

Yeah, the PCI card/ExpressCard needs to be a real UART serial port, not a USB port + USB serial device.

Even then I think @marmarek ran into some issues getting Xen to use a real one, IIRC.

B