ClangBuiltLinux / linux

Linux kernel source tree

arm64 KVM problem when SCS is enabled #1096

Closed nathanchance closed 4 years ago

nathanchance commented 4 years ago

Thanks to the hard work of upstream developers, the Raspberry Pi 4 can be easily booted on mainline, which is rather neat since I now have an actual piece of hardware that I can use to run mainline kernels on :)

One of the things I wanted to try was spawning a guest with KVM with a clang built kernel, as we have received a report of it not working when BTI was enabled: https://lore.kernel.org/linux-arm-kernel/20200615105524.GA2694@willie-the-truck/

It works fine when just building defconfig (which is how I verified https://github.com/ClangBuiltLinux/boot-utils/pull/23):

$ src/boot-utils/boot-qemu.sh -a arm64 -k src/linux/out/arm64
...
+ timeout --foreground 3m unbuffer qemu-system-aarch64 -enable-kvm -cpu host -machine virt -append 'console=ttyAMA0 ' -display none -initrd /home/pi/src/boot-utils/images/arm64/rootfs.cpio -kernel /home/pi/src/linux/out/arm64/arch/arm64/boot/Image.gz -m 512m -nodefaults -serial mon:stdio
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.8.0-rc5-00048-gf8456690ba8e (pi@raspberrypi) (clang version 12.0.0 (https://github.com/llvm/llvm-project 30c382a7c6607a7d898730f8d288768110cdf1d2), LLD 12.0.0 (https://github.com/llvm/llvm-project 30c382a7c6607a7d898730f8d288768110cdf1d2)) #1 SMP PREEMPT Wed Jul 15 22:15:55 MST 2020
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 32 MiB at 0x000000005e000000
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x5def7100-0x5def8fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.1
[    0.000000] percpu: Embedded 23 pages/cpu s53912 r8192 d32104 u94208
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: EL2 vector hardening
[    0.000000] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[    0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 129024
[    0.000000] Policy zone: DMA
[    0.000000] Kernel command line: console=ttyAMA0 
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 443264K/524288K available (13756K kernel code, 2188K rwdata, 7308K rodata, 1600K init, 484K bss, 48256K reserved, 32768K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[    0.000000]  Trampoline variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[    0.000000] random: get_random_bytes called from start_kernel+0x1c8/0x384 with crng_init=0
[    0.000000] arch_timer: cp15 timer(s) running at 54.00MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
[    0.000003] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns
[    0.000085] Console: colour dummy device 80x25
[    0.000139] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=216000)
[    0.000145] pid_max: default: 32768 minimum: 301
[    0.000184] LSM: Security Framework initializing
[    0.000221] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
[    0.000232] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
[    0.001250] rcu: Hierarchical SRCU implementation.
[    0.001638] EFI services will not be available.
[    0.001697] smp: Bringing up secondary CPUs ...
[    0.001702] smp: Brought up 1 node, 1 CPU
[    0.001705] SMP: Total of 1 processors activated.
[    0.001713] CPU features: detected: 32-bit EL0 Support
[    0.001718] CPU features: detected: CRC32 instructions
[    0.001724] CPU features: detected: 32-bit EL1 Support
[    0.013238] CPU: All CPU(s) started at EL1
[    0.013267] alternatives: patching kernel code
[    0.014198] devtmpfs: initialized
[    0.015106] KASLR disabled due to lack of seed
[    0.015329] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.015339] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[    0.015961] pinctrl core: initialized pinctrl subsystem
[    0.016611] thermal_sys: Registered thermal governor 'step_wise'
[    0.016613] thermal_sys: Registered thermal governor 'power_allocator'
[    0.016682] DMI not present or invalid.
[    0.017002] NET: Registered protocol family 16
[    0.019912] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.019996] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.020050] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.020088] audit: initializing netlink subsys (disabled)
[    0.020711] cpuidle: using governor menu
[    0.020795] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.020829] ASID allocator initialised with 65536 entries
[    0.021369] Serial: AMBA PL011 UART driver
[    0.024205] audit: type=2000 audit(0.020:1): state=initialized audit_enabled=0 res=1
[    0.027281] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, base_baud = 0) is a PL011 rev1
[    0.150693] printk: console [ttyAMA0] enabled
[    0.152523] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.154174] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[    0.155763] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.157514] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[    0.159836] cryptd: max_cpu_qlen set to 1000
[    0.164319] ACPI: Interpreter disabled.
[    0.166166] iommu: Default domain type: Translated 
[    0.167526] vgaarb: loaded
[    0.168386] SCSI subsystem initialized
[    0.170162] usbcore: registered new interface driver usbfs
[    0.171563] usbcore: registered new interface driver hub
[    0.172885] usbcore: registered new device driver usb
[    0.174542] pps_core: LinuxPPS API ver. 1 registered
[    0.175747] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.177912] PTP clock support registered
[    0.179051] EDAC MC: Ver: 3.0.0
[    0.180512] FPGA manager framework
[    0.181427] Advanced Linux Sound Architecture Driver Initialized.
[    0.183407] clocksource: Switched to clocksource arch_sys_counter
[    0.185031] VFS: Disk quotas dquot_6.6.0
[    0.186041] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.187924] pnp: PnP ACPI: disabled
[    0.191669] NET: Registered protocol family 2
[    0.193179] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.195275] TCP established hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.197396] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.199166] TCP: Hash tables configured (established 4096 bind 4096)
[    0.200930] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.202516] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.204339] NET: Registered protocol family 1
[    0.205685] RPC: Registered named UNIX socket transport module.
[    0.207127] RPC: Registered udp transport module.
[    0.208363] RPC: Registered tcp transport module.
[    0.209486] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.211030] PCI: CLS 0 bytes, default 64
[    0.212136] Unpacking initramfs...
[    0.231281] Freeing initrd memory: 3448K
[    0.246266] hw perfevents: enabled with armv8_pmuv3 PMU driver, 7 counters available
[    0.248349] kvm [1]: HYP mode not available
[    0.250401] Initialise system trusted keyrings
[    0.251703] workingset: timestamp_bits=44 max_order=17 bucket_order=0
[    0.256595] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.260710] NFS: Registering the id_resolver key type
[    0.262053] Key type id_resolver registered
[    0.263078] Key type id_legacy registered
[    0.264167] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.265901] 9p: Installing v9fs 9p2000 file system support
[    0.291619] Key type asymmetric registered
[    0.292640] Asymmetric key parser 'x509' registered
[    0.293852] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
[    0.295680] io scheduler mq-deadline registered
[    0.296778] io scheduler kyber registered
[    0.301747] pl061_gpio 9030000.pl061: PL061 GPIO chip registered
[    0.304029] pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
[    0.305838] pci-host-generic 4010000000.pcie:       IO 0x003eff0000..0x003effffff -> 0x0000000000
[    0.308075] pci-host-generic 4010000000.pcie:      MEM 0x0010000000..0x003efeffff -> 0x0010000000
[    0.310214] pci-host-generic 4010000000.pcie:      MEM 0x8000000000..0xffffffffff -> 0x8000000000
[    0.312469] pci-host-generic 4010000000.pcie: ECAM at [mem 0x4010000000-0x401fffffff] for [bus 00-ff]
[    0.314760] pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
[    0.316568] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.317889] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.319385] pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
[    0.321194] pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
[    0.322968] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
[    0.326279] EINJ: ACPI disabled.
[    0.334756] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.337860] SuperH (H)SCI(F) driver initialized
[    0.339345] msm_serial: driver initialized
[    0.341133] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    0.349370] loop: module loaded
[    0.350875] megasas: 07.714.04.00-rc1
[    0.352903] physmap-flash 0.flash: physmap platform flash device: [mem 0x00000000-0x03ffffff]
[    0.367656] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
[    0.370007] Intel/Sharp Extended Query Table at 0x0031
[    0.375502] Using buffer write method
[    0.376553] physmap-flash 0.flash: physmap platform flash device: [mem 0x04000000-0x07ffffff]
[    0.386353] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
[    0.388812] Intel/Sharp Extended Query Table at 0x0031
[    0.395426] Using buffer write method
[    0.396900] Concatenating MTD devices:
[    0.397828] (0): "0.flash"
[    0.398526] (1): "0.flash"
[    0.399190] into device "0.flash"
[    0.406102] libphy: Fixed MDIO Bus: probed
[    0.407769] tun: Universal TUN/TAP device driver, 1.6
[    0.409569] thunder_xcv, ver 1.0
[    0.410459] thunder_bgx, ver 1.0
[    0.411272] nicpf, ver 1.0
[    0.412661] hclge is initializing
[    0.413537] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[    0.415290] hns3: Copyright (c) 2017 Huawei Corporation.
[    0.416760] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    0.418468] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    0.419939] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    0.421353] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.422829] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k
[    0.424636] igb: Copyright (c) 2007-2014 Intel Corporation.
[    0.426000] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k
[    0.427940] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[    0.429590] sky2: driver version 1.30
[    0.431044] VFIO - User Level meta-driver version: 0.3
[    0.433371] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.435025] ehci-pci: EHCI PCI platform driver
[    0.436273] ehci-platform: EHCI generic platform driver
[    0.437653] ehci-orion: EHCI orion driver
[    0.438719] ehci-exynos: EHCI Exynos driver
[    0.439880] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    0.441403] ohci-pci: OHCI PCI platform driver
[    0.442540] ohci-platform: OHCI generic platform driver
[    0.443948] ohci-exynos: OHCI Exynos driver
[    0.445229] usbcore: registered new interface driver usb-storage
[    0.448317] rtc-pl031 9010000.pl031: registered as rtc0
[    0.449655] rtc-pl031 9010000.pl031: setting system clock to 2020-07-17T04:54:20 UTC (1594961660)
[    0.452277] i2c /dev entries driver
[    0.455814] sdhci: Secure Digital Host Controller Interface driver
[    0.457342] sdhci: Copyright(c) Pierre Ossman
[    0.458658] Synopsys Designware Multimedia Card Interface Driver
[    0.460685] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.462812] ledtrig-cpu: registered to indicate activity on CPUs
[    0.465151] usbcore: registered new interface driver usbhid
[    0.466553] usbhid: USB HID core driver
[    0.469978] NET: Registered protocol family 17
[    0.471346] 9pnet: Installing 9P2000 support
[    0.472602] Key type dns_resolver registered
[    0.473856] registered taskstats version 1
[    0.474896] Loading compiled-in X.509 certificates
[    0.476812] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[    0.480346] ALSA device list:
[    0.481195]   No soundcards found.
[    0.482276] uart-pl011 9000000.pl011: no DMA platform data
[    0.485104] Freeing unused kernel memory: 1600K
[    0.487532] Run /init as init process
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Saving random seed: [    0.541978] random: dd: uninitialized urandom read (512 bytes read)
OK
Starting network: OK
Linux version 5.8.0-rc5-00048-gf8456690ba8e (pi@raspberrypi) (clang version 12.0.0 (https://github.com/llvm/llvm-project 30c382a7c6607a7d898730f8d288768110cdf1d2), LLD 12.0.0 (https://github.com/llvm/llvm-project 30c382a7c6607a7d898730f8d288768110cdf1d2)) #1 SMP PREEMPT Wed Jul 15 22:15:55 MST 2020
Linux version 5.8.0-rc5-00048-gf8456690ba8e (pi@raspberrypi) (clang version 12.0.0 (https://github.com/llvm/llvm-project 30c382a7c6607a7d898730f8d288768110cdf1d2), LLD 12.0.0 (https://github.com/llvm/llvm-project 30c382a7c6607a7d898730f8d288768110cdf1d2)) #1 SMP PREEMPT Wed Jul 15 22:15:55 MST 2020
Stopping network: OK
Saving random seed: [    0.606922] random: dd: uninitialized urandom read (512 bytes read)
OK
Stopping klogd: OK
Stopping syslogd: OK
umount: devtmpfs busy - remounted read-only
umount: can't unmount /: Invalid argument
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
[    2.635581] Flash device refused suspend due to active operation (state 20)
[    2.637358] Flash device refused suspend due to active operation (state 20)
[    2.639185] reboot: Power down
+ RET=0
+ set +x

However, as soon as I enable CONFIG_SHADOW_CALL_STACK, attempting to spawn a KVM guest kills the machine: I see the qemu-system-aarch64 command line but no other output, then my mosh session disconnects and the green light on the Pi stops flashing. I am unsure of how to recover the previous boot's kernel log on "regular" Linux (I know that Android has pstore), so I am not sure how to debug this further.

I am going to do some research to see if this is a clang issue or more rooted in the kernel. Attempting to bisect probably won't prove fruitful for two reasons: SCS was only merged in 5.8-rc1 and Raspberry Pi 4 support has only been good for the past couple of kernel versions.

cc @samitolvanen

samitolvanen commented 4 years ago

Sounds like there are functions used in EL2 that are missing the __noscs attribute. Does the device have a serial console where you could see the kernel panic?

nathanchance commented 4 years ago

Sounds like there are functions used in EL2 that are missing the __noscs attribute.

Yes, that sounds about right. I reverted 9654736891c3ac6a60b52ce70d33cf57cf95bff7 and replaced it with the v6 version, and then everything works fine, so it seems like some function that runs at EL2 is missing __hyp_text. Funnily enough, Marc Zyngier and @willdeacon had a side conversation about this in v7: https://lore.kernel.org/lkml/fed83df0e9140b9655b00f315814fab8@kernel.org/

Does the device have a serial console where you could see the kernel panic?

Yes, it does. I just need to get a serial to USB cable.

nathanchance commented 4 years ago

Well, the serial debugging cable I got does not appear to work (or I am holding it wrong), but in the meantime I did some good old "disable it for this translation unit" debugging and narrowed it down to switch.c being the problematic file.

Everything is fine with this diff:

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 2ca7ba69c318..b131b08cd63a 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -999,3 +999,4 @@ CONFIG_DEBUG_KERNEL=y
 # CONFIG_DEBUG_PREEMPT is not set
 # CONFIG_FTRACE is not set
 CONFIG_MEMTEST=y
+CONFIG_SHADOW_CALL_STACK=y
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 8c9880783839..d3acd087fb07 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -11,6 +11,8 @@ obj-$(CONFIG_KVM) += hyp.o
 hyp-y := vgic-v3-sr.o timer-sr.o aarch32.o vgic-v2-cpuif-proxy.o sysreg-sr.o \
         debug-sr.o entry.o switch.o fpsimd.o tlb.o hyp-entry.o

+CFLAGS_REMOVE_switch.o := $(CC_FLAGS_SCS)
+
 # KVM code is run at a different exception code with a different map, so
 # compiler instrumentation that inserts callbacks or checks into the code may
 # cause crashes. Just disable it.

Unfortunately, I tried adding __noscs to all of the functions in that file that did not already have __hyp_text and it did not solve the issue, so I assume something else is at play. I will probably have more time to look into this further tomorrow, but if anyone sees anything obvious or has any other ideas, I am all ears :)

nickdesaulniers commented 4 years ago

For the serial debug cable, assuming you have it plugged into the right pins (and IIRC on the Pi there's a step you need to do to enable serial debug), the host should see a new /dev/ttyUSB* device. I use screen to connect, passing the device and the baud rate (yours could be different): screen /dev/ttyUSB0 115200. Make sure you're testing on a known good kernel first, so that you're not conflating a broken kernel with an unrelated issue in setting up serial debugging.

nathanchance commented 4 years ago

Yeah you have to add enable_uart=1 in /boot/config.txt. Unfortunately, I am not on Linux for the host, I am on Windows, which complicates things. I installed the drivers as the guide says that I should and I can see the COM port in Device Manager but I never get connected to the Pi, even with the stock kernel. Makes me wonder if the cable is just bad or I have done something wrong.

nickdesaulniers commented 4 years ago

Can you boot your host off a USB live image of linux?

nathanchance commented 4 years ago

That's a good idea, I will try that soon.

nathanchance commented 4 years ago

Unfortunately, same deal even with live Linux; I see /dev/ttyUSB0 but I get no output when I have everything connected properly. Makes me wonder if it is the cable or some configuration issue with the Pi. I have the beta 64-bit build of Raspbian installed, which could be playing into it. I might go get another microSD card so that I can get a completely stock OS build loaded up onto it and see if I can test with that.

nickdesaulniers commented 4 years ago

The section on "UARTs and Device Tree" https://www.raspberrypi.org/documentation/configuration/uart.md makes it sound like bluetooth might have to be disabled.

nickdesaulniers commented 4 years ago

See also: https://raspberrypi.stackexchange.com/questions/108769/what-is-the-correct-way-to-connect-serial-console-on-rpi4-model-b https://www.abelectronics.co.uk/kb/article/1035/raspberry-pi-3--4-and-zero-w-serial-port-usage

nathanchance commented 4 years ago

Ah that is it!

image

Unfortunately, it does not seem like dtoverlay=bt-disable works with mainline but I can use my 5.4 branch since it has the same issue and it looks like there might be a 5.7 branch that I can use to jump forward if need be. Thanks for all the help!

nathanchance commented 4 years ago

So I have dmesg --follow open through the serial console and I know that it works from running echo hi | sudo tee -a /dev/kmsg. However, when I run boot-qemu.sh, mosh dies like before but I see no additional dmesg output from the serial console. It looks like the whole machine locks up. Am I doing something wrong and/or should I be doing something different?

nickdesaulniers commented 4 years ago

Can you see if

$ llvm-readelf -s arch/arm64/kvm/hyp/switch.o | grep FUNC

is different with and without CONFIG_SHADOW_CALL_STACK? If so, can you post the difference?
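
Because enabling SCS shifts addresses, sizes, and section indices, a raw diff of the two symbol tables tends to flag every row. A minimal sketch of a names-only comparison follows, assuming the `llvm-readelf -s` output of each build has first been saved to a text file. The helper names `func_names` and `only_in_second` are hypothetical, not part of any existing tool.

```shell
# Print the sorted, unique names of FUNC symbols from a saved
# `llvm-readelf -s` dump. In that output the 4th field is the symbol
# type and the last field is the symbol name.
func_names() {
    awk '$4 == "FUNC" { print $NF }' "$1" | LC_ALL=C sort -u
}

# List the function names that appear only in the second dump.
# Usage: only_in_second <dump-without-scs> <dump-with-scs>
only_in_second() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    func_names "$1" > "$tmp_a"
    func_names "$2" > "$tmp_b"
    # comm -13 hides lines unique to the first file and lines common to both.
    LC_ALL=C comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Functions that show up only in the SCS build are candidates for helpers that were inlined without SCS but emitted out of line with it.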

samitolvanen commented 4 years ago

You can also run llvm-objdump on switch.o and see which functions touch x18.
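
On arm64, SCS reserves x18 as the shadow stack pointer, so instrumented functions typically push and pop the return address with instructions such as `str x30, [x18], #8` and `ldr x30, [x18, #-8]!`. A rough sketch of automating the search is below, assuming the disassembly comes from `llvm-objdump -d` and uses its usual `<address> <name>:` function headers. The helper name `x18_users` is hypothetical.

```shell
# Read `llvm-objdump -d` output (from files or stdin) and print the name
# of every function whose body references register x18.
x18_users() {
    awk '
        # Function header, e.g. "0000000000000000 <kvm_vcpu_run_vhe>:"
        /^[0-9a-f]+ <[^>]+>:$/ {
            name = $2
            sub(/^</, "", name)
            sub(/>:$/, "", name)
            next
        }
        # Match x18 only as a standalone token, so immediates such as
        # "#0x18" are not counted.
        name != "" && /(^|[^0-9A-Za-z_])x18([^0-9A-Za-z_]|$)/ {
            users[name] = 1
        }
        END { for (n in users) print n }
    ' "$@" | LC_ALL=C sort
}

# Example: llvm-objdump -d arch/arm64/kvm/hyp/switch.o | x18_users
```

Any hyp function on that list is worth checking, since code running at EL2 cannot be assumed to have a valid shadow stack pointer in x18.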

nickdesaulniers commented 4 years ago

comparing w/ and w/o SCS

$ llvm-readelf -s arch/arm64/kvm/hyp/switch.o | grep UND

may also be of interest.

nathanchance commented 4 years ago

$ diff <(llvm-readelf -s out/arm64-no-scs/arch/arm64/kvm/hyp/switch.o |& cut -d : -f 2 | grep FUNC) <(llvm-readelf -s out/arm64-scs/arch/arm64/kvm/hyp/switch.o |& cut -d : -f 2 | grep FUNC)
1,13c1,38
<  0000000000000000   328 FUNC    LOCAL  DEFAULT    19 __activate_traps
<  0000000000000490   148 FUNC    LOCAL  DEFAULT    19 __deactivate_traps
<  000000000000084c   252 FUNC    LOCAL  DEFAULT    19 __hyp_call_panic_nvhe
<  0000000000000460   104 FUNC    LOCAL  DEFAULT     2 __hyp_call_panic_vhe
<  0000000000000948   492 FUNC    LOCAL  DEFAULT    19 __hyp_handle_fpsimd
<  0000000000000000   472 FUNC    LOCAL  DEFAULT     2 activate_traps_vhe
<  00000000000001d8    48 FUNC    LOCAL  DEFAULT     2 deactivate_traps_vhe
<  0000000000000148   840 FUNC    LOCAL  DEFAULT    19 fixup_guest_exit
<  0000000000000524   756 FUNC    GLOBAL DEFAULT    19 __kvm_vcpu_run_nvhe
<  0000000000000208    44 FUNC    GLOBAL DEFAULT     2 activate_traps_vhe_load
<  0000000000000234    40 FUNC    GLOBAL DEFAULT     2 deactivate_traps_vhe_put
<  0000000000000818    52 FUNC    GLOBAL DEFAULT    19 hyp_panic
<  000000000000025c   516 FUNC    GLOBAL DEFAULT     2 kvm_vcpu_run_vhe
---
>  000000000000007c   276 FUNC    LOCAL  DEFAULT    15 __activate_traps
>  0000000000000000    36 FUNC    LOCAL  DEFAULT    15 __activate_traps_common
>  0000000000000b50    44 FUNC    LOCAL  DEFAULT    15 __activate_traps_fpsimd32
>  0000000000000034    72 FUNC    LOCAL  DEFAULT    15 __activate_vm
>  00000000000006a4   120 FUNC    LOCAL  DEFAULT    15 __deactivate_traps
>  0000000000000024    16 FUNC    LOCAL  DEFAULT    15 __deactivate_traps_common
>  000000000000071c    44 FUNC    LOCAL  DEFAULT    15 __fpsimd_save_fpexc32
>  0000000000000a38   208 FUNC    LOCAL  DEFAULT    15 __hyp_call_panic_nvhe
>  0000000000000304   108 FUNC    LOCAL  DEFAULT     2 __hyp_call_panic_vhe
>  00000000000002ec    24 FUNC    LOCAL  DEFAULT     2 __kern_hyp_va
>  0000000000000190    68 FUNC    LOCAL  DEFAULT    15 __set_guest_arch_workaround_state
>  0000000000000660    68 FUNC    LOCAL  DEFAULT    15 __set_host_arch_workaround_state
>  0000000000000000   408 FUNC    LOCAL  DEFAULT     2 activate_traps_vhe
>  0000000000000198    48 FUNC    LOCAL  DEFAULT     2 deactivate_traps_vhe
>  00000000000001d4  1164 FUNC    LOCAL  DEFAULT    15 fixup_guest_exit
>  0000000000000370    40 FUNC    LOCAL  DEFAULT     2 has_vhe
>  0000000000000494    84 FUNC    LOCAL  DEFAULT     2 kvm_skip_instr
>  0000000000000420    44 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_dabt_isextabt
>  000000000000044c    12 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_dabt_iss1tw
>  0000000000000414    12 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_dabt_isvalid
>  0000000000000458     8 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_get_hsr
>  0000000000000460    12 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_sys_get_rt
>  00000000000003fc    12 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_trap_get_class
>  0000000000000408    12 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_trap_get_fault_type
>  00000000000004e8    12 FUNC    LOCAL  DEFAULT     2 kvm_vcpu_trap_il_is32bit
>  000000000000051c    28 FUNC    LOCAL  DEFAULT     2 sve_ffr_offset
>  00000000000004f4    40 FUNC    LOCAL  DEFAULT     2 sve_pffr
>  00000000000003c0    44 FUNC    LOCAL  DEFAULT     2 system_supports_fpsimd
>  0000000000000398    40 FUNC    LOCAL  DEFAULT     2 system_supports_sve
>  0000000000000b08    72 FUNC    LOCAL  DEFAULT    15 update_fp_enabled
>  00000000000003ec    16 FUNC    LOCAL  DEFAULT     2 vcpu_el1_is_32bit
>  000000000000046c    32 FUNC    LOCAL  DEFAULT     2 vcpu_get_reg
>  000000000000048c     8 FUNC    LOCAL  DEFAULT     2 vcpu_pc
>  0000000000000748   672 FUNC    GLOBAL DEFAULT    15 __kvm_vcpu_run_nvhe
>  00000000000001c8    36 FUNC    GLOBAL DEFAULT     2 activate_traps_vhe_load
>  00000000000001ec    52 FUNC    GLOBAL DEFAULT     2 deactivate_traps_vhe_put
>  00000000000009e8    80 FUNC    GLOBAL DEFAULT    15 hyp_panic
>  0000000000000220   204 FUNC    GLOBAL DEFAULT     2 kvm_vcpu_run_vhe

$ diff <(llvm-readelf -s out/arm64-no-scs/arch/arm64/kvm/hyp/switch.o |& cut -d : -f 2 | grep UND) <(llvm-readelf -s out/arm64-scs/arch/arm64/kvm/hyp/switch.o |& cut -d : -f 2 | grep UND)

$

switch.o-scs-disassembly.txt

Looking at which functions touch x18, I see activate_traps_vhe, activate_traps_vhe_load, deactivate_traps_vhe_put, kvm_vcpu_run_vhe, __hyp_call_panic_vhe, and kvm_skip_instr.

willdeacon commented 4 years ago

kvm_skip_instr() is __always_inline yet appears to have been out-of-lined, which could explain the issue here.

nickdesaulniers commented 4 years ago

$ rm -f arch/arm64/kvm/hyp/switch.o
$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 -j71 arch/arm64/kvm/hyp/switch.o KCFLAGS="-Rpass=inline" 2>&1 | grep kvm_skip_instr

might be able to tell you more about the inliner's decision. I would check -Rpass-missed=inline first, with grep kvm_skip_instr. If there's no output, then I'd check -Rpass=inline with grep kvm_skip_instr, which will also have more info about child functions being inlined into kvm_skip_instr.

I don't see kvm_skip_instr defined without SCS, so I'm guessing that SCS is doing something that is upsetting to the inliner and likely needs to be fixed.

nathanchance commented 4 years ago

kvm_skip_instr() is __always_inline yet appears to have been out-of-lined, which could explain the issue here.

Ugh, I am sorry, that's my fault. I missed 5c37f1ae1c335800d16b207cb578009c695dcd39 in my backport to 5.4.

However, picking that does not solve anything, which lines up with the initial report, which was arm64 defconfig + CONFIG_SHADOW_CALL_STACK on mainline (I did not want to open this issue until I could reproduce on mainline to ensure it wasn't a problem with my backport). I only switched back to 5.4 to attempt to get a serial connection and debug it that way.

Looks like there is now a 5.8 branch with all of the out of tree dtb stuff that allows me to disable Bluetooth easily and reclaim the primary UART for the serial console. I did manage to get the serial console "working" with a pure upstream kernel but it uses the mini UART, which I could not get to output anything other than garbage:

image

I will probably reach out on the Raspberry Pi kernel mailing list to see if I can get some help with that.

I did manage to get the panic information via serial console this time around:

[   93.052617] Kernel panic - not syncing: HYP panic:
[   93.052617] PS:200003c9 PC:0000006900268b30 ESR:86000006
[   93.052617] FAR:0000006900268b30 HPFAR:0000000000000680 PAR:1d000c7edbadc0de
[   93.052617] VCPU:00000000c544c0ad
[   93.073355] SMP: stopping secondary CPUs
[   93.077332] Kernel Offset: disabled
[   93.080864] CPU features: 0x240022,20006000
[   93.085101] Memory Limit: none
[   93.088198] ---[ end Kernel panic - not syncing: HYP panic:
[   93.088198] PS:200003c9 PC:0000006900268b30 ESR:86000006
[   93.088198] FAR:0000006900268b30 HPFAR:0000000000000680 PAR:1d000c7edbadc0de
[   93.088198] VCPU:00000000c544c0ad ]---

Not super descriptive... but better than nothing I suppose.
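
The ESR value in the panic can be unpacked a little further. The sketch below splits it using the ESR_ELx layout from the Arm architecture: exception class in bits [31:26], instruction length in bit 25, and the syndrome in bits [24:0]. The helper name `decode_esr` is hypothetical, and the interpretation afterwards is a reading of the fields, not a confirmed diagnosis.

```shell
# Split an ESR_ELx value into its top-level fields.
decode_esr() {
    esr=$(( $1 ))
    ec=$((  (esr >> 26) & 0x3f ))   # exception class, bits [31:26]
    il=$((  (esr >> 25) & 0x1 ))    # instruction length, bit 25
    iss=$(( esr & 0x1ffffff ))      # instruction specific syndrome, bits [24:0]
    fsc=$(( iss & 0x3f ))           # fault status code (meaningful for aborts only)
    printf 'EC=0x%02x IL=%d ISS=0x%07x FSC=0x%02x\n' "$ec" "$il" "$iss" "$fsc"
}

decode_esr 0x86000006   # ESR value from the HYP panic above
# prints: EC=0x21 IL=1 ISS=0x0000006 FSC=0x06
```

EC 0x21 is an instruction abort taken without a change in exception level, and fault status code 0x06 is a level 2 translation fault. Together with PC and FAR both being 0000006900268b30, this suggests that EL2 tried to fetch instructions from an unmapped address, which would be consistent with a branch or return to a bad address.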

Here is that same information that Nick requested on mainline:

$ diff -u \
<(llvm-readelf -s out/arm64-no-scs/arch/arm64/kvm/hyp/switch.o | grep FUNC) \
<(llvm-readelf -s out/arm64-scs/arch/arm64/kvm/hyp/switch.o | grep FUNC)
--- /proc/self/fd/12    2020-07-21 14:05:37.959350898 -0700
+++ /proc/self/fd/14    2020-07-21 14:05:37.963351018 -0700
@@ -1,14 +1,27 @@
-     8: 0000000000000080   528 FUNC    LOCAL  DEFAULT     1 __kvm_vcpu_run_vhe
-    12: 00000000000007f0   260 FUNC    LOCAL  DEFAULT    13 __activate_traps_nvhe
-    13: 0000000000000304   864 FUNC    LOCAL  DEFAULT    13 fixup_guest_exit
-    14: 0000000000000664   220 FUNC    LOCAL  DEFAULT    13 __deactivate_traps
-    18: 00000000000002f4   492 FUNC    LOCAL  DEFAULT     1 activate_traps_vhe
-    21: 00000000000008f4   508 FUNC    LOCAL  DEFAULT    13 __hyp_handle_fpsimd
-    22: 0000000000000af0   336 FUNC    LOCAL  DEFAULT    13 __hyp_handle_ptrauth
-    23: 0000000000000778   120 FUNC    LOCAL  DEFAULT    13 __hyp_call_panic_nvhe
-    24: 0000000000000290   100 FUNC    LOCAL  DEFAULT     1 __hyp_call_panic_vhe
-    56: 0000000000000000    44 FUNC    GLOBAL DEFAULT     1 activate_traps_vhe_load
-    57: 000000000000002c    40 FUNC    GLOBAL DEFAULT     1 deactivate_traps_vhe_put
-    58: 0000000000000054    44 FUNC    GLOBAL DEFAULT     1 kvm_vcpu_run_vhe
-    71: 0000000000000000   772 FUNC    GLOBAL DEFAULT    13 __kvm_vcpu_run_nvhe
-    90: 0000000000000740    56 FUNC    GLOBAL DEFAULT    13 hyp_panic
+     8: 0000000000000000    44 FUNC    LOCAL  DEFAULT     6 __activate_traps_common
+    11: 000000000000002c    24 FUNC    LOCAL  DEFAULT     6 __deactivate_traps_common
+    12: 0000000000000348    88 FUNC    LOCAL  DEFAULT     6 __activate_vm
+    13: 00000000000003a0   152 FUNC    LOCAL  DEFAULT     6 __activate_traps
+    14: 0000000000000438    80 FUNC    LOCAL  DEFAULT     6 __set_guest_arch_workaround_state
+    15: 0000000000000488  1120 FUNC    LOCAL  DEFAULT     6 fixup_guest_exit
+    16: 00000000000008e8    80 FUNC    LOCAL  DEFAULT     6 __set_host_arch_workaround_state
+    17: 0000000000000938   196 FUNC    LOCAL  DEFAULT     6 __deactivate_traps
+    18: 00000000000009fc    28 FUNC    LOCAL  DEFAULT     6 __fpsimd_save_fpexc32
+    22: 0000000000000ac8   260 FUNC    LOCAL  DEFAULT     6 __activate_traps_nvhe
+    26: 00000000000001a8   412 FUNC    LOCAL  DEFAULT     1 activate_traps_vhe
+    27: 0000000000000c58   472 FUNC    LOCAL  DEFAULT     6 __hyp_handle_fpsimd
+    28: 0000000000000398    64 FUNC    LOCAL  DEFAULT     1 system_supports_address_auth
+    29: 00000000000003d8    64 FUNC    LOCAL  DEFAULT     1 system_supports_generic_auth
+    30: 0000000000000418    24 FUNC    LOCAL  DEFAULT     1 vcpu_ptrauth_enable
+    31: 0000000000000430    48 FUNC    LOCAL  DEFAULT     1 deactivate_traps_vhe
+    32: 0000000000000a50   120 FUNC    LOCAL  DEFAULT     6 __hyp_call_panic_nvhe
+    33: 0000000000000140   104 FUNC    LOCAL  DEFAULT     1 __hyp_call_panic_vhe
+    35: 0000000000000bcc    76 FUNC    LOCAL  DEFAULT     6 update_fp_enabled
+    36: 0000000000000c18    64 FUNC    LOCAL  DEFAULT     6 __activate_traps_fpsimd32
+    37: 0000000000000344    48 FUNC    LOCAL  DEFAULT     1 sve_pffr
+    38: 0000000000000374    36 FUNC    LOCAL  DEFAULT     1 sve_ffr_offset
+    69: 0000000000000000    36 FUNC    GLOBAL DEFAULT     1 activate_traps_vhe_load
+    70: 0000000000000024    52 FUNC    GLOBAL DEFAULT     1 deactivate_traps_vhe_put
+    71: 0000000000000058   232 FUNC    GLOBAL DEFAULT     1 kvm_vcpu_run_vhe
+    80: 0000000000000044   772 FUNC    GLOBAL DEFAULT     6 __kvm_vcpu_run_nvhe
+   102: 0000000000000a18    56 FUNC    GLOBAL DEFAULT     6 hyp_panic

$ diff -u \
<(${CBL_LLVM}/llvm-readelf -s out/arm64-no-scs/arch/arm64/kvm/hyp/switch.o | grep UND) \
<(${CBL_LLVM}/llvm-readelf -s out/arm64-scs/arch/arm64/kvm/hyp/switch.o | grep UND)
--- /proc/self/fd/12    2020-07-21 14:06:25.776789179 -0700
+++ /proc/self/fd/14    2020-07-21 14:06:25.776789179 -0700
@@ -1,45 +1,45 @@
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
-    59: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_host_data
-    60: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_save_host_state_vhe
-    61: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND arm64_const_caps_ready
-    62: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND cpu_hwcap_keys
-    63: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_restore_guest_state_vhe
-    64: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __debug_switch_to_guest
-    65: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __guest_enter
-    66: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_save_guest_state_vhe
-    67: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_restore_host_state_vhe
-    68: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __debug_switch_to_host
-    69: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND cpu_hwcaps
-    70: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND arm64_ssbd_callback_required
-    72: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_update_va_mask
-    73: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg_save_state_nvhe
-    74: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg32_restore_state
-    75: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg_restore_state_nvhe
-    76: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_vgic_global_state
-    77: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __timer_enable_traps
-    78: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg32_save_state
-    79: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __timer_disable_traps
-    80: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_activate_traps
-    81: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_restore_state
-    82: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_save_state
-    83: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_deactivate_traps
-    84: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND vgic_v2_cpuif_trap
-    85: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND vgic_v3_cpuif_trap
-    86: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v2_perform_cpuif_access
-    87: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_perform_cpuif_access
-    88: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_skip_instr32
-    89: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND vectors
-    91: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __hyp_do_panic
-    92: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND panic
-    93: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kimage_voffset
-    94: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND bp_hardening_data
-    95: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND physvirt_offset
-    96: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __kvm_hyp_vector
-    97: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND this_cpu_has_cap
-    98: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __kvm_harden_el2_vector_slot
-    99: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __kvm_bp_vect_base
-   100: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __bp_harden_hyp_vecs
-   101: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sve_save_state
-   102: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __fpsimd_save_state
-   103: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sve_load_state
-   104: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __fpsimd_restore_state
+    72: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_host_data
+    73: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_save_host_state_vhe
+    74: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_restore_guest_state_vhe
+    75: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __debug_switch_to_guest
+    76: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __guest_enter
+    77: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_save_guest_state_vhe
+    78: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sysreg_restore_host_state_vhe
+    79: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __debug_switch_to_host
+    81: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_update_va_mask
+    82: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg_save_state_nvhe
+    83: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg32_restore_state
+    84: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg_restore_state_nvhe
+    85: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND arm64_const_caps_ready
+    86: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND cpu_hwcap_keys
+    87: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_vgic_global_state
+    88: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __timer_enable_traps
+    89: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __sysreg32_save_state
+    90: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __timer_disable_traps
+    91: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND cpu_hwcaps
+    92: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_activate_traps
+    93: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_restore_state
+    94: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND arm64_ssbd_callback_required
+    95: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_save_state
+    96: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_deactivate_traps
+    97: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND vgic_v2_cpuif_trap
+    98: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND vgic_v3_cpuif_trap
+    99: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v2_perform_cpuif_access
+   100: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __vgic_v3_perform_cpuif_access
+   101: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kvm_skip_instr32
+   103: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __hyp_do_panic
+   104: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND panic
+   105: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND kimage_voffset
+   106: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND bp_hardening_data
+   107: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND physvirt_offset
+   108: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __kvm_hyp_vector
+   109: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND this_cpu_has_cap
+   110: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __kvm_harden_el2_vector_slot
+   111: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __kvm_bp_vect_base
+   112: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __bp_harden_hyp_vecs
+   113: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sve_save_state
+   114: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __fpsimd_save_state
+   115: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND sve_load_state
+   116: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND __fpsimd_restore_state
+   117: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND vectors

Here is the output of aarch64-linux-gnu-objdump -dr arch/arm64/kvm/hyp/switch.o on mainline...

Without SCS: https://gist.github.com/a930e624d11ba94fd8f4f5f24542fd67

With SCS: https://gist.github.com/ea63c1dffd584618487442f4df970919

nickdesaulniers commented 4 years ago

> which I could not get to output anything other than garbage:

That's pretty common when the baud rate of the client is wrong. The client starts interpreting signals at the wrong rate, and thus interprets an otherwise valid signal as garbage.

Comparing the list of defined symbols, I see:

--- /noscs.txt    2020-07-21 16:14:52.746914000 -0700
+++ /scs.txt    2020-07-21 16:14:52.746914000 -0700
@@ -1,14 +1,27 @@
+__activate_traps
+__activate_traps_common
+__activate_traps_fpsimd32
 __activate_traps_nvhe
 activate_traps_vhe
 activate_traps_vhe_load
+__activate_vm
 __deactivate_traps
+__deactivate_traps_common
+deactivate_traps_vhe
 deactivate_traps_vhe_put
 fixup_guest_exit
+__fpsimd_save_fpexc32
 __hyp_call_panic_nvhe
 __hyp_call_panic_vhe
 __hyp_handle_fpsimd
-__hyp_handle_ptrauth
 hyp_panic
 __kvm_vcpu_run_nvhe
-__kvm_vcpu_run_vhe
 kvm_vcpu_run_vhe
+__set_guest_arch_workaround_state
+__set_host_arch_workaround_state
+sve_ffr_offset
+sve_pffr
+system_supports_address_auth
+system_supports_generic_auth
+update_fp_enabled
+vcpu_ptrauth_enable
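
A list of defined symbols like the one above can be pulled out of llvm-readelf -s output with a small filter. This is only a sketch: it assumes the eight-column symbol table layout seen in the earlier listings (Num, Value, Size, Type, Bind, Vis, Ndx, Name), defined_funcs is a made-up helper name, and it sorts in the C locale, so the ordering may not match the listing above.

```shell
# Hypothetical helper: read `readelf -s` style output on stdin and print the
# names of defined FUNC symbols, one per line, sorted in the C locale.
# Assumed columns: Num Value Size Type Bind Vis Ndx Name
defined_funcs() {
    awk '$4 == "FUNC" && $7 != "UND" { print $8 }' | LC_ALL=C sort
}
```

Something like diff -u <(llvm-readelf -s out/arm64-no-scs/arch/arm64/kvm/hyp/switch.o | defined_funcs) <(llvm-readelf -s out/arm64-scs/arch/arm64/kvm/hyp/switch.o | defined_funcs) should then give a names-only comparison of the two objects.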

I wonder if all the +'s should be marked __hyp_text? I'm not sure __always_inline is the best solution here, unless these functions are simultaneously callable from .hyp.text AND other sections (same thoughts when looking at 5c37f1ae1c335800d16b207cb578009c695dcd39). Not that it's wrong; I would just prefer __hyp_text over __always_inline when possible. Just looking at vcpu_ptrauth_enable in mainline, it's just static inline and its lone callsite is in arch/arm64/kvm/hyp/switch.c, so that definition can probably be moved into the .c file.
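
To see which of these functions ended up outside the hyp section, the Ndx column of the symbol table is probably the quickest check. In the SCS listing the FUNC symbols carry index 6 or 1. A rough sketch, assuming index 6 is the hyp text section in that object (the listings don't say so; llvm-readelf -S on the object would confirm it); funcs_outside_section is a made-up helper name.

```shell
# Hypothetical helper: read `readelf -s` style output on stdin and print
# "Ndx Name" for every defined FUNC symbol whose section index differs from
# $1, the index assumed to belong to the hyp text section.
# Assumed columns: Num Value Size Type Bind Vis Ndx Name
funcs_outside_section() {
    awk -v keep="$1" '$4 == "FUNC" && $7 != "UND" && $7 != keep { print $7, $8 }'
}
```

Running llvm-readelf -s out/arm64-scs/arch/arm64/kvm/hyp/switch.o | funcs_outside_section 6 should list helpers such as vcpu_ptrauth_enable, which shows index 1 in the output above.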

willdeacon commented 4 years ago

Careful here -- anything that is "VHE only" (where the entire kernel runs with hypervisor privileges at EL2) doesn't need the __hyp_text annotation, and SCS will be used as normal. We've rewritten a bunch of how this works for 5.9, so maybe worth having a go with linux-next to see if it's any better. The __hyp_text annotation is gone, and we use a terrible invocation of objcopy to create the relevant sections now.

mzyngier commented 4 years ago

I think I nailed this one with https://lore.kernel.org/kvm/20200722162231.3689767-1-maz@kernel.org/. I'd be grateful if you could give it a go and report back on the list whether it fixes it for you.

nathanchance commented 4 years ago

> Careful here -- anything that is "VHE only" (where the entire kernel runs with hypervisor privileges at EL2) doesn't need the __hyp_text annotation, and SCS will be used as normal. We've rewritten a bunch of how this works for 5.9, so maybe worth having a go with linux-next to see if it's any better. The __hyp_text annotation is gone, and we use a terrible invocation of objcopy to create the relevant sections now.

I built next-20200722 arm64 defconfig, booted it up, and ran QEMU with -enable-kvm without any issues, so I guess that refactoring did do something :)

> I think I nailed this one with https://lore.kernel.org/kvm/20200722162231.3689767-1-maz@kernel.org/. I'd be grateful if you could give it a go and report back on the list whether it fixes it for you.

I will try this with mainline later, thanks for the fix!

nathanchance commented 4 years ago

As I replied on the list, Marc's patch against mainline resolves the issue as well.

nathanchance commented 4 years ago

Marc's patch is now in the KVM tree: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=bf4086b1a1efa3d3a2c17582e00bbd2176dfe177

nathanchance commented 4 years ago

This made it into 5.8: https://git.kernel.org/linus/bf4086b1a1efa3d3a2c17582e00bbd2176dfe177