NVIDIA / nvtrust

Ancillary open source software to support confidential computing on NVIDIA GPUs
Apache License 2.0

Running out of DMA mapping when launching VM #46

Open iihihiuh opened 9 months ago

iihihiuh commented 9 months ago

When I launch a CVM, I get the following error:

qemu-system-x86_64: -device vfio-pci,host=b0:00.0,bus=pci.1: warning: vfio_register_ram_discard_listener: possibly running out of DMA mappings. E.g., try increasing the 'block-size' of virtio-mem devies. Maximum possible DMA mappings: 65535, Maximum possible memslots: 32764

But I set the VM memory to 126GB. What configuration do I need to change to fix this error?
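
The "Maximum possible DMA mappings: 65535" figure in the warning matches the default value of the host's vfio_iommu_type1 `dma_entry_limit` module parameter, so one thing worth checking is what the host currently reports. The sketch below is only a diagnostic. The helper name `read_dma_entry_limit` is made up for illustration, and whether raising the limit resolves this particular warning is an assumption that is not confirmed in this thread.

```shell
#!/bin/bash
# Print the per-container DMA mapping limit reported by the host's
# vfio_iommu_type1 module, or "unknown" if the parameter file is not readable.
# The optional first argument overrides the sysfs path (useful for testing).
read_dma_entry_limit() {
    local param_file="${1:-/sys/module/vfio_iommu_type1/parameters/dma_entry_limit}"
    if [[ -r "${param_file}" ]]; then
        cat "${param_file}"
    else
        echo "unknown"
    fi
}
```

Calling `read_dma_entry_limit` with no arguments on the host prints the current value. If it prints 65535, the module is still at its default limit.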

Tan-YiFan commented 9 months ago

Can you share the script you use to launch the CVM?

iihihiuh commented 9 months ago

Thanks for the quick reply. I am using the start-qemu.sh from the Intel tdx-tools repository as suggested by the CC developer guide.

#!/bin/bash
#
# Launch QEMU-KVM to create Guest VM in following types:
# - Legacy VM: non-TDX VM boot with legacy(non-EFI) SEABIOS
# - EFI VM: non-TDX VM boot with EFI BIOS OVMF(Open Virtual Machine Firmware)
# - TD VM: TDX VM boot with OVMF via qemu-kvm launch parameter "kvm-type=tdx,confidential-guest-support=tdx"
#
# Prerequisite:
# 1. Build and Install TDX stack. Please refer to README.md in build/<your_distro>
# 2. Create TDX guest image with
#   - TDX guest kernel
#   - (optional)Modified Grub and Shim for TDX measurement to RTMR
#
# Note:
#
# - This script supports "direct" and "grub" boot:
#   * direct: pass kernel image via "-kernel" and kernel command line via
#             "cmdline" via qemu-kvm launch parameter.
#   * grub: do not pass kernel and cmdline but leverage EFI BDS boot
#           shim->grub->kernel within guest image
# - To get a consistent TD_REPORT within the guest across power cycles, please
#   keep the TDX guest configuration consistent, such as the same MAC address.
#

CURR_DIR=$(readlink -f "$(dirname "$0")")

# Set distro related parameters according to distro
DISTRO=$(grep -w 'NAME' /etc/os-release)
if [[ "$DISTRO" =~ .*"Ubuntu".* ]]; then
    QEMU_EXEC="/usr/bin/qemu-system-x86_64"
    LEGACY_BIOS="/usr/share/seabios/bios.bin"
else
    QEMU_EXEC="/usr/libexec/qemu-kvm"
    LEGACY_BIOS="/usr/share/qemu-kvm/bios.bin"
fi

# VM configurations
CPUS=16
MEM=128G
SGX_EPC_SIZE=32M

# Installed from the package of intel-mvp-tdx-tdvf
OVMF="/usr/share/qemu/OVMF.fd"
GUEST_IMG=""
DEFAULT_GUEST_IMG="${CURR_DIR}/td-guest.qcow2"
KERNEL=""
DEFAULT_KERNEL="${CURR_DIR}/vmlinuz"
VM_TYPE="td"
BOOT_TYPE="direct"
DEBUG=false
USE_VSOCK=false
USE_SERIAL_CONSOLE=false
FORWARD_PORT=10026
MONITOR_PORT=9002
ROOT_PARTITION="/dev/vda1"
KERNEL_CMD_NON_TD="root=${ROOT_PARTITION} rw console=hvc0"
KERNEL_CMD_TD="${KERNEL_CMD_NON_TD}"
MAC_ADDR=""
QUOTE_TYPE=""

# Just log message of serial into file without input
HVC_CONSOLE="-chardev stdio,id=mux,mux=on,logfile=$CURR_DIR/vm_log_$(date +"%FT%H%M").log \
             -device virtio-serial,romfile= \
             -device virtconsole,chardev=mux -monitor chardev:mux \
             -serial chardev:mux -nographic \
             -no-hpet -nodefaults -device pcie-root-port,id=pci.1,bus=pcie.0 -device vfio-pci,host=b1:00.0,bus=pci.1 -fw_cfg name=opt/ovmf/X-PciMmio64,string=262144"

# In grub boot, the serial console needs input to select the grub menu instead of HVC
# Please make sure console=ttyS0 is added in grub.cfg since no virtconsole
#
SERIAL_CONSOLE="-serial stdio"

# Default template for QEMU command line
QEMU_CMD="${QEMU_EXEC} -accel kvm \
          -name process=tdxvm,debug-threads=on \
          -m $MEM -vga none \
          -monitor pty \
          -no-hpet -nodefaults"
PARAM_CPU=" -cpu host,-kvm-steal-time,pmu=off"
PARAM_MACHINE=" -machine q35"

usage() {
    cat << EOM
Usage: $(basename "$0") [OPTION]...
  -i <guest image file>     Default is td-guest.qcow2 under current directory
  -k <kernel file>          Default is vmlinuz under current directory
  -t [legacy|efi|td|sgx]    VM Type, default is "td"
  -b [direct|grub]          Boot type, default is "direct" which requires kernel binary specified via "-k"
  -p <Monitor port>         Monitor via telnet
  -f <SSH Forward port>     Host port for forwarding guest SSH
  -o <OVMF file>            BIOS firmware device file, for "td" and "efi" VM only
  -m <11:22:33:44:55:66>    MAC address, impact TDX measurement RTMR
  -q [tdvmcall|vsock]       Support for TD quote using tdvmcall or vsock
  -c <number>               Number of CPUs, default is 16
  -r <root partition>       root partition for direct boot, default is /dev/vda1
  -v                        Flag to enable vsock
  -d                        Flag to enable "debug=on" for GDB guest
  -s                        Flag to use serial console instead of HVC console
  -h                        Show this help
EOM
}

error() {
    echo -e "\e[1;31mERROR: $*\e[0;0m"
    exit 1
}

warn() {
    echo -e "\e[1;33mWARN: $*\e[0;0m"
}

process_args() {
    while getopts ":i:k:t:b:p:f:o:a:m:vdshq:c:r:" option; do
        case "$option" in
            i) GUEST_IMG=$OPTARG;;
            k) KERNEL=$OPTARG;;
            t) VM_TYPE=$OPTARG;;
            b) BOOT_TYPE=$OPTARG;;
            p) MONITOR_PORT=$OPTARG;;
            f) FORWARD_PORT=$OPTARG;;
            o) OVMF=$OPTARG;;
            m) MAC_ADDR=$OPTARG;;
            v) USE_VSOCK=true;;
            d) DEBUG=true;;
            s) USE_SERIAL_CONSOLE=true;;
            q) QUOTE_TYPE=$OPTARG;;
            c) CPUS=$OPTARG;;
            r) ROOT_PARTITION=$OPTARG;;
            h) usage
               exit 0
               ;;
            *)
               echo "Invalid option '-$OPTARG'"
               usage
               exit 1
               ;;
        esac
    done

    if [[ ! -f ${QEMU_EXEC} ]]; then
        error "Please install QEMU which supports TDX."
    fi

    # Validate the number of CPUs
    if ! [[ ${CPUS} =~ ^[0-9]+$ && ${CPUS} -gt 0 ]]; then
        error "Invalid number of CPUs: ${CPUS}"
    fi

    GUEST_IMG="${GUEST_IMG:-${DEFAULT_GUEST_IMG}}"
    if [[ ! -f ${GUEST_IMG} ]]; then
        usage
        error "Guest image file ${GUEST_IMG} does not exist. Please specify via option \"-i\""
    fi

    # Check that the default OVMF firmware file exists
    if [[ ${OVMF} == "/usr/share/qemu/OVMF.fd" ]]; then
        if [[ ! -f /usr/share/qemu/OVMF.fd ]]; then
            error "Could not find /usr/share/qemu/OVMF.fd. Please install TDVF(Trusted Domain Virtual Firmware)."
        fi
    fi

    # Check parameter MAC address
    if [[ -n ${MAC_ADDR} ]]; then
        if [[ ! ${MAC_ADDR} =~ ^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$ ]]; then
            error "Invalid MAC address: ${MAC_ADDR}"
        fi
    fi

    case ${GUEST_IMG##*.} in
        qcow2) FORMAT="qcow2";;
          img) FORMAT="raw";;
            *) echo "Unknown disk image's format"; exit 1 ;;
    esac

    # Guest rootfs changes
    if [[ ${ROOT_PARTITION} != "/dev/vda1" ]]; then
        KERNEL_CMD_NON_TD=${KERNEL_CMD_NON_TD//"/dev/vda1"/${ROOT_PARTITION}}
        KERNEL_CMD_TD="${KERNEL_CMD_NON_TD}"
    fi

    QEMU_CMD+=" -drive file=$(readlink -f "${GUEST_IMG}"),if=virtio,format=$FORMAT "
    QEMU_CMD+=" -monitor telnet:127.0.0.1:${MONITOR_PORT},server,nowait "

    if [[ ${DEBUG} == true ]]; then
        OVMF="/usr/share/qemu/OVMF.debug.fd"
        QEMU_CMD+=" -s -S "
        KERNEL_CMD_NON_TD+=" nokaslr"
        KERNEL_CMD_TD+=" nokaslr"
    fi

    if [[ -n ${QUOTE_TYPE} ]]; then
        case ${QUOTE_TYPE} in
            "tdvmcall") ;;
            "vsock")
                USE_VSOCK=true
                ;;
            *)
                error "Invalid quote type \"$QUOTE_TYPE\", must be [vsock|tdvmcall]"
                ;;
        esac
    fi

    case ${VM_TYPE} in
        "td")
            cpu_tsc=$(grep 'cpu MHz' /proc/cpuinfo | head -1 | awk -F: '{print $2/1024}')
            if (( $(echo "$cpu_tsc < 1" |bc -l) )); then
                PARAM_CPU+=",tsc-freq=1000000000"
            fi
            # Note: "pic=no" could only be used in TD mode but not for non-TD mode
            PARAM_MACHINE+=",kernel_irqchip=split,confidential-guest-support=tdx,memory-backend=ram1"
            QEMU_CMD+=" -bios ${OVMF}"
            QEMU_CMD+=" -object tdx-guest,sept-ve-disable,id=tdx"
            if [[ ${QUOTE_TYPE} == "tdvmcall" ]]; then
                QEMU_CMD+=",quote-generation-service=vsock:2:4050"
            fi
            if [[ ${DEBUG} == true ]]; then
                QEMU_CMD+=",debug=on"
            fi
            QEMU_CMD+=" -object memory-backend-memfd-private,id=ram1,size=${MEM}"
            ;;
        "efi")
            PARAM_MACHINE+=",kernel_irqchip=split"
            QEMU_CMD+=" -bios ${OVMF}"
            ;;
        "legacy")
            if [[ ! -f ${LEGACY_BIOS} ]]; then
                error "${LEGACY_BIOS} does not exist!"
            fi
            QEMU_CMD+=" -bios ${LEGACY_BIOS} "
            ;;
        "sgx")
            PARAM_MACHINE+=",sgx-epc.0.memdev=mem0,sgx-epc.0.node=0"
            QEMU_CMD+=" -cpu host,+sgx-provisionkey,+sgxlc,+sgx1"
            QEMU_CMD+=" -object memory-backend-epc,id=mem0,size=${SGX_EPC_SIZE},prealloc=on"
            ;;
        *)
            error "Invalid ${VM_TYPE}, must be [legacy|efi|td|sgx]"
            ;;
    esac

    QEMU_CMD+=$PARAM_CPU
    QEMU_CMD+=$PARAM_MACHINE
    QEMU_CMD+=" -device virtio-net-pci,netdev=mynet0"

    # Customize MAC address. NOTE: it will impact TDX measurement RTMR.
    # This must directly follow the virtio-net-pci device so that ",mac=" extends it.
    if [[ -n ${MAC_ADDR} ]]; then
        QEMU_CMD+=",mac=${MAC_ADDR}"
    fi

    # Specify the number of CPUs
    QEMU_CMD+=" -smp ${CPUS} "

    # Forward SSH port to the host
    QEMU_CMD+=" -netdev user,id=mynet0,hostfwd=tcp::$FORWARD_PORT-:22 "

    # Enable vsock
    if [[ ${USE_VSOCK} == true ]]; then
        QEMU_CMD+=" -device vhost-vsock-pci,guest-cid=3 "
    fi

    case ${BOOT_TYPE} in
        "direct")
            KERNEL="${KERNEL:-${DEFAULT_KERNEL}}"
            if [[ ! -f ${KERNEL} ]]; then
                usage
                error "Kernel image file ${KERNEL} does not exist. Please specify via option \"-k\""
            fi

            QEMU_CMD+=" -kernel $(readlink -f "${KERNEL}") "
            if [[ ${VM_TYPE} == "td" ]]; then
                # shellcheck disable=SC2089
                QEMU_CMD+=" -append \"${KERNEL_CMD_TD}\" "
            else
                # shellcheck disable=SC2089
                QEMU_CMD+=" -append \"${KERNEL_CMD_NON_TD}\" "
            fi
            ;;
        "grub")
            if [[ ${USE_SERIAL_CONSOLE} == false ]]; then
                warn "Using HVC console for grub, could not accept key input in grub menu"
            fi
            ;;
        *)
            echo "Invalid ${BOOT_TYPE}, must be [direct|grub]"
            exit 1
            ;;
    esac

    echo "========================================="
    echo "Guest Image       : ${GUEST_IMG}"
    echo "Kernel binary     : ${KERNEL}"
    echo "OVMF              : ${OVMF}"
    echo "VM Type           : ${VM_TYPE}"
    echo "CPUS              : ${CPUS}"
    echo "Boot type         : ${BOOT_TYPE}"
    echo "Monitor port      : ${MONITOR_PORT}"
    echo "Enable vsock      : ${USE_VSOCK}"
    echo "Enable debug      : ${DEBUG}"
    if [[ -n ${MAC_ADDR} ]]; then
        echo "MAC Address       : ${MAC_ADDR}"
    fi
    if [[ ${USE_SERIAL_CONSOLE} == true ]]; then
        QEMU_CMD+=" ${SERIAL_CONSOLE} "
        echo "Console           : Serial"
    else
        QEMU_CMD+=" ${HVC_CONSOLE} "
        echo "Console           : HVC"
    fi
    if [[ -n ${QUOTE_TYPE} ]]; then
        echo "Quote type        : ${QUOTE_TYPE}"
    fi
    echo "========================================="
}

launch_vm() {
    # remap CTRL-C to CTRL ]
    echo "Remapping CTRL-C to CTRL-]"
    stty intr ^]
    echo "Launch VM:"
    # shellcheck disable=SC2086,SC2090
    echo ${QEMU_CMD}
    # shellcheck disable=SC2086
    eval ${QEMU_CMD}
    # restore CTRL-C mapping
    stty intr ^c
}

process_args "$@"
launch_vm

Tan-YiFan commented 9 months ago

I do not have access to TDX-enabled servers, so my analysis is based entirely on the related source code.

Which stage did you reach after executing this boot script? Did you get any boot log from the guest kernel? Does dmesg on the host machine provide any useful message?

The message is emitted by QEMU code. Instrumenting this function (adding printfs) might help. In the message, 65535 might mean that no limit is reported: code1 code2
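
To see why the warning can fire even when the VM's memory size looks modest, here is a rough arithmetic sketch. It assumes the check in `vfio_register_ram_discard_listener` works roughly as in upstream QEMU: each discard-managed region contributes (region size / granularity) potential mappings, and that count plus the memslot count is compared against the DMA mapping limit. The 4 KiB granularity used in the usage note is an assumption for illustration, not something confirmed for the TDX QEMU build in this thread, and both function names are hypothetical.

```shell
#!/bin/bash
# Estimate how many DMA mappings a discard-managed RAM region could need.
# Arguments: region size in GiB, granularity in KiB.
estimate_dma_mappings() {
    local size_gib=$1 granularity_kib=$2
    # 1 GiB = 1048576 KiB; one potential mapping per granule.
    echo $(( size_gib * 1048576 / granularity_kib ))
}

# Report whether the estimated mappings plus memslots exceed the limit.
# Arguments: mappings, memslots, limit.
exceeds_dma_limit() {
    local mappings=$1 memslots=$2 limit=$3
    if (( mappings + memslots > limit )); then
        echo "yes"
    else
        echo "no"
    fi
}
```

For example, `estimate_dma_mappings 128 4` prints 33554432, and `exceeds_dma_limit 33554432 32764 65535` prints "yes". Under these assumptions, a 128G backend tracked at page granularity is far beyond the 65535 limit quoted in the warning.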

iihihiuh commented 9 months ago

It seems that, without proper DMA mappings, drivers cannot communicate with GPUs assigned to TDX enclaves.

Tan-YiFan commented 9 months ago

Could you share any error logs (such as dmesg from the guest or host), if they exist?
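
One way to collect the relevant lines is to filter the kernel log for the passed-through device. The sketch below reads log text on stdin and keeps only lines that mention vfio and a given PCI address. The function name `filter_vfio_log` is hypothetical.

```shell
#!/bin/bash
# Keep only log lines that mention both "vfio" and the given PCI address.
# Reads log text from stdin; the address is matched as a fixed string.
filter_vfio_log() {
    local pci_addr=$1
    grep -i 'vfio' | grep -F -- "${pci_addr}"
}
```

On the host this could be used as `sudo dmesg | filter_vfio_log b0:00.0`, where b0:00.0 is the device address from the QEMU warning.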

iihihiuh commented 9 months ago

Every time I launch the TDX enclave, my host produces messages like these:

[65617.933896] vfio-pci 0000:b0:00.0: Enabling HDA controller
[65618.309380] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x19@0x100
[65618.309464] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x24@0x140
[65618.309476] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x25@0x14c
[65618.309488] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x26@0x158
[65618.309500] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x2a@0x188
[65618.309607] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x27@0x200
[65618.309829] vfio-pci 0000:b0:00.0: vfio_ecap_init: hiding ecap 0x2e@0x2c8

Then the guest VM produces the messages below. Some of them relate to DMA, and the last several were produced while I was trying to install the GPU driver. I suspect that the driver installation fails because of the DMA mapping errors.

[    0.000000] tdx: Guest detected
[    0.000000] TDX: Enabled TDX guest device filter
[    0.000000] Linux version 6.2.0-mvp10v1+8-generic (yongqin@lambda6-scip) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #mvp10v1+tdx SMP PREEMPT_DYNAMIC Tue Feb 13 19:07:20 UTC 2024
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.0-mvp10v1+8-generic root=UUID=34062841-799a-47dd-a3e5-795f06780507 ro console=tty1 console=ttyS0
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Hygon HygonGenuine
[    0.000000]   Centaur CentaurHauls
[    0.000000]   zhaoxin   Shanghai  
[    0.000000] x86/split lock detection: #DB: warning on user-space bus_locks
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x20000: 'AMX Tile config'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x40000: 'AMX Tile data'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: xstate_offset[5]:  832, xstate_sizes[5]:   64
[    0.000000] x86/fpu: xstate_offset[6]:  896, xstate_sizes[6]:  512
[    0.000000] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
[    0.000000] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]:    8
[    0.000000] x86/fpu: xstate_offset[17]: 2496, xstate_sizes[17]:   64
[    0.000000] x86/fpu: xstate_offset[18]: 2560, xstate_sizes[18]: 8192
[    0.000000] x86/fpu: Enabled xstate features 0x602e7, context size is 10752 bytes, using 'compacted' format.
[    0.000000] signal: max sigframe size: 11952
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000080bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000080c000-0x000000000080cfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000000080d000-0x000000007d205fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d206000-0x000000007d206fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007d207000-0x000000007d275fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d276000-0x000000007d385fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007d386000-0x000000007d3a6fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d3a7000-0x000000007d3a8fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007d3a9000-0x000000007d3aafff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d3ab000-0x000000007d3abfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007d3ac000-0x000000007d3b6fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d3b7000-0x000000007d3fffff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007d400000-0x000000007d635fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d636000-0x000000007d645fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007d646000-0x000000007d647fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d648000-0x000000007d64ffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007d650000-0x000000007d658fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007d659000-0x000000007d65bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007d65c000-0x000000007d6a9fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007d6aa000-0x000000007e7c0fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e7c1000-0x000000007e818fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e819000-0x000000007e820fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007e821000-0x000000007e824fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007e825000-0x000000007ff7bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007ff7c000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000107fffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: ACPI=0x7e820000 ACPI 2.0=0x7e820014 SMBIOS=0x7e7f6000 MEMATTR=0x7d228298 MOKvar=0x7d206000 
[    0.000000] SMBIOS 2.8 present.
[    0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown
[    0.000000] Hypervisor detected: KVM
[    0.000000] tsc: Detected 1000.000 MHz processor
[    0.000011] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000014] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000028] last_pfn = 0x1080000 max_arch_pfn = 0x10000000000
[    0.000030] MTRRs disabled (not available)
[    0.000031] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
[    0.000044] x2apic: enabled by BIOS, switching to x2apic ops
[    0.000047] last_pfn = 0x7ff7c max_arch_pfn = 0x10000000000
[    0.020095] last_pfn = 0x1080000 max_arch_pfn = 0x10000000000
[    0.020099] software IO TLB: SWIOTLB bounce buffer size adjusted to 1024MB
[    0.020114] Using GB pages for direct mapping
[    0.028748] Secure boot disabled
[    0.028749] RAMDISK: [mem 0x74954000-0x76babfff]
[    0.028753] ACPI: Early table checksum verification disabled
[    0.028757] ACPI: RSDP 0x000000007E820014 000024 (v02 BOCHS )
[    0.028763] ACPI: XSDT 0x000000007E81F0E8 00004C (v01 BOCHS  BXPC     00000001      01000013)
[    0.028772] ACPI: FACP 0x000000007E81A000 0000F4 (v03 BOCHS  BXPC     00000001 BXPC 00000001)
[    0.028781] ACPI: DSDT 0x000000007E81B000 002563 (v01 BOCHS  BXPC     00000001 BXPC 00000001)
[    0.028786] ACPI: FACS 0x000000007E823000 000040
[    0.028792] ACPI: CCEL 0x000000007E81E000 000038 (v01 INTEL  EDK2     00000002      01000013)
[    0.028797] ACPI: Ignoring installation of MCFG
[    0.028801] ACPI: Ignoring installation of WAET
[    0.028806] ACPI: APIC 0x000000007D3AB000 00016E (v01 BOCHS  BXPC     00000001 BXPC 00000001)
[    0.028810] ACPI: Reserving FACP table memory at [mem 0x7e81a000-0x7e81a0f3]
[    0.028812] ACPI: Reserving DSDT table memory at [mem 0x7e81b000-0x7e81d562]
[    0.028812] ACPI: Reserving FACS table memory at [mem 0x7e823000-0x7e82303f]
[    0.028813] ACPI: Reserving CCEL table memory at [mem 0x7e81e000-0x7e81e037]
[    0.028847] Setting APIC routing to cluster x2apic.
[    0.029002] No NUMA configuration found
[    0.029003] Faking a node at [mem 0x0000000000000000-0x000000107fffffff]
[    0.029013] NODE_DATA(0) allocated [mem 0x107ffd5000-0x107fffffff]
[    3.905171] Zone ranges:
[    3.905173]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    3.905175]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    3.905176]   Normal   [mem 0x0000000100000000-0x000000107fffffff]
[    3.905177]   Device   empty
[    3.905178] Movable zone start for each node
[    3.905180] Early memory node ranges
[    3.905181]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    3.905182]   node   0: [mem 0x0000000000100000-0x000000000080bfff]
[    3.905182]   node   0: [mem 0x000000000080d000-0x000000007d205fff]
[    3.905183]   node   0: [mem 0x000000007d207000-0x000000007d275fff]
[    3.905184]   node   0: [mem 0x000000007d386000-0x000000007d3a6fff]
[    3.905185]   node   0: [mem 0x000000007d3a9000-0x000000007d3aafff]
[    3.905185]   node   0: [mem 0x000000007d3ac000-0x000000007d3b6fff]
[    3.905186]   node   0: [mem 0x000000007d400000-0x000000007d635fff]
[    3.905186]   node   0: [mem 0x000000007d646000-0x000000007d647fff]
[    3.905187]   node   0: [mem 0x000000007d659000-0x000000007d65bfff]
[    3.905187]   node   0: [mem 0x000000007d6aa000-0x000000007e7c0fff]
[    3.905188]   node   0: [mem 0x000000007e825000-0x000000007ff7bfff]
[    3.905188]   node   0: [mem 0x0000000100000000-0x000000107fffffff]
[    3.905194] Initmem setup node 0 [mem 0x0000000000001000-0x000000107fffffff]
[    3.905200] On node 0, zone DMA: 1 pages in unavailable ranges
[    3.905217] On node 0, zone DMA: 96 pages in unavailable ranges
[    3.905235] On node 0, zone DMA: 1 pages in unavailable ranges
[    3.909247] On node 0, zone DMA32: 1 pages in unavailable ranges
[    3.909252] On node 0, zone DMA32: 272 pages in unavailable ranges
[    3.909252] On node 0, zone DMA32: 2 pages in unavailable ranges
[    3.909253] On node 0, zone DMA32: 1 pages in unavailable ranges
[    3.909259] On node 0, zone DMA32: 73 pages in unavailable ranges
[    3.909259] On node 0, zone DMA32: 16 pages in unavailable ranges
[    3.909260] On node 0, zone DMA32: 17 pages in unavailable ranges
[    3.909299] On node 0, zone DMA32: 78 pages in unavailable ranges
[    3.909350] On node 0, zone DMA32: 100 pages in unavailable ranges
[    4.040335] On node 0, zone Normal: 132 pages in unavailable ranges
[    4.040342] ACPI: PM-Timer IO Port: 0x608
[    4.040372] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    4.040590] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[    4.040593] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[    4.040595] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 high edge)
[    4.040595] ACPI: INT_SRC_OVR (bus 0 bus_irq 2 global_irq 2 high edge)
[    4.040596] ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 3 high edge)
[    4.040597] ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 4 high edge)
[    4.040598] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high edge)
[    4.040598] ACPI: INT_SRC_OVR (bus 0 bus_irq 6 global_irq 6 high edge)
[    4.040599] ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 7 high edge)
[    4.040599] ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 high edge)
[    4.040600] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high edge)
[    4.040601] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high edge)
[    4.040601] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high edge)
[    4.040602] ACPI: INT_SRC_OVR (bus 0 bus_irq 12 global_irq 12 high edge)
[    4.040602] ACPI: INT_SRC_OVR (bus 0 bus_irq 13 global_irq 13 high edge)
[    4.040603] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
[    4.040604] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
[    4.040606] ACPI: Found unsupported MADT entry (type = 0x10)
[    4.040608] ACPI: Using ACPI (MADT) for SMP configuration information
[    4.040609] TSC deadline timer available
[    4.040618] smpboot: Allowing 16 CPUs, 0 hotplug CPUs
[    4.040681] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    4.040683] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    4.040684] PM: hibernation: Registered nosave memory: [mem 0x0080c000-0x0080cfff]
[    4.040685] PM: hibernation: Registered nosave memory: [mem 0x7d206000-0x7d206fff]
[    4.040686] PM: hibernation: Registered nosave memory: [mem 0x7d276000-0x7d385fff]
[    4.040688] PM: hibernation: Registered nosave memory: [mem 0x7d3a7000-0x7d3a8fff]
[    4.040689] PM: hibernation: Registered nosave memory: [mem 0x7d3ab000-0x7d3abfff]
[    4.040690] PM: hibernation: Registered nosave memory: [mem 0x7d3b7000-0x7d3fffff]
[    4.040691] PM: hibernation: Registered nosave memory: [mem 0x7d636000-0x7d645fff]
[    4.040692] PM: hibernation: Registered nosave memory: [mem 0x7d648000-0x7d64ffff]
[    4.040692] PM: hibernation: Registered nosave memory: [mem 0x7d650000-0x7d658fff]
[    4.040693] PM: hibernation: Registered nosave memory: [mem 0x7d65c000-0x7d6a9fff]
[    4.040694] PM: hibernation: Registered nosave memory: [mem 0x7e7c1000-0x7e818fff]
[    4.040694] PM: hibernation: Registered nosave memory: [mem 0x7e819000-0x7e820fff]
[    4.040695] PM: hibernation: Registered nosave memory: [mem 0x7e821000-0x7e824fff]
[    4.040696] PM: hibernation: Registered nosave memory: [mem 0x7ff7c000-0x7fffffff]
[    4.040696] PM: hibernation: Registered nosave memory: [mem 0x80000000-0xafffffff]
[    4.040696] PM: hibernation: Registered nosave memory: [mem 0xb0000000-0xbfffffff]
[    4.040697] PM: hibernation: Registered nosave memory: [mem 0xc0000000-0xffffffff]
[    4.040698] [mem 0xc0000000-0xffffffff] available for PCI devices
[    4.040699] Booting paravirtualized kernel on KVM
[    4.040701] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    4.040706] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:16 nr_cpu_ids:16 nr_node_ids:1
[    4.051253] percpu: Embedded 61 pages/cpu s212992 r8192 d28672 u262144
[    4.051259] pcpu-alloc: s212992 r8192 d28672 u262144 alloc=1*2097152
[    4.051261] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 [0] 08 09 10 11 12 13 14 15 
[    4.051421] kvm-guest: PV spinlocks disabled, no host support
[    4.051424] Fallback order for Node 0: 0 
[    4.051426] Built 1 zonelists, mobility grouping on.  Total pages: 16514132
[    4.051427] Policy zone: Normal
[    4.051428] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.2.0-mvp10v1+8-generic root=UUID=34062841-799a-47dd-a3e5-795f06780507 ro console=tty1 console=ttyS0
[    4.051467] Unknown kernel command line parameters "BOOT_IMAGE=/boot/vmlinuz-6.2.0-mvp10v1+8-generic", will be passed to user space.
[    4.051484] random: crng init done
[    4.216664] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, linear)
[    4.299219] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, linear)
[    4.299478] mem auto-init: stack:off, heap alloc:on, heap free:off
[    4.299482] software IO TLB: area num 16.
[    4.299486] IO TLB: ff9a00000-1039a00000 accepted 0
[    4.424552] Memory: 64745956K/67105704K available (18432K kernel code, 4111K rwdata, 12808K rodata, 4444K init, 5616K bss, 2359544K reserved, 0K cma-reserved)
[    4.424638] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[    4.424674] ftrace: allocating 53185 entries in 208 pages
[    4.433903] ftrace: allocated 208 pages with 3 groups
[    4.434610] Dynamic Preempt: voluntary
[    4.434647] rcu: Preemptible hierarchical RCU implementation.
[    4.434648] rcu:     RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=16.
[    4.434650]  Trampoline variant of Tasks RCU enabled.
[    4.434650]  Rude variant of Tasks RCU enabled.
[    4.434651]  Tracing variant of Tasks RCU enabled.
[    4.434651] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    4.434652] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
[    4.437434] NR_IRQS: 524544, nr_irqs: 536, preallocated irqs: 0
[    4.437470] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    4.437710] Console: colour dummy device 80x25
[    4.437712] printk: console [tty1] enabled
[    4.437877] printk: console [ttyS0] enabled
[    4.597716] Memory Encryption Features active: Intel TDX
[    4.597944] ACPI: Core revision 20221020
[    4.598158] Failed to register legacy timer interrupt
[    4.598345] APIC: Switch to symmetric I/O mode setup
[    4.602636] Switched APIC routing to physical x2apic.
[    4.602830] kvm-guest: setup PV IPIs
[    4.603122] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    4.603508] Calibrating delay loop (skipped), value calculated using timer frequency.. 2000.00 BogoMIPS (lpj=4000000)
[    4.603893] pid_max: default: 32768 minimum: 301
[    4.605537] LSM: initializing lsm=lockdown,capability,landlock,yama,integrity,apparmor
[    4.605837] landlock: Up and running.
[    4.605975] Yama: becoming mindful.
[    4.606129] AppArmor: AppArmor initialized
[    4.606421] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    4.607505] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    4.607505] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[    4.607505] process: using TDX aware idle routine
[    4.607505] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    4.607505] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    4.607505] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    4.607505] Spectre V2 : Mitigation: Enhanced IBRS
[    4.607505] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    4.607505] Spectre V2 : Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT
[    4.607505] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    4.607505] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[    4.607505] Freeing SMP alternatives memory: 44K
[    4.607505] smpboot: CPU0: Intel 06/cf (family: 0x6, model: 0xcf, stepping: 0x2)
[    4.607505] Performance Events: unsupported p6 CPU model 207 no PMU driver, software events only.
[    4.607505] rcu: Hierarchical SRCU implementation.
[    4.607505] rcu:     Max phase no-delay instances is 1000.
[    4.607505] NMI watchdog: Perf NMI watchdog permanently disabled
[    4.607505] smp: Bringing up secondary CPUs ...
[    4.607505] x86: Booting SMP configuration:
[    4.607505] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15
[    4.611947] smp: Brought up 1 node, 16 CPUs
[    4.612052] smpboot: Max logical packages: 1
[    4.612211] smpboot: Total of 16 processors activated (32000.00 BogoMIPS)
[    4.616724] devtmpfs: initialized
[    4.616724] x86/mm: Memory block size: 2048MB
[    4.616724] KVM-debug: PASS: single step #VE emulated instructions
[    4.616724] KVM-debug: PASS: single step TDX module emulated CPUID 0
[    4.616724] KVM-debug: PASS: single step TDX module emulated RDMSR 0x1a0
[    4.616724] ACPI: PM: Registering ACPI NVS region [mem 0x0080c000-0x0080cfff] (4096 bytes)
[    4.616724] ACPI: PM: Registering ACPI NVS region [mem 0x7d636000-0x7d645fff] (65536 bytes)
[    4.616852] ACPI: PM: Registering ACPI NVS region [mem 0x7d648000-0x7d64ffff] (32768 bytes)
[    4.617154] ACPI: PM: Registering ACPI NVS region [mem 0x7e821000-0x7e824fff] (16384 bytes)
[    4.617489] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    4.619546] futex hash table entries: 4096 (order: 6, 262144 bytes, linear)
[    4.619857] pinctrl core: initialized pinctrl subsystem
[    4.620297] PM: RTC time: 19:41:06, date: 2024-02-27
[    4.621033] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    4.627529] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
[    4.631520] DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    4.635518] DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    4.635822] audit: initializing netlink subsys (disabled)
[    4.647546] audit: type=2000 audit(1709062866.044:1): state=initialized audit_enabled=0 res=1
[    4.647614] thermal_sys: Registered thermal governor 'fair_share'
[    4.647836] thermal_sys: Registered thermal governor 'bang_bang'
[    4.648060] thermal_sys: Registered thermal governor 'step_wise'
[    4.648279] thermal_sys: Registered thermal governor 'user_space'
[    4.648498] thermal_sys: Registered thermal governor 'power_allocator'
[    4.648724] EISA bus registered
[    4.649087] cpuidle: using governor ladder
[    4.649241] cpuidle: using governor menu
[    4.649525] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    4.649881] PCI: Using configuration type 1 for base access
[    4.650351] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[    4.651589] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[    4.651772] HugeTLB: 16380 KiB vmemmap can be freed for a 1.00 GiB page
[    4.652013] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    4.652260] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[    4.652518] fbcon: Taking over console
[    4.652518] ACPI: Added _OSI(Module Device)
[    4.652518] ACPI: Added _OSI(Processor Device)
[    4.652518] ACPI: Added _OSI(3.0 _SCP Extensions)
[    4.652518] ACPI: Added _OSI(Processor Aggregator Device)
[    4.653338] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    4.656363] ACPI: Interpreter enabled
[    4.656511] ACPI: PM: (supports S0 S3 S4 S5)
[    4.656670] ACPI: Using IOAPIC for interrupt routing
[    4.656865] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    4.657196] PCI: Using E820 reservations for host bridge windows
[    4.657504] ACPI: Enabled 2 GPEs in block 00 to 3F
[    4.659611] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    4.659841] acpi PNP0A08:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[    4.660123] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
[    4.660504] acpi PNP0A08:00: MMCONFIG is disabled, can't access extended configuration space under this bridge
[    4.661042] PCI host bridge to bus 0000:00
[    4.661194] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    4.661442] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    4.661689] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    4.661961] pci_bus 0000:00: root bus resource [mem 0x80000000-0xafffffff window]
[    4.662233] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
[    4.662505] pci_bus 0000:00: root bus resource [mem 0x700000000000-0x70200300bfff window]
[    4.662802] pci_bus 0000:00: root bus resource [bus 00-ff]
[    4.663086] pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000
[    4.664691] pci 0000:00:01.0: [1af4:1041] type 00 class 0x020000
[    4.665970] pci 0000:00:01.0: reg 0x14: [mem 0xc0204000-0xc0204fff]
[    4.667514] pci 0000:00:01.0: reg 0x20: [mem 0x702003000000-0x702003003fff 64bit pref]
[    4.668221] pci 0000:00:01.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    4.669096] pci 0000:00:02.0: [1af4:1043] type 00 class 0x078000
[    4.670256] pci 0000:00:02.0: reg 0x14: [mem 0xc0203000-0xc0203fff]
[    4.671801] pci 0000:00:02.0: reg 0x20: [mem 0x702003004000-0x702003007fff 64bit pref]
[    4.672924] pci 0000:00:03.0: [1b36:000c] type 01 class 0x060400
[    4.679516] pci 0000:00:03.0: reg 0x10: [mem 0xc0202000-0xc0202fff]
[    4.704987] pci 0000:00:04.0: [1af4:1042] type 00 class 0x010000
[    4.705999] pci 0000:00:04.0: reg 0x14: [mem 0xc0201000-0xc0201fff]
[    4.707359] pci 0000:00:04.0: reg 0x20: [mem 0x702003008000-0x70200300bfff 64bit pref]
[    4.759482] pci 0000:00:1f.0: [8086:2918] type 00 class 0x060100
[    4.760773] pci 0000:00:1f.2: [8086:2922] type 00 class 0x010601
[    4.763337] pci 0000:00:1f.2: reg 0x20: [io  0x6240-0x625f]
[    4.763945] pci 0000:00:1f.2: reg 0x24: [mem 0xc0200000-0xc0200fff]
[    4.764896] pci 0000:00:1f.3: [8086:2930] type 00 class 0x0c0500
[    4.766596] pci 0000:00:1f.3: reg 0x20: [io  0x6200-0x623f]
[    4.768069] acpiphp: Slot [0] registered
[    4.768487] pci 0000:01:00.0: [10de:2331] type 00 class 0x030200
[    4.775520] pci 0000:01:00.0: reg 0x10: [mem 0x702002000000-0x702002ffffff 64bit pref]
[    4.783519] pci 0000:01:00.0: reg 0x18: [mem 0x700000000000-0x701fffffffff 64bit pref]
[    4.791517] pci 0000:01:00.0: reg 0x20: [mem 0x702000000000-0x702001ffffff 64bit pref]
[    4.800280] pci 0000:01:00.0: Max Payload Size set to 128 (was 256, max 256)
[    4.801308] pci 0000:00:03.0: PCI bridge to [bus 01]
[    4.801531] pci 0000:00:03.0:   bridge window [io  0x6000-0x6fff]
[    4.801788] pci 0000:00:03.0:   bridge window [mem 0xc0000000-0xc01fffff]
[    4.802105] pci 0000:00:03.0:   bridge window [mem 0x700000000000-0x702002ffffff 64bit pref]
[    4.803015] ACPI: PCI: Interrupt link LNKA configured for IRQ 10
[    4.803348] ACPI: PCI: Interrupt link LNKB configured for IRQ 10
[    4.803613] ACPI: PCI: Interrupt link LNKC configured for IRQ 11
[    4.803929] ACPI: PCI: Interrupt link LNKD configured for IRQ 11
[    4.804244] ACPI: PCI: Interrupt link LNKE configured for IRQ 10
[    4.804560] ACPI: PCI: Interrupt link LNKF configured for IRQ 10
[    4.804878] ACPI: PCI: Interrupt link LNKG configured for IRQ 11
[    4.805194] ACPI: PCI: Interrupt link LNKH configured for IRQ 11
[    4.806296] ACPI: PCI: Interrupt link GSIA configured for IRQ 16
[    4.806676] ACPI: PCI: Interrupt link GSIB configured for IRQ 17
[    4.807045] ACPI: PCI: Interrupt link GSIC configured for IRQ 18
[    4.807436] ACPI: PCI: Interrupt link GSID configured for IRQ 19
[    4.807528] ACPI: PCI: Interrupt link GSIE configured for IRQ 20
[    4.807898] ACPI: PCI: Interrupt link GSIF configured for IRQ 21
[    4.808182] ACPI: PCI: Interrupt link GSIG configured for IRQ 22
[    4.808408] ACPI: PCI: Interrupt link GSIH configured for IRQ 23
[    4.809361] iommu: Default domain type: Translated 
[    4.809361] iommu: DMA domain TLB invalidation policy: lazy mode 
[    4.809361] SCSI subsystem initialized
[    4.809361] libata version 3.00 loaded.
[    4.809361] ACPI: bus type USB registered
[    4.809361] usbcore: registered new interface driver usbfs
[    4.809361] usbcore: registered new interface driver hub
[    4.809361] usbcore: registered new device driver usb
[    4.809361] pps_core: LinuxPPS API ver. 1 registered
[    4.809361] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    4.809361] PTP clock support registered
[    4.811533] EDAC MC: Ver: 3.0.0
[    4.812036] Registered efivars operations
[    4.812036] NetLabel: Initializing
[    4.812036] NetLabel:  domain hash size = 128
[    4.812036] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    4.812196] NetLabel:  unlabeled traffic allowed by default
[    4.812414] PCI: Using ACPI for IRQ routing
[    4.812414] PCI: pci_cache_line_size set to 64 bytes
[    4.812414] pci 0000:00:1f.2: can't claim BAR 4 [io  0x6240-0x625f]: address conflict with PCI Bus 0000:01 [io  0x6000-0x6fff]
[    4.812414] pci 0000:00:1f.3: can't claim BAR 4 [io  0x6200-0x623f]: address conflict with PCI Bus 0000:01 [io  0x6000-0x6fff]
[    4.812834] e820: reserve RAM buffer [mem 0x0080c000-0x008fffff]
[    4.812836] e820: reserve RAM buffer [mem 0x7d206000-0x7fffffff]
[    4.812837] e820: reserve RAM buffer [mem 0x7d276000-0x7fffffff]
[    4.812839] e820: reserve RAM buffer [mem 0x7d3a7000-0x7fffffff]
[    4.812840] e820: reserve RAM buffer [mem 0x7d3ab000-0x7fffffff]
[    4.812841] e820: reserve RAM buffer [mem 0x7d3b7000-0x7fffffff]
[    4.812842] e820: reserve RAM buffer [mem 0x7d636000-0x7fffffff]
[    4.812843] e820: reserve RAM buffer [mem 0x7d648000-0x7fffffff]
[    4.812845] e820: reserve RAM buffer [mem 0x7d65c000-0x7fffffff]
[    4.812846] e820: reserve RAM buffer [mem 0x7e7c1000-0x7fffffff]
[    4.812847] e820: reserve RAM buffer [mem 0x7ff7c000-0x7fffffff]
[    4.812883] vgaarb: loaded
[    4.812883] clocksource: Switched to clocksource tsc-early
[    4.812883] VFS: Disk quotas dquot_6.6.0
[    4.812883] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    4.812883] AppArmor: AppArmor Filesystem Enabled
[    4.837847] pnp: PnP ACPI init
[    4.838278] pnp: PnP ACPI: found 5 devices
[    4.841474] NET: Registered PF_INET protocol family
[    4.841934] IP idents hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    4.844152] tcp_listen_portaddr_hash hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    4.844521] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    4.863236] TCP established hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    4.887740] TCP bind hash table entries: 65536 (order: 9, 2097152 bytes, linear)
[    4.888238] TCP: Hash tables configured (established 524288 bind 65536)
[    4.901147] MPTCP token hash table entries: 65536 (order: 8, 1572864 bytes, linear)
[    4.901780] UDP hash table entries: 32768 (order: 8, 1048576 bytes, linear)
[    4.902206] UDP-Lite hash table entries: 32768 (order: 8, 1048576 bytes, linear)
[    4.902586] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    4.902802] NET: Registered PF_XDP protocol family
[    4.902987] pci 0000:00:01.0: BAR 6: assigned [mem 0x80000000-0x8007ffff pref]
[    4.903266] pci 0000:00:1f.3: BAR 4: assigned [io  0x1000-0x103f]
[    4.903770] pci 0000:00:1f.2: BAR 4: assigned [io  0x1040-0x105f]
[    4.904195] pci 0000:00:03.0: PCI bridge to [bus 01]
[    4.904396] pci 0000:00:03.0:   bridge window [io  0x6000-0x6fff]
[    4.904984] pci 0000:00:03.0:   bridge window [mem 0xc0000000-0xc01fffff]
[    5.213073] pci 0000:00:03.0:   bridge window [mem 0x700000000000-0x702002ffffff 64bit pref]
[    7.175461] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    7.175709] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    7.175935] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    7.176186] pci_bus 0000:00: resource 7 [mem 0x80000000-0xafffffff window]
[    7.176436] pci_bus 0000:00: resource 8 [mem 0xc0000000-0xfebfffff window]
[    7.176686] pci_bus 0000:00: resource 9 [mem 0x700000000000-0x70200300bfff window]
[    7.176962] pci_bus 0000:01: resource 0 [io  0x6000-0x6fff]
[    7.177166] pci_bus 0000:01: resource 1 [mem 0xc0000000-0xc01fffff]
[    7.177395] pci_bus 0000:01: resource 2 [mem 0x700000000000-0x702002ffffff 64bit pref]
[    7.177732] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    7.177796] Trying to unpack rootfs image as initramfs...
[    7.177973] software IO TLB: mapped [mem 0x0000000ff9a00000-0x0000001039a00000] (1024MB)
[    7.178523] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    7.179014] clocksource: Switched to clocksource tsc
[    7.194859] Initialise system trusted keyrings
[    7.195032] Key type blacklist registered
[    7.195253] workingset: timestamp_bits=36 max_order=24 bucket_order=0
[    7.195497] zbud: loaded
[    7.195737] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    7.196064] fuse: init (API version 7.38)
[    7.196394] integrity: Platform Keyring initialized
[    7.202083] Key type asymmetric registered
[    7.202238] Asymmetric key parser 'x509' registered
[    7.202430] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
[    7.202766] io scheduler mq-deadline registered
[    7.209550] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    7.209858] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    7.210179] ACPI: button: Power Button [PWRF]
[    7.210698] ACPI: \_SB_.GSIF: Enabled at IRQ 21
[    7.212787] ACPI: \_SB_.GSIG: Enabled at IRQ 22
[    7.215018] ACPI: \_SB_.GSIE: Enabled at IRQ 20
[    7.217272] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    7.298543] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[    7.321543] Linux agpgart interface v0.103
[    7.371868] loop: module loaded
[    7.372156] virtio_blk virtio2: 16/0/0 default/read/poll queues
[    7.403361] virtio_blk virtio2: [vda] 424042496 512-byte logical blocks (217 GB/202 GiB)
[    7.410417]  vda: vda1 vda14 vda15
[    7.430697] tun: Universal TUN/TAP device driver, 1.6
[    7.444664] PPP generic driver version 2.4.2
[    7.445001] VFIO - User Level meta-driver version: 0.3
[    7.445250] i8042: PNP: No PS/2 controller found.
[    7.445425] i8042: Probing ports directly.
[    7.446424] i8042: No controller found
[    7.446623] mousedev: PS/2 mouse device common for all mice
[    7.446912] i2c_dev: i2c /dev entries driver
[    7.447079] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    7.447557] device-mapper: uevent: version 1.0.3
[    7.447771] device-mapper: ioctl: 4.47.0-ioctl (2022-07-28) initialised: dm-devel@redhat.com
[    7.448089] platform eisa.0: Probing EISA bus 0
[    7.448258] platform eisa.0: EISA: Cannot allocate resource for mainboard
[    7.448506] platform eisa.0: Cannot allocate resource for EISA slot 1
[    7.448742] platform eisa.0: Cannot allocate resource for EISA slot 2
[    7.448977] platform eisa.0: Cannot allocate resource for EISA slot 3
[    7.449212] platform eisa.0: Cannot allocate resource for EISA slot 4
[    7.449448] platform eisa.0: Cannot allocate resource for EISA slot 5
[    7.449683] platform eisa.0: Cannot allocate resource for EISA slot 6
[    7.449918] platform eisa.0: Cannot allocate resource for EISA slot 7
[    7.450153] platform eisa.0: Cannot allocate resource for EISA slot 8
[    7.450389] platform eisa.0: EISA: Detected 0 cards
[    7.450569] intel_pstate: CPU model not supported
[    7.451260] ledtrig-cpu: registered to indicate activity on CPUs
[    7.451648] drop_monitor: Initializing network drop monitor service
[    7.485596] NET: Registered PF_INET6 protocol family
[    7.714268] Freeing initrd memory: 35168K
[    7.731396] Segment Routing with IPv6
[    7.731558] In-situ OAM (IOAM) with IPv6
[    7.731718] NET: Registered PF_PACKET protocol family
[    7.731989] Key type dns_resolver registered
[    7.732158] mce: Unable to init MCE device (rc: -5)
[    7.732419] IPI shorthand broadcast: enabled
[    7.733273] sched_clock: Marking stable (7566533755, 165835904)->(28502522338, -20770152679)
[    7.733791] registered taskstats version 1
[    7.734450] Loading compiled-in X.509 certificates
[    7.735000] Loaded X.509 cert 'Build time autogenerated kernel key: 4a1cb0c5f6f05ea8fa61c4afba62b6bd0569d32d'
[    7.735674] Loaded X.509 cert 'Canonical Ltd. Live Patch Signing: 14df34d1a87cf37625abec039ef2bf521249b969'
[    7.736296] Loaded X.509 cert 'Canonical Ltd. Kernel Module Signing: 88f752e560a1e0737e31163a466ad7b70a850c19'
[    7.736657] blacklist: Loading compiled-in revocation X.509 certificates
[    7.736912] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing: 61482aa2830d0ab2ad5af10b7250da9033ddcef0'
[    7.738137] zswap: loaded using pool lzo/zbud
[    7.751765] Key type .fscrypt registered
[    7.751913] Key type fscrypt-provisioning registered
[    7.766768] Key type encrypted registered
[    7.766919] AppArmor: AppArmor sha1 policy hashing enabled
[    7.767180] ima: No TPM chip found, activating TPM-bypass!
[    7.767392] Loading compiled-in module X.509 certificates
[    7.767911] Loaded X.509 cert 'Build time autogenerated kernel key: 4a1cb0c5f6f05ea8fa61c4afba62b6bd0569d32d'
[    7.768269] ima: Allocated hash algorithm: sha1
[    7.768441] ima: No architecture policies found
[    7.768618] evm: Initialising EVM extended attributes:
[    7.768806] evm: security.selinux
[    7.768930] evm: security.SMACK64
[    7.769054] evm: security.SMACK64EXEC
[    7.769190] evm: security.SMACK64TRANSMUTE
[    7.769342] evm: security.SMACK64MMAP
[    7.769477] evm: security.apparmor
[    7.769604] evm: security.ima
[    7.769715] evm: security.capability
[    7.769848] evm: HMAC attrs: 0x1
[    7.770177] PM:   Magic number: 12:154:699
[    7.770358] acpi LNXCPU:0d: hash matches
[    7.770746] RAS: Correctable Errors collector initialized.
[    7.783867] failed to free unused decrypted pages
[    7.784738] Freeing unused kernel image (initmem) memory: 4444K
[    7.819553] Write protecting the kernel read-only data: 32768k
[    7.820068] Freeing unused kernel image (rodata/data gap) memory: 1528K
[    7.826104] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    7.826342] Run /init as init process
[    7.826479]   with arguments:
[    7.826480]     /init
[    7.826480]   with environment:
[    7.826481]     HOME=/
[    7.826481]     TERM=linux
[    7.826482]     BOOT_IMAGE=/boot/vmlinuz-6.2.0-mvp10v1+8-generic
[    7.928400] libahci: module verification failed: signature and/or required key missing - tainting kernel
[    7.928901] virtio_net virtio0 enp0s1: renamed from eth0
[    7.930934] cryptd: max_cpu_qlen set to 1000
[    8.105924] AVX2 version of gcm_enc/dec engaged.
[    8.106288] AES CTR mode by8 optimization enabled
[    9.279530] raid6: avx512x4 gen() 40190 MB/s
[    9.347517] raid6: avx512x2 gen() 39529 MB/s
[    9.415535] raid6: avx512x1 gen() 35594 MB/s
[    9.483516] raid6: avx2x4   gen() 36871 MB/s
[    9.551523] raid6: avx2x2   gen() 37455 MB/s
[    9.619533] raid6: avx2x1   gen() 23854 MB/s
[    9.619691] raid6: using algorithm avx512x4 gen() 40190 MB/s
[    9.687524] raid6: .... xor() 6143 MB/s, rmw enabled
[    9.687707] raid6: using avx512x2 recovery algorithm
[    9.688236] xor: automatically using best checksumming function   avx       
[    9.688703] async_tx: api initialized (async)
[    9.766225] Btrfs loaded, crc32c=crc32c-intel, zoned=yes, fsverity=yes
[    9.828472] EXT4-fs (vda1): mounted filesystem 34062841-799a-47dd-a3e5-795f06780507 with ordered data mode. Quota mode: none.
[   10.137397] systemd[1]: Inserted module 'autofs4'
[   10.165172] systemd[1]: systemd 249.11-0ubuntu3.12 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[   10.166330] systemd[1]: Detected virtualization kvm.
[   10.166516] systemd[1]: Detected architecture x86-64.
[   10.170129] systemd[1]: Hostname set to <ubuntu>.
[   10.609085] systemd[1]: Queued start job for default target Graphical Interface.
[   10.680102] systemd[1]: Created slice Slice /system/modprobe.
[   10.680568] systemd[1]: Created slice Slice /system/serial-getty.
[   10.680963] systemd[1]: Created slice Slice /system/systemd-fsck.
[   10.681311] systemd[1]: Created slice User and Session Slice.
[   10.681566] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[   10.681943] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[   10.682346] systemd[1]: Reached target Slice Units.
[   10.682540] systemd[1]: Reached target Mounting snaps.
[   10.682745] systemd[1]: Reached target Swaps.
[   10.682923] systemd[1]: Reached target Local Verity Protected Volumes.
[   10.683203] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[   10.683563] systemd[1]: Listening on LVM2 poll daemon socket.
[   10.683821] systemd[1]: Listening on multipathd control socket.
[   10.684082] systemd[1]: Listening on Syslog Socket.
[   10.684300] systemd[1]: Listening on fsck to fsckd communication Socket.
[   10.684571] systemd[1]: Listening on initctl Compatibility Named Pipe.
[   10.684892] systemd[1]: Listening on Journal Audit Socket.
[   10.685134] systemd[1]: Listening on Journal Socket (/dev/log).
[   10.685397] systemd[1]: Listening on Journal Socket.
[   10.685634] systemd[1]: Listening on Network Service Netlink Socket.
[   10.685917] systemd[1]: Listening on udev Control Socket.
[   10.686150] systemd[1]: Listening on udev Kernel Socket.
[   10.686849] systemd[1]: Mounting Huge Pages File System...
[   10.687625] systemd[1]: Mounting POSIX Message Queue File System...
[   10.688486] systemd[1]: Mounting Kernel Debug File System...
[   10.689248] systemd[1]: Mounting Kernel Trace File System...
[   10.691066] systemd[1]: Starting Journal Service...
[   10.692057] systemd[1]: Starting Set the console keyboard layout...
[   10.692919] systemd[1]: Starting Create List of Static Device Nodes...
[   10.693680] systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
[   10.694112] systemd[1]: Condition check resulted in LXD - agent being skipped.
[   10.694903] systemd[1]: Starting Load Kernel Module configfs...
[   10.695666] systemd[1]: Starting Load Kernel Module drm...
[   10.696408] systemd[1]: Starting Load Kernel Module efi_pstore...
[   10.697105] systemd[1]: Starting Load Kernel Module fuse...
[   10.697373] systemd[1]: Condition check resulted in OpenVSwitch configuration for cleanup being skipped.
[   10.697779] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
[   10.699936] pstore: Using crash dump compression: deflate
[   10.700008] systemd[1]: Starting Load Kernel Modules...
[   10.700802] pstore: Registered efi_pstore as persistent store backend
[   10.700939] systemd[1]: Starting Remount Root and Kernel File Systems...
[   10.701854] systemd[1]: Starting Coldplug All udev Devices...
[   10.702929] systemd[1]: Mounted Huge Pages File System.
[   10.703190] systemd[1]: Mounted POSIX Message Queue File System.
[   10.703471] systemd[1]: Mounted Kernel Debug File System.
[   10.703745] systemd[1]: Mounted Kernel Trace File System.
[   10.704130] systemd[1]: Finished Create List of Static Device Nodes.
[   10.704511] systemd[1]: modprobe@configfs.service: Deactivated successfully.
[   10.704894] systemd[1]: Finished Load Kernel Module configfs.
[   10.705227] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[   10.705586] systemd[1]: Finished Load Kernel Module efi_pstore.
[   10.705899] systemd[1]: modprobe@fuse.service: Deactivated successfully.
[   10.706243] systemd[1]: Finished Load Kernel Module fuse.
[   10.706338] EXT4-fs (vda1): re-mounted 34062841-799a-47dd-a3e5-795f06780507. Quota mode: none.
[   10.707283] systemd[1]: Finished Remount Root and Kernel File Systems.
[   10.707767] systemd[1]: Finished Load Kernel Modules.
[   10.708581] systemd[1]: Mounting FUSE Control File System...
[   10.709468] systemd[1]: Mounting Kernel Configuration File System...
[   10.710301] systemd[1]: Starting Device-Mapper Multipath Device Controller...
[   10.710642] systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
[   10.711627] systemd[1]: Starting Load/Save Random Seed...
[   10.711650] ACPI: bus type drm_connector registered
[   10.712686] systemd[1]: Starting Apply Kernel Variables...
[   10.713481] systemd[1]: Starting Create System Users...
[   10.713498] alua: device handler registered
[   10.714705] emc: device handler registered
[   10.714769] systemd[1]: Started Journal Service.
[   10.715696] rdac: device handler registered
[   10.722268] systemd-journald[417]: Received client request to flush runtime journal.
[   10.814946] loop0: detected capacity change from 0 to 130888
[   10.816488] loop1: detected capacity change from 0 to 130880
[   10.817892] loop2: detected capacity change from 0 to 234312
[   10.818943] loop3: detected capacity change from 0 to 178152
[   10.820041] loop4: detected capacity change from 0 to 82800
[   11.014778] ecdsa_generic: unknown parameter 'ecdh' ignored
[   11.105187] nvidia: loading out-of-tree module taints kernel.
[   11.181361] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   11.181366] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[   11.182313] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   11.182314] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   11.182315] NVRM: No NVIDIA devices probed.
[   11.182553] nvidia-nvlink: Unregistered Nvlink Core, major device number 237

Also, I don't see nouveau, rivafb, nvidiafb or rivatv in my lsmod output.

Tan-YiFan commented 9 months ago

If I encountered this problem, I would try:

  1. Booting a non-TDX VM with the H100. The non-TDX VM would also use vfio to pass through the GPU. This checks whether the problem is caused by vfio.
  2. Digging into the NVIDIA driver. The related functions include nv_pci_count_devices, which is called by nvidia_init_module.
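
Before digging into the driver source, it may help to confirm from inside the guest whether any driver is bound to the GPU at all. The sketch below reads the standard sysfs `driver` symlink for a PCI device. The address 0000:01:00.0 is taken from the guest log above; the function name `bound_driver` and its optional sysfs-root argument are only illustrative.

```shell
#!/usr/bin/env bash
# Print the kernel driver bound to a PCI device, or "none" if nothing is bound.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs root (defaults to /sys; overridable so the function can be tested)
bound_driver() {
    local dev="$1" root="${2:-/sys}"
    local link="$root/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/nvidia
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# The GPU appears as 0000:01:00.0 in the guest log above.
bound_driver 0000:01:00.0
```

If this prints "none", no driver has claimed the device, which would at least be consistent with the "probe routine was not called" message.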

iihihiuh commented 9 months ago

I booted a non-TDX VM with NVIDIA CC turned off on the H100, and in that case the driver installed successfully. I also tried a TDX VM with NVIDIA CC turned off on the H100, and there the driver installation also failed.

Tan-YiFan commented 9 months ago

Could you give the result of lsmod in the TDX-VM with CC-enabled H100?

iihihiuh commented 9 months ago

Could you give the result of lsmod in the TDX-VM with CC-enabled H100?

Below is the lsmod output.

Module                  Size  Used by
binfmt_misc            24576  1
nls_iso8859_1          16384  1
ecdsa_generic          16384  0
drm_kms_helper        245760  0
syscopyarea            16384  1 drm_kms_helper
sysfillrect            20480  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
ecc                    45056  1 ecdsa_generic
sch_fq_codel           24576  2
dm_multipath           45056  0
scsi_dh_rdac           20480  0
scsi_dh_emc            16384  0
scsi_dh_alua           24576  0
drm                   684032  1 drm_kms_helper
efi_pstore             16384  0
ip_tables              36864  0
x_tables               65536  1 ip_tables
autofs4                53248  2
btrfs                1900544  0
blake2b_generic        20480  0
raid10                 73728  0
raid456               188416  0
async_raid6_recov      24576  1 raid456
async_memcpy           20480  2 raid456,async_raid6_recov
async_pq               24576  2 raid456,async_raid6_recov
async_xor              20480  3 async_pq,raid456,async_raid6_recov
async_tx               20480  5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor                    24576  2 async_xor,btrfs
raid6_pq              126976  4 async_pq,btrfs,raid456,async_raid6_recov
libcrc32c              16384  2 btrfs,raid456
raid1                  57344  0
raid0                  24576  0
multipath              20480  0
linear                 20480  0
crct10dif_pclmul       16384  1
crc32_pclmul           16384  0
ghash_clmulni_intel    16384  0
sha512_ssse3           53248  0
aesni_intel           397312  0
crypto_simd            20480  1 aesni_intel
cryptd                 28672  2 crypto_simd,ghash_clmulni_intel
ahci                   49152  0
libahci                57344  1 ahci

iihihiuh commented 8 months ago

Could you give the result of lsmod in the TDX-VM with CC-enabled H100?

Hi Tan-Yifan,

Would you mind sharing your hardware configuration? We could not get CC to work with Intel chips, and we are thinking of switching to AMD chips.

Thanks, Yongqin

Tan-YiFan commented 8 months ago

The CPU should support SEV-SNP. AMD 7003 and 9004 series would be ok.

Some previous issues report successfully running H100 CC on AMD servers. You can refer to them.

iihihiuh commented 8 months ago

@Tan-YiFan Hi Yifan, is having an AMD 7003 or 9004 series CPU the only hardware requirement to enable NVIDIA CC?

Tan-YiFan commented 8 months ago

The suggested motherboard manufacturer is Supermicro or ASRockRack. For further information, you can refer to https://docs.nvidia.com/confidential-computing-deployment-guide.pdf

rnertney commented 5 months ago

Can you please provide the lscpu output for the Intel system?

Did you utilize the instruction guides for GPU specific CVMs outlined here?

CPUs that are supported are those with Intel TDX and AMD SEV-SNP.

hedj17 commented 5 months ago

Have you solved this problem?

NSKernel commented 2 months ago

I recently ran into this problem and might have an answer. The NVIDIA H100 requires several huge DMA areas, but QEMU's implementation of the 'legacy' VFIO routine allocates these areas at the lowest granularity the memory backend supports, which in TDX is a 4 KiB page. This causes a massive number of memory slots to be used.
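
To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It counts how many granularity-sized chunks are needed to cover the 126 GB of guest memory mentioned in the original post (read here as 126 GiB, which is an assumption) and compares that with the 65535 limit quoted in the QEMU warning. The helper name `chunks` is made up for illustration, and QEMU's real check may include other terms (for example memslots), so treat this as an order-of-magnitude estimate only.

```shell
#!/usr/bin/env bash
# Number of granularity-sized chunks needed to cover a region (rounded up).
# $1: region size in bytes, $2: granularity in bytes
chunks() {
    local size="$1" granularity="$2"
    echo $(( (size + granularity - 1) / granularity ))
}

KIB=1024
MIB=$((1024 * KIB))
GIB=$((1024 * MIB))

vm_mem=$((126 * GIB))   # assumption: "126GB" read as 126 GiB
max_dma=65535           # "Maximum possible DMA mappings" from the warning

for g in $((4 * KIB)) $((2 * MIB)); do
    n=$(chunks "$vm_mem" "$g")
    if [ "$n" -gt "$max_dma" ]; then
        verdict="exceeds"
    else
        verdict="fits within"
    fi
    echo "granularity=$g bytes -> $n chunks ($verdict $max_dma)"
done
# granularity=4096 bytes -> 33030144 chunks (exceeds 65535)
# granularity=2097152 bytes -> 64512 chunks (fits within 65535)
```

At 4 KiB the chunk count is several hundred times the limit, while at 2 MiB it drops just below it, with little headroom.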

I see you are not using iommufd. If you can use iommufd, you should definitely do so, since it should not trigger this problem. You can check out the QEMU command in NVIDIA's official TDX confidential computing guide, which uses iommufd.
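
As a rough sketch of what the iommufd variant looks like on the QEMU command line: upstream QEMU exposes an `iommufd` object that a `vfio-pci` device can reference through its `iommufd=` property. The host address and bus below are copied from the warning in the original post, and the id `iommufd0` is an arbitrary name. This assumes a QEMU build and host kernel with iommufd support, and it has not been checked against NVIDIA's guide or Intel's patched QEMU tree.

```shell
#!/usr/bin/env bash
# Sketch: VFIO-related QEMU arguments using the iommufd backend.
GPU_BDF="b0:00.0"      # host PCI address of the GPU, from the original warning
IOMMUFD_ID="iommufd0"  # arbitrary object id chosen for illustration

QEMU_VFIO_ARGS="-object iommufd,id=${IOMMUFD_ID} -device vfio-pci,host=${GPU_BDF},bus=pci.1,iommufd=${IOMMUFD_ID}"

# Print the fragment; append it to the rest of the QEMU command line.
echo "$QEMU_VFIO_ARGS"
```

Compared with the original command, the only differences are the extra `-object` line and the `iommufd=` property on the `vfio-pci` device.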

If, like me, you have to use the legacy routine for whatever reason, a nasty workaround that works on my side is to hardcode VFIO's DMA allocation granularity to a higher value (it must be a power of 2). To do that in the latest available patched QEMU from Intel's TDX tree (https://github.com/intel/tdx-linux/tree/device-passthrough), change line 417 of /hw/vfio/common.c (in function vfio_register_ram_discard_listener) to vrdl->granularity = 2097152; (or any other power of 2; here I use 2 MiB). A diff patch is attached. It's definitely not an elegant solution, but it works on my side. diff.txt
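
If you want to pick a different granularity, a quick sanity check that the candidate value really is a power of 2 can save a rebuild. This is only a small helper sketch; the function name `is_power_of_two` is made up here and is not part of QEMU.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) if $1 is a positive power of two.
is_power_of_two() {
    local n="$1"
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

granularity=$((2 * 1024 * 1024))   # 2 MiB = 2097152, the value used above
if is_power_of_two "$granularity"; then
    echo "$granularity is a power of two"
else
    echo "$granularity is NOT a power of two"
fi
```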