TrenchBoot / xen

Xen is booting very slowly after Dynamic Launch #16

Open miczyg1 opened 1 month ago

miczyg1 commented 1 month ago

Right after the memory map is printed (quite early in boot), Xen prints nothing on the serial console for a minute or longer. I suspect it might be related to the MTRRs not yet being set up on the BSP after Dynamic Launch.

miczyg1 commented 1 month ago
(XEN) Xen version 4.17.4 (user@[unknown]) (gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)) debug=n Thu May 23 22:18:23 UTC 2024
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GRUB 2.06
(XEN) Command line: placeholder console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096
(XEN) Xen image load base address: 0x98a00000
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 0 MBR signatures
(XEN)  Found 3 EDD information structures
(XEN) SLAUNCH: reserving event log (0x99411000 - 0x99419000)
(XEN) SLAUNCH: reserving TXT heap (0x99f10000 - 0x9a000000)
(XEN) SLAUNCH: reserving SINIT memory (0x99ec0000 - 0x99f10000)
(XEN) Xen-e820 RAM map:
(XEN)  [0000000000000000, 000000000009fbff] (usable)
(XEN)  [000000000009fc00, 000000000009ffff] (reserved)
(XEN)  [00000000000f0000, 00000000000fffff] (reserved)
(XEN)  [0000000000100000, 0000000099410fff] (usable)
(XEN)  [0000000099411000, 0000000099418fff] (reserved)
(XEN)  [0000000099419000, 0000000099438fff] (usable)
(XEN)  [0000000099439000, 000000009f7fffff] (reserved)
(XEN)  [00000000e0000000, 00000000efffffff] (reserved)
(XEN)  [00000000fc000000, 00000000fc000fff] (reserved)
(XEN)  [00000000fe000000, 00000000fe00ffff] (reserved)
(XEN)  [00000000fed10000, 00000000fed17fff] (reserved)
(XEN)  [00000000fed20000, 00000000fed91fff] (reserved)
(XEN)  [00000000feda0000, 00000000feda1fff] (reserved)
(XEN)  [00000000ff000000, 0000000100000fff] (reserved)
(XEN)  [0000000100001000, 000000045e7fffff] (usable)
(this is where it stops for a longer time) <----------------------------
(XEN) ACPI: RSDP 000F6D50, 0024 (r2 COREv4)
(XEN) ACPI: XSDT 9946D0E0, 006C (r1 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: FACP 9946F5D0, 0114 (r6 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: DSDT 9946D280, 234C (r2 COREv4 COREBOOT 20110725 INTL 20230628)
(XEN) ACPI: FACS 9946D240, 0040
(XEN) ACPI: SSDT 9946F6F0, 2974 (r2 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: MCFG 99472070, 003C (r1 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: TPM2 994720B0, 004C (r4 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: LPIT 99472100, 0094 (r0 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: APIC 994721A0, 00B2 (r3 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: SPCR 99472260, 0058 (r4 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: DMAR 994722C0, 0088 (r1 COREv4 COREBOOT        0 CORE 20230628)
(XEN) ACPI: HPET 99472350, 0038 (r1 COREv4 COREBOOT        0 CORE 20230628)

krystian-hebel commented 1 month ago

Can you run with e820-verbose=true passed to Xen and paste the log? MTRRs are restored just before the e820 map is printed, so unless the wrong memory types were saved by GRUB, it should already be using the cache at this point.

pietrushnic commented 1 month ago

Does it make sense to call @andyhhp ?

andyhhp commented 1 month ago

> Does it make sense to call @andyhhp ?

Lol, who you gonna call?

> debug=n

Please always use debug builds of Xen. Also, boot with console_timestamps=boot to get some Linux-style numbers here, and cpuinfo mtrr.show to get some extra diagnostics that may be relevant.

Which platform is this? Something Intel, but once again I'm missing an expected printk() in there.

In that period of time is when we're physically relocating the multiboot modules, and in particular putting dom0/initrd at the very top of memory. If that's ending up being uncached, then yes it will proceed slowly. This ought to make it obvious, if it's this.

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index dd51e68dbe5b..4299641e8a71 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1457,6 +1457,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
                  (headroom ||
                   ((end - size) >> PAGE_SHIFT) > mod[j].mod_start) )
             {
+                printk("*** Relocating mod[%d]\n", j);
                 move_memory(end - size + headroom,
                             (uint64_t)mod[j].mod_start << PAGE_SHIFT,
                             mod[j].mod_end);

but there are various other things we're doing during this period, including building pagetables, setting up the physical memory manager, etc.
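
As a rough userspace analogue of bracketing the move with timestamps, the sketch below times a single copy and converts it to throughput. The helpers `now_ns`, `timed_move` and `mib_per_sec` are hypothetical names for illustration, not Xen functions; inside Xen, the `printk()` above combined with `console_timestamps=boot` would serve the same purpose.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: copy len bytes and return the elapsed time in ns. */
static uint64_t timed_move(void *dst, const void *src, size_t len)
{
    uint64_t start;

    assert(dst != NULL && src != NULL);
    start = now_ns();
    memmove(dst, src, len);
    return now_ns() - start;
}

/* Throughput in MiB/s; returns 0 when the interval was too short to measure. */
static double mib_per_sec(size_t len, uint64_t ns)
{
    if (ns == 0)
        return 0.0;
    return ((double)len / (1024.0 * 1024.0)) / ((double)ns / 1e9);
}
```

If one module's relocation accounts for most of the stall and its throughput is far below what cached memory normally delivers, that would point at the destination range being uncached rather than at the other work done in this window.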

rossphilipson commented 1 month ago

That does sound like MTRRs not being restored. The entire world is UC except for the ACM area on SENTER.

krystian-hebel commented 1 month ago

Isn't this the platform on which coreboot complains about lack of free MTRRs for ROM? If it gives up in the middle of writing those and doesn't set something sane for >4GB, perhaps wrong settings are properly restored.
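
To make the ">4GB" concern concrete, here is a minimal sketch of how variable-range MTRRs resolve to a memory type for one physical address. It follows the documented register layout (type in bits 7:0 of PHYSBASE, valid bit 11 in PHYSMASK, enable bit 11 in IA32_MTRR_DEF_TYPE) but ignores fixed-range MTRRs and PAT. The function name `effective_type`, the 39-bit physical address width, and the single-MTRR layout in the usage note are assumptions for illustration, not values read from the reporter's machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6 };

#define MTRR_DEF_TYPE_ENABLE (1ull << 11) /* E bit of IA32_MTRR_DEF_TYPE */
#define MTRR_PHYSMASK_VALID  (1ull << 11) /* V bit of IA32_MTRR_PHYSMASKn */

struct var_mtrr {
    uint64_t physbase; /* raw IA32_MTRR_PHYSBASEn value */
    uint64_t physmask; /* raw IA32_MTRR_PHYSMASKn value */
};

/* Bits 12..phys_bits-1, where base and mask addresses live. */
static uint64_t addr_field(unsigned phys_bits)
{
    assert(phys_bits >= 32 && phys_bits <= 52);
    return ((1ull << phys_bits) - 1) & ~0xfffull;
}

/* Hypothetical helper: memory type for addr, variable ranges only. */
static unsigned effective_type(uint64_t def_type, const struct var_mtrr *v,
                               size_t n, unsigned phys_bits, uint64_t addr)
{
    uint64_t field = addr_field(phys_bits);
    bool matched = false;
    unsigned type = MTRR_UC;

    if (!(def_type & MTRR_DEF_TYPE_ENABLE))
        return MTRR_UC; /* MTRRs disabled: everything is UC */

    for (size_t i = 0; i < n; i++) {
        uint64_t mask = v[i].physmask & field;
        unsigned t = v[i].physbase & 0xff;

        if (!(v[i].physmask & MTRR_PHYSMASK_VALID))
            continue;
        if ((addr & mask) != (v[i].physbase & mask))
            continue;
        if (t == MTRR_UC)
            return MTRR_UC; /* UC wins over any overlapping range */
        if (!matched)
            type = t;
        else if ((type == MTRR_WT && t == MTRR_WB) ||
                 (type == MTRR_WB && t == MTRR_WT))
            type = MTRR_WT; /* WT wins over WB; other mixes are undefined */
        matched = true;
    }
    /* No variable range covers addr: the default type applies. */
    return matched ? type : (unsigned)(def_type & 0xff);
}
```

For example, with a default type of UC and a single WB range covering only 0 to 2 GiB (`physbase = 0x6`, `physmask = 0x7f80000800`), `effective_type(0xc00, v, 1, 39, 0x45e000000)` resolves to UC. That address lies inside the top usable region of the e820 map above, which is where the modules get relocated, so a layout like this would be enough to explain a slow copy.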

miczyg1 commented 1 month ago

> Isn't this the platform on which coreboot complains about lack of free MTRRs for ROM? If it gives up in the middle of writing those and doesn't set something sane for >4GB, perhaps wrong settings are properly restored.

I have commented out the part where a temporary MTRR is placed, so MTRRs should be alright.

> Which platform is this? Something Intel, but once again I'm missing an expected printk() in there.

Intel Comet Lake U

> In that period of time is when we're physically relocating the multiboot modules, and in particular putting dom0/initrd at the very top of memory. If that's ending up being uncached, then yes it will proceed slowly. This ought to make it obvious, if it's this.

Okay, but on the non-Dynamic Launch path, it is not taking that long (I get dom0 systemd messages/LUKS key prompt on the serial console in a matter of seconds compared to over 60 seconds on Dynamic Launch).