Add instruction decoding and emulation support for IOIO instructions #391

Closed: cxdong closed this 2 months ago

cxdong commented 5 months ago

Add instruction decoding and emulation support for ins/outs instructions. To support emulation, InsnMachineCtx has been extended with additional methods that let the instruction decoder read CPU state, and these new methods have been implemented for X86ExceptionContext.

Besides providing more CPU state, this change also gives the instruction decoder the ability to access memory via the new trait InsnMachineMem. PageTable has been extended to translate virtual addresses to their corresponding physical addresses, with access rights checked along the way. The physical addresses are then temporarily mapped at contiguous virtual addresses via a PerCPUPageMappingGuard, which is wrapped in the new structure MemMappingGuard. This new structure implements the InsnMachineMem trait and is used by the instruction decoder to access memory.
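A minimal sketch of how these pieces might fit together; the trait and type names follow the description above, but the signatures, generics, and stand-in helper types here are assumptions rather than the PR's actual code:

use core::marker::PhantomData;

// Stand-in types so this sketch is self-contained; the real error type and
// per-CPU mapping guard live in the SVSM kernel and differ from these.
#[derive(Debug)]
pub struct InsnError;
pub struct PerCPUPageMappingGuard;

// Sketch of the memory-access trait the decoder could use.
pub trait InsnMachineMem {
    type Item;
    fn mem_read(&self) -> Result<Self::Item, InsnError>;
    fn mem_write(&mut self, data: Self::Item) -> Result<(), InsnError>;
}

// Sketch of MemMappingGuard: it wraps a temporary contiguous mapping of the
// translated physical pages and remembers where the data starts inside it.
pub struct MemMappingGuard<T> {
    guard: PerCPUPageMappingGuard,
    start_off: usize,
    phantom: PhantomData<T>,
}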

Besides ins/outs, in/out instructions can also be emulated. The test cases have been extended to cover emulation of ins/outs/in/out instructions with byte, word, and double-word operand sizes.

Finally, the SEV #VC handler has been simplified by using the emulation interface directly to handle IOIO instructions, and the test-in-svsm case test_port_io_string_16_get_last() is now enabled. I have set up a TDX machine that can boot the SVSM in a TD and performed a test similar to test_port_io_string_16_get_last() there, but I haven't had a chance to test with SEV due to the lack of an AMD machine. I would appreciate it if someone could try this PR on an AMD machine.

00xc commented 4 months ago

Please rebase on main, thanks!

p4zuu commented 4 months ago

SVSM fails to boot in debug mode (RELEASE=0) starting from commit 8188603 "kernel/cpu/vc: Simplify IOIO instruction handling in #VC handler":

[Stage2] COCONUT Secure Virtual Machine Service Module (SVSM) Stage 2 Loader
[Stage2] ERROR: Panic: panicked at kernel/src/stage2.rs:353:14:
Failed to load kernel ELF: Mem

However, this boots fine (and tests in SVSM pass as well) on AMD for release builds.

cxdong commented 4 months ago

Please rebase on main, thanks!

Rebased and force-pushed.

cxdong commented 4 months ago

SVSM fails to boot in debug mode (RELEASE=0) starting from commit 8188603 "kernel/cpu/vc: Simplify IOIO instruction handling in #VC handler":

[Stage2] COCONUT Secure Virtual Machine Service Module (SVSM) Stage 2 Loader
[Stage2] ERROR: Panic: panicked at kernel/src/stage2.rs:353:14:
Failed to load kernel ELF: Mem

However, this boots fine (and tests in SVSM pass as well) on AMD for release builds.

Thanks for testing this on an AMD machine!

I think this issue is related to the stage2.bin size, which grows quite a bit in the DEBUG build when the instruction decoder crate is used to emulate the in/out instructions, leaving a smaller heap area. So in the latest update, that commit refactors only the SVSM #VC handler and leaves the stage2 #VC handler as before. The DEBUG build worked fine on my TDX machine because I had hacked the stage2 boot memory to be larger than 640K; I increased it because stage2.bin had grown, which I assumed was caused by introducing the TDX-specific code. But it seems that using the instruction decoder to emulate in/out instructions in stage2 also increased the size.

Thanks again for testing! Please let me know whether this issue can still be observed.

p4zuu commented 4 months ago

The #VC handler is panicking during one test-in-svsm:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: #VC handling error: Insn(ExceptionPF(18446743523954847744, 9))
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/idt/svsm.rs:216:13:
Failed to handle #VC from kernel-mode at RIP 0xffffff800005a482 code: 0x000000000000007b
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff80000ba051]
[SVSM]   [0xffffff8000000564]
[SVSM]   [0xffffff800009d28e]
[SVSM]   [0xffffff800009cfa2]
[SVSM]   [0xffffff800005a8d0]
[SVSM]   [0xffffff800009d261]
[SVSM]   [0xffffff8000080371]
[SVSM]   [0xffffff80000bdfb9]
[SVSM]   [0xffffff80000bdcb5]
[SVSM]   [0xffffff800007cbbf]
[SVSM]   Invalid frame
[SVSM] ---END---

I'm investigating this :)

cxdong commented 4 months ago

The #VC handler is panicking during one test-in-svsm:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: #VC handling error: Insn(ExceptionPF(18446743523954847744, 9))
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/idt/svsm.rs:216:13:
Failed to handle #VC from kernel-mode at RIP 0xffffff800005a482 code: 0x000000000000007b
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff80000ba051]
[SVSM]   [0xffffff8000000564]
[SVSM]   [0xffffff800009d28e]
[SVSM]   [0xffffff800009cfa2]
[SVSM]   [0xffffff800005a8d0]
[SVSM]   [0xffffff800009d261]
[SVSM]   [0xffffff8000080371]
[SVSM]   [0xffffff80000bdfb9]
[SVSM]   [0xffffff80000bdcb5]
[SVSM]   [0xffffff800007cbbf]
[SVSM]   Invalid frame
[SVSM] ---END---

I'm investigating this :)

Thank you for investigating this. This error means the SVSM kernel failed to translate a virtual address while checking permissions in the page table. So what I can imagine is that something might be wrong in the translate_addr_region() or check_access_rights() functions. Please let me know if I can provide any help on this.

p4zuu commented 4 months ago

I think this issue is related to the stage2.bin size, which grows quite a bit in the DEBUG build when the instruction decoder crate is used to emulate the in/out instructions, leaving a smaller heap area. So in the latest update, that commit refactors only the SVSM #VC handler and leaves the stage2 #VC handler as before. The DEBUG build worked fine on my TDX machine because I had hacked the stage2 boot memory to be larger than 640K; I increased it because stage2.bin had grown, which I assumed was caused by introducing the TDX-specific code. But it seems that using the instruction decoder to emulate in/out instructions in stage2 also increased the size.

Good catch, I also suspected it was a binary size issue; it has already happened in the past. I'm not sure we'll need INS/OUTS in stage 2, so enabling this only in the SVSM #VC handler makes sense to me. However, I think we'll really need to discuss this stage2 binary size issue with Jörg once he's back, since it is a recurrent problem now.

cxdong commented 4 months ago

The #VC handler is panicking during one test-in-svsm:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: #VC handling error: Insn(ExceptionPF(18446743523954847744, 9))
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/idt/svsm.rs:216:13:
Failed to handle #VC from kernel-mode at RIP 0xffffff800005a482 code: 0x000000000000007b
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff80000ba051]
[SVSM]   [0xffffff8000000564]
[SVSM]   [0xffffff800009d28e]
[SVSM]   [0xffffff800009cfa2]
[SVSM]   [0xffffff800005a8d0]
[SVSM]   [0xffffff800009d261]
[SVSM]   [0xffffff8000080371]
[SVSM]   [0xffffff80000bdfb9]
[SVSM]   [0xffffff80000bdcb5]
[SVSM]   [0xffffff800007cbbf]
[SVSM]   Invalid frame
[SVSM] ---END---

I'm investigating this :)

Thank you for investigating this. This error means the SVSM kernel failed to translate a virtual address while checking permissions in the page table. So what I can imagine is that something might be wrong in the translate_addr_region() or check_access_rights() functions. Please let me know if I can provide any help on this.

A hint from my side: according to the reported error code 9, PageFaultError::R is set, which means the code detected a reserved bit set in a page table entry. There are three places in the code that can set this error, but I guess the most likely one is the function PTEntry::has_reserved_bits(). It takes a PagingMode as input, which should be PagingMode::PML4 for AMD SEV, i.e. 4-level paging in long mode. Probably some logic inside this function is not correct.
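For reference, if the error code follows the usual x86 page-fault error-code layout (an assumption; bit 0 = present, bit 1 = write, bit 3 = reserved-bit violation), code 9 decodes as a fault on a present page plus a reserved-bit violation:

// Hypothetical decode of the reported error code 9, assuming x86 #PF bit layout.
const PF_PRESENT: u64 = 1 << 0; // fault on a present page
const PF_WRITE: u64 = 1 << 1;   // fault caused by a write
const PF_RSVD: u64 = 1 << 3;    // reserved bit set in a paging entry

fn main() {
    let code: u64 = 9;
    assert_ne!(code & PF_PRESENT, 0);
    assert_eq!(code & PF_WRITE, 0);
    assert_ne!(code & PF_RSVD, 0); // 9 = present | reserved -> PageFaultError::R
}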

p4zuu commented 3 months ago

In in-svsm testing, it now hangs before jumping to the kernel main:

[Stage2] COCONUT Secure Virtual Machine Service Module
[Stage2] Order-00: total pages:    15 free pages:     1
[Stage2] Order-01: total pages:     1 free pages:     1
[Stage2] Order-02: total pages:     0 free pages:     0
[Stage2] Order-03: total pages:     1 free pages:     1
[Stage2] Order-04: total pages:     0 free pages:     0
[Stage2] Order-05: total pages:     0 free pages:     0
[Stage2] Total memory: 100KiB free memory: 44KiB
[Stage2]   kernel_region_phys_start = 0x0000008000000000
[Stage2]   kernel_region_phys_end   = 0x0000008001000000
[Stage2]   kernel_virtual_base   = 0xffffff8000000000
*hang here*

I suspect a KVM panic that doesn't show on my screen. This doesn't happen in a regular (non-in-svsm-testing) SVSM boot.

cxdong commented 3 months ago

In in-svsm testing, it now hangs before jumping to the kernel main:

[Stage2] COCONUT Secure Virtual Machine Service Module
[Stage2] Order-00: total pages:    15 free pages:     1
[Stage2] Order-01: total pages:     1 free pages:     1
[Stage2] Order-02: total pages:     0 free pages:     0
[Stage2] Order-03: total pages:     1 free pages:     1
[Stage2] Order-04: total pages:     0 free pages:     0
[Stage2] Order-05: total pages:     0 free pages:     0
[Stage2] Total memory: 100KiB free memory: 44KiB
[Stage2]   kernel_region_phys_start = 0x0000008000000000
[Stage2]   kernel_region_phys_end   = 0x0000008001000000
[Stage2]   kernel_virtual_base   = 0xffffff8000000000
*hang here*

I suspect a KVM panic that doesn't show on my screen. This doesn't happen in a regular (non-in-svsm-testing) SVSM boot.

A possible reason may still be the stage2 binary size. If stage2 is too large, the remaining heap memory in stage2 might not be enough to load the SVSM kernel. So, is it possible to try "make test-in-svsm RELEASE=1"? Presumably the release build gives a smaller stage2 binary.

I also want to investigate the in-svsm test failure, but it seems I didn't encounter the hang issue. On my side, there is a panic, which is expected since the in-svsm tests assume they run on SEV, not TDX.

[SVSM] 1 CPU(s) present
[SVSM] Brought 0 AP(s) online
[SVSM] [CPU 0] Virtual memory pages used: 0 * 4K, 0 * 2M
[SVSM] Launching request-processing task on CPU 0
[SVSM] running 128 tests
[SVSM] test cpu::tss::test::test_tss_offsets ...
[SVSM] test cpu::vc::tests::test_has_amd_cpuid ...
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/vc.rs:321:9:
assertion `left == right` failed
  left: Ok("GenuineIntel")
 right: Ok("AuthenticAMD")
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff800010a33e]
[SVSM]   [0xffffff8000019a9d]
[SVSM]   [0xffffff8000098d51]
[SVSM]   [0xffffff80000835c1]
[SVSM]   [0xffffff8000066891]
[SVSM]   [0xffffff8000056498]
[SVSM]   [0xffffff800002dd95]
[SVSM]   [0xffffff80000a8794]
[SVSM]   Invalid frame
[SVSM] ---END---

Maybe it is because TDX doesn't have a #VC handler and doesn't use the insn_decode crate for stage2 (the SEV stage2 #VC handler doesn't use the emulation API provided by insn_decode, but it still uses the decoding API), so there is no significant impact on the stage2 binary size. It is probably worth trying the release build on SEV to see if there is any difference.

p4zuu commented 3 months ago

Maybe it is because TDX doesn't have a #VC handler and doesn't use the insn_decode crate for stage2 (the SEV stage2 #VC handler doesn't use the emulation API provided by insn_decode, but it still uses the decoding API), so there is no significant impact on the stage2 binary size. It is probably worth trying the release build on SEV to see if there is any difference.

So this bug indeed doesn't show up with make test-in-svsm RELEASE=1 (weirdly, make test-in-svsm RELEASE=0 also works), but it does show with plain make test-in-svsm. It appears that make test-in-svsm RELEASE=1 and make test-in-svsm RELEASE=0 both build in release mode. For this, maybe #432 can help; I'll work on it, and I'll also continue investigating the bug from this comment in release mode.

cxdong commented 3 months ago

Maybe it is because TDX doesn't have a #VC handler and doesn't use the insn_decode crate for stage2 (the SEV stage2 #VC handler doesn't use the emulation API provided by insn_decode, but it still uses the decoding API), so there is no significant impact on the stage2 binary size. It is probably worth trying the release build on SEV to see if there is any difference.

So this bug indeed doesn't show up with make test-in-svsm RELEASE=1 (weirdly, make test-in-svsm RELEASE=0 also works), but it does show with plain make test-in-svsm. It appears that make test-in-svsm RELEASE=1 and make test-in-svsm RELEASE=0 both build in release mode.

Looking into the build log of "make test-in-svsm RELEASE=0", it seems cargo is still passed the --release parameter, which may be because RELEASE is defined in the Makefile even in this case.

For this, maybe #432 can help; I'll work on it, and I'll also continue investigating the bug from this comment in release mode.

Thanks! Please let me know if I can provide any help, e.g., making any debugging patches

p4zuu commented 3 months ago

Thanks! Please let me know if I can provide any help, e.g., making any debugging patches

A few more hints here: PageFaultError::R is returned after entry.has_reserved_bits() returns true. This happens in check_access_write() in translate_addr_lvl0(). The faulting virtual address is 0xFFFFFF8000062000, and the level-0 page table entry is 0x8008008000062161. In has_reserved_bits() we correctly hit the PagingMode::PML4 case, which returns common = 0xFFC0000000000, resulting in has_reserved_bits() returning ((0x8008008000062161 & 0xFFC0000000000) != 0) = true.
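Checking that arithmetic with the values reported above, the only bit that survives the mask is bit 51; treating that bit as the SEV shared/private (C) bit is an assumption, but it matches the analysis in the following comments:

fn main() {
    let entry: u64 = 0x8008_0080_0006_2161;      // level-0 PTE reported above
    let reserved_mask: u64 = 0xF_FC00_0000_0000; // "common" mask for PagingMode::PML4
    let hit = entry & reserved_mask;
    // Only bit 51 falls inside the reserved range, so the check reports true.
    assert_eq!(hit, 1u64 << 51);
}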

cxdong commented 3 months ago

Thanks! Please let me know if I can provide any help, e.g., making any debugging patches

A few more hints here: PageFaultError::R is returned after entry.has_reserved_bits() returns true. This happens in check_access_write() in translate_addr_lvl0(). The faulting virtual address is 0xFFFFFF8000062000, and the level-0 page table entry is 0x8008008000062161. In has_reserved_bits() we correctly hit the PagingMode::PML4 case, which returns common = 0xFFC0000000000, resulting in has_reserved_bits() returning ((0x8008008000062161 & 0xFFC0000000000) != 0) = true.

Thanks so much for the investigation!

On my TDX machine, PHYS_ADDR_SIZE is 52 and PML4 has a reserved mask of 0x0 for level 0. For SEV, if the reserved mask is 0xFFC0000000000, then it seems PHYS_ADDR_SIZE is not 52 but 42, which is not the actual maximum physical address size (the one containing the shared/private mask bit). If PHYS_ADDR_SIZE is not 52 for SEV, then probably guest_phys_addr_size == 42, which leads PHYS_ADDR_SIZE to be 42, effective_phys_addr_size to be 42, and MAX_PHYS_ADDR to be 0x40000000000.

Based on this, if I hard-code PHYS_ADDR_SIZE to 42 for TDX, I can observe the same failure when translating shared memory (TDX sets the encryption bit mask for shared memory, which is the opposite of SEV). So I pushed an update that carves the encryption bit out of the reserved-bits mask for this case in has_reserved_bits(). With this change, I no longer observe the issue on TDX with PHYS_ADDR_SIZE hard-coded to 42.
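A rough sketch of that kind of carve-out; the mask construction, helper name, and bit positions below are illustrative assumptions, not the PR's actual code:

// Build a level-0 reserved-bits mask for 4-level paging from the effective
// physical address size, then drop the shared/private (encryption) bit so it
// is never treated as reserved.
fn reserved_mask_lvl0(phys_addr_size: u32, encrypt_bit: u32) -> u64 {
    let reserved = ((1u64 << 52) - 1) & !((1u64 << phys_addr_size) - 1);
    reserved & !(1u64 << encrypt_bit)
}

fn main() {
    // With phys_addr_size = 42 and the encryption bit assumed at position 51,
    // the PTE 0x8008008000062161 no longer trips the reserved-bit check.
    let mask = reserved_mask_lvl0(42, 51);
    assert_eq!(mask & (1u64 << 51), 0);
    assert_eq!(0x8008_0080_0006_2161u64 & mask, 0);
}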

cxdong commented 3 months ago

Hi @p4zuu,

I noticed that PR #432 has been merged. When you have a moment, could you please retest this PR to check if the boot hang issue, as mentioned in the comment, has been resolved?

Additionally, this PR has been updated to address the test-in-svsm failure on SEV. While I was able to replicate the failure manually on a TDX machine and test the fix, I haven't had the opportunity to verify it on an actual SEV machine. Could you assist by running the test-in-svsm on SEV to confirm whether the issue is indeed fixed?

Thank you for your help!

p4zuu commented 3 months ago

Hi @p4zuu,

I noticed that PR #432 has been merged. When you have a moment, could you please retest this PR to check if the boot hang issue, as mentioned in the comment, has been resolved?

Yes, the kernel now starts in both release and debug mode, great :)

Additionally, this PR has been updated to address the test-in-svsm failure on SEV. While I was able to replicate the failure manually on a TDX machine and test the fix, I haven't had the opportunity to verify it on an actual SEV machine. Could you assist by running the test-in-svsm on SEV to confirm whether the issue is indeed fixed?

So now the first INS/OUTS test in SVSM is failing on SEV:

[SVSM] running 129 tests
[SVSM] test cpu::tss::test::test_tss_offsets ...
[SVSM] test cpu::vc::tests::test_has_amd_cpuid ...
[SVSM] test cpu::vc::tests::test_has_memory_encryption_info_cpuid ...
[SVSM] test cpu::vc::tests::test_mmio_apic_version ... ignored, apic mmio is not supported
[SVSM] test cpu::vc::tests::test_port_io_16 ...
[SVSM] test cpu::vc::tests::test_port_io_16_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_32 ...
[SVSM] test cpu::vc::tests::test_port_io_32_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_8 ...
[SVSM] test cpu::vc::tests::test_port_io_8_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/vc.rs:494:9:
assertion `left == right` failed
  left: 57072
 right: 41616
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff800004871f]
[SVSM]   [0xffffff8000011f7d]
[SVSM]   [0xffffff80000471b6]
[SVSM]   [0xffffff80000529d8]
[SVSM]   [0xffffff800001dc30]
[SVSM] ---END---

I'll take a deeper look :)

cxdong commented 3 months ago

Hi @p4zuu, I noticed that PR #432 has been merged. When you have a moment, could you please retest this PR to check if the boot hang issue, as mentioned in the comment, has been resolved?

Yes, the kernel now starts in both release and debug mode, great :)

Additionally, this PR has been updated to address the test-in-svsm failure on SEV. While I was able to replicate the failure manually on a TDX machine and test the fix, I haven't had the opportunity to verify it on an actual SEV machine. Could you assist by running the test-in-svsm on SEV to confirm whether the issue is indeed fixed?

So now the first INS/OUTS test in SVSM is failing on SEV:

[SVSM] running 129 tests
[SVSM] test cpu::tss::test::test_tss_offsets ...
[SVSM] test cpu::vc::tests::test_has_amd_cpuid ...
[SVSM] test cpu::vc::tests::test_has_memory_encryption_info_cpuid ...
[SVSM] test cpu::vc::tests::test_mmio_apic_version ... ignored, apic mmio is not supported
[SVSM] test cpu::vc::tests::test_port_io_16 ...
[SVSM] test cpu::vc::tests::test_port_io_16_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_32 ...
[SVSM] test cpu::vc::tests::test_port_io_32_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_8 ...
[SVSM] test cpu::vc::tests::test_port_io_8_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/vc.rs:494:9:
assertion `left == right` failed
  left: 57072
 right: 41616
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff800004871f]
[SVSM]   [0xffffff8000011f7d]
[SVSM]   [0xffffff80000471b6]
[SVSM]   [0xffffff80000529d8]
[SVSM]   [0xffffff800001dc30]
[SVSM] ---END---

I'll take a deeper look :)

Thanks a lot for testing. I can cook up a debug patch to capture more logs and help understand the root cause.

cxdong commented 3 months ago

So now the first INS/OUTS test in SVSM is failing on SEV:

[SVSM] running 129 tests
[SVSM] test cpu::tss::test::test_tss_offsets ...
[SVSM] test cpu::vc::tests::test_has_amd_cpuid ...
[SVSM] test cpu::vc::tests::test_has_memory_encryption_info_cpuid ...
[SVSM] test cpu::vc::tests::test_mmio_apic_version ... ignored, apic mmio is not supported
[SVSM] test cpu::vc::tests::test_port_io_16 ...
[SVSM] test cpu::vc::tests::test_port_io_16_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_32 ...
[SVSM] test cpu::vc::tests::test_port_io_32_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_8 ...
[SVSM] test cpu::vc::tests::test_port_io_8_hardcoded ...
[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/vc.rs:494:9:
assertion `left == right` failed
  left: 57072
 right: 41616
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff800004871f]
[SVSM]   [0xffffff8000011f7d]
[SVSM]   [0xffffff80000471b6]
[SVSM]   [0xffffff80000529d8]
[SVSM]   [0xffffff800001dc30]
[SVSM] ---END---

I'll take a deeper look :)

Thanks a lot for testing. I can cook up a debug patch to capture more logs and help understand the root cause.

Just cooked up a debug patch and pushed it on top of this PR. Hopefully it captures enough logs to analyze the root cause.

p4zuu commented 3 months ago

Just cooked up a debug patch and pushed it on top of this PR. Hopefully it captures enough logs to analyze the root cause.

Here is the whole debug trace of the failing test; unfortunately I haven't found the root cause yet:

[SVSM] test cpu::vc::tests::test_port_io_string_16_get_last ...
[SVSM] rep_outsw: data address: 0xffffff8000066aa8
[SVSM] Decode bytes: [
    0xf3,
    0x66,
    0x6f,
    0x48,
    0x8b,
    0x3,
    0x48,
    0x85,
    0xc0,
    0xf,
    0x84,
    0x2c,
    0x2,
    0x0,
    0x0,
]
[SVSM] decode_insn: Outs: repeat 4
[SVSM] cal_linear_addr in 64bit mode, segment 0xcf93000000ffff ea 0xffffff8000066aa8 addrsize.mask 0xffffffffffffffff
[SVSM] translate_addr_region: va 0xffffff8000066aa8 len 2 -> [
    (
        PhysAddr(
            0x8000066000,
        ),
        true,
    ),
]
[SVSM] map_linear_addr: MemMappingGuard {
    guard: PerCPUPageMappingGuard {
        mapping: MemoryRegion {
            start: VirtAddr(
                0xffffff0040000000,
            ),
            end: VirtAddr(
                0xffffff0040001000,
            ),
        },
        huge: false,
    },
    start_off: 0xaa8,
    phantom: PhantomData<u16>,
}
[SVSM] MemMappingGuard_read: region MemoryRegion {
    start: VirtAddr(
        0xffffff0040000aa8,
    ),
    end: VirtAddr(
        0xffffff0040000aaa,
    ),
} offset: 0, data: 0xe8be
[SVSM] translate_single_addr: va 0xffffff0040000aa8 -> Ok(
    (
        PhysAddr(
            0x8000066aa8,
        ),
        true,
        0x1000,
    ),
)
[SVSM] emulate_in_outs: io_write port 0xe0 opsize Two data 0xe8be, seg DS reg Rsi linear_addr 0xffffff8000066aa8
[SVSM] ioio_out: port 0xe0, size Two, data 0xe8be
[SVSM] emulate_in_outs: increase reg Rsi by Two
[SVSM] emulate_in_outs: update repeat count by 3
[SVSM] Decode bytes: [
    0xf3,
    0x66,
    0x6f,
    0x48,
    0x8b,
    0x3,
    0x48,
    0x85,
    0xc0,
    0xf,
    0x84,
    0x2c,
    0x2,
    0x0,
    0x0,
]
[SVSM] decode_insn: Outs: repeat 3
[SVSM] cal_linear_addr in 64bit mode, segment 0xcf93000000ffff ea 0xffffff8000066aaa addrsize.mask 0xffffffffffffffff
[SVSM] translate_addr_region: va 0xffffff8000066aaa len 2 -> [
    (
        PhysAddr(
            0x8000066000,
        ),
        true,
    ),
]
[SVSM] map_linear_addr: MemMappingGuard {
    guard: PerCPUPageMappingGuard {
        mapping: MemoryRegion {
            start: VirtAddr(
                0xffffff0040000000,
            ),
            end: VirtAddr(
                0xffffff0040001000,
            ),
        },
        huge: false,
    },
    start_off: 0xaaa,
    phantom: PhantomData<u16>,
}
[SVSM] MemMappingGuard_read: region MemoryRegion {
    start: VirtAddr(
        0xffffff0040000aaa,
    ),
    end: VirtAddr(
        0xffffff0040000aac,
    ),
} offset: 0, data: 0x9e7f
[SVSM] translate_single_addr: va 0xffffff0040000aaa -> Ok(
    (
        PhysAddr(
            0x8000066aaa,
        ),
        true,
        0x1000,
    ),
)
[SVSM] emulate_in_outs: io_write port 0xe0 opsize Two data 0x9e7f, seg DS reg Rsi linear_addr 0xffffff8000066aaa
[SVSM] ioio_out: port 0xe0, size Two, data 0x9e7f
[SVSM] emulate_in_outs: increase reg Rsi by Two
[SVSM] emulate_in_outs: update repeat count by 2
[SVSM] Decode bytes: [
    0xf3,
    0x66,
    0x6f,
    0x48,
    0x8b,
    0x3,
    0x48,
    0x85,
    0xc0,
    0xf,
    0x84,
    0x2c,
    0x2,
    0x0,
    0x0,
]
[SVSM] decode_insn: Outs: repeat 2
[SVSM] cal_linear_addr in 64bit mode, segment 0xcf93000000ffff ea 0xffffff8000066aac addrsize.mask 0xffffffffffffffff
[SVSM] translate_addr_region: va 0xffffff8000066aac len 2 -> [
    (
        PhysAddr(
            0x8000066000,
        ),
        true,
    ),
]
[SVSM] map_linear_addr: MemMappingGuard {
    guard: PerCPUPageMappingGuard {
        mapping: MemoryRegion {
            start: VirtAddr(
                0xffffff0040000000,
            ),
            end: VirtAddr(
                0xffffff0040001000,
            ),
        },
        huge: false,
    },
    start_off: 0xaac,
    phantom: PhantomData<u16>,
}
[SVSM] MemMappingGuard_read: region MemoryRegion {
    start: VirtAddr(
        0xffffff0040000aac,
    ),
    end: VirtAddr(
        0xffffff0040000aae,
    ),
} offset: 0, data: 0x2b2
[SVSM] translate_single_addr: va 0xffffff0040000aac -> Ok(
    (
        PhysAddr(
            0x8000066aac,
        ),
        true,
        0x1000,
    ),
)
[SVSM] emulate_in_outs: io_write port 0xe0 opsize Two data 0x2b2, seg DS reg Rsi linear_addr 0xffffff8000066aac
[SVSM] ioio_out: port 0xe0, size Two, data 0x2b2
[SVSM] emulate_in_outs: increase reg Rsi by Two
[SVSM] emulate_in_outs: update repeat count by 1
[SVSM] Decode bytes: [
    0xf3,
    0x66,
    0x6f,
    0x48,
    0x8b,
    0x3,
    0x48,
    0x85,
    0xc0,
    0xf,
    0x84,
    0x2c,
    0x2,
    0x0,
    0x0,
]
[SVSM] decode_insn: Outs: repeat 1
[SVSM] cal_linear_addr in 64bit mode, segment 0xcf93000000ffff ea 0xffffff8000066aae addrsize.mask 0xffffffffffffffff
[SVSM] translate_addr_region: va 0xffffff8000066aae len 2 -> [
    (
        PhysAddr(
            0x8000066000,
        ),
        true,
    ),
]
[SVSM] map_linear_addr: MemMappingGuard {
    guard: PerCPUPageMappingGuard {
        mapping: MemoryRegion {
            start: VirtAddr(
                0xffffff0040000000,
            ),
            end: VirtAddr(
                0xffffff0040001000,
            ),
        },
        huge: false,
    },
    start_off: 0xaae,
    phantom: PhantomData<u16>,
}
[SVSM] MemMappingGuard_read: region MemoryRegion {
    start: VirtAddr(
        0xffffff0040000aae,
    ),
    end: VirtAddr(
        0xffffff0040000ab0,
    ),
} offset: 0, data: 0xaff6
[SVSM] translate_single_addr: va 0xffffff0040000aae -> Ok(
    (
        PhysAddr(
            0x8000066aae,
        ),
        true,
        0x1000,
    ),
)
[SVSM] emulate_in_outs: io_write port 0xe0 opsize Two data 0xaff6, seg DS reg Rsi linear_addr 0xffffff8000066aae
[SVSM] ioio_out: port 0xe0, size Two, data 0xaff6
[SVSM] emulate_in_outs: increase reg Rsi by Two
[SVSM] emulate_in_outs: update repeat count by 0
[SVSM] Decode bytes: [
    0x66,
    0xed,
    0x48,
    0x8b,
    0xb,
    0x48,
    0x85,
    0xc9,
    0xf,
    0x84,
    0xc2,
    0x1,
    0x0,
    0x0,
    0x31,
]
[SVSM] ioio_in: port 0xe0, size Two, data 0xaff6
[SVSM] emulate_in_out: io_read port 0xe0 opsize Two data 0xaff6
[SVSM] ERROR: Panic: CPU[0] panicked at kernel/src/cpu/vc.rs:496:9:
assertion `left == right` failed
  left: 57072
 right: 45046
[SVSM] ---BACKTRACE---:
[SVSM]   [0xffffff800005258f]
[SVSM]   [0xffffff8000030d15]
[SVSM]   [0xffffff800000b2c6]
[SVSM]   [0xffffff80000475c8]
[SVSM]   Invalid frame
[SVSM] ---END---

As you can see, the data read in MemMappingGuard_read is wrong (for the four 16-bit values fetched for the test port): [0xe8be, 0x9e7f, 0x2b2, 0xaff6]. I guess something may be wrong in the address translation, or maybe in the MemMappingGuard (through PerCPUPageMappingGuard::create_4k_pages())?

cxdong commented 3 months ago

As you can see, the data read in MemMappingGuard_read is wrong (for the four 16-bit values fetched for the test port): [0xe8be, 0x9e7f, 0x2b2, 0xaff6].

Yes, correct.

I guess something may be wrong in the address translation, or maybe in the MemMappingGuard (through PerCPUPageMappingGuard::create_4k_pages())?

The data should be in the .rodata section of the SVSM kernel ELF. The virtual address of the data is 0xffffff8000066aa8, which is printed in the rep_outsw function. This is the original address value passed via the rsi register. The instruction decoder decoded the same address value, so the decoding part is fine.

[SVSM] translate_addr_region: va 0xffffff8000066aa8 len 2

This decoded virtual address is translated to physical address 0x8000066000 with the start offset 0xaa8.

To my knowledge, on TDX the SVSM kernel ELF is mapped starting from physical address 0x8000000000, with the virtual addresses starting from 0xffffff8000000000, and each ELF section is mapped contiguously. On my TDX machine, the .rodata section's physical address is the virtual address minus the offset 0xffffff0000000000. Based on this, the virtual address 0xffffff8000066aa8 should map to physical address 0x8000066aa8, which matches the translation result. From this point of view, the address translation is also correct. But we can double-check by dumping the section headers.
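The arithmetic can be checked directly with the values reported above (the fixed virt-to-phys offset is specific to this setup):

fn main() {
    let va: u64 = 0xffff_ff80_0006_6aa8;     // address passed in rsi
    let offset: u64 = 0xffff_ff00_0000_0000; // kernel virt-to-phys offset here
    assert_eq!(va - offset, 0x80_0006_6aa8); // matches translate_single_addr
}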

For the MemMappingGuard, the new virtual address is 0xffffff0040000aa8, and it translates to the expected physical address as shown below, which also looks correct.

[SVSM] translate_single_addr: va 0xffffff0040000aa8 -> Ok(
    (
        PhysAddr(
            0x8000066aa8,
        ),
        true,
        0x1000,
    ),
)

With the SEV-specific functions stripped out, test_port_io_string_16_get_last runs successfully on TDX. I am not sure whether the failure on SEV is still related to the private/shared bit, as this part differs between SEV and TDX.

Another suspect is the data value at address 0xffffff8000066aa8. This can be confirmed by dumping the .rodata section of test-kernel.elf. I did this on my side with # objdump -s -j .rodata bin/test-kernel.elf. The data virtual address on my side is 0xffffff80001c794a, and the .rodata section shows the data at 0xffffff80001c794a is 0x1234, which is expected.

ffffff80001c7940 5f626974 6d61702e 72733412 7856bc9a
ffffff80001c7950 f0de6e6f 7420696d 706c656d 656e7465

So could you please dump the .rodata section to see whether the value at address 0xffffff8000066aa8 is as expected? Dumping the section headers with # readelf -a bin/test-kernel.elf may also help to double-check whether the address translation is correct.

cxdong commented 3 months ago

I pointed out the bug in my review. As a general note I question the need for re-mapping already mapped memory for the sake of instruction emulation in a VC/VE handler. For these use-cases the code can just access the virtual addresses directly using a pointer guard like GuestPtr.

The consideration is to make this instruction decoder general enough for both the SVSM kernel and the guest (SEV-SNP or TDP). If the instruction comes from the guest, the decoded linear address is mapped in the guest's page table and cannot be used directly by the SVSM kernel. On the other hand, the guest memory region represented by the linear address may cross pages, which may not be physically contiguous. Based on these considerations, the guest memory region is translated to a set of physical addresses and then re-mapped.

I just realized another thing: the guest memory is not statically mapped in the SVSM kernel's page table, right? Then it seems we need the re-map anyway, because guest memory doesn't have an accessible virtual address in the SVSM kernel.
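A self-contained sketch of the cross-page point above: a linear range is split at page boundaries, and each chunk would then be translated to its own physical address before re-mapping (4K pages assumed, helper name is made up):

const PAGE_SIZE: usize = 0x1000;

// Split a virtual range into per-page chunks of (virtual address, length).
fn split_by_page(va: usize, len: usize) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let (mut cur, end) = (va, va + len);
    while cur < end {
        let page_end = (cur & !(PAGE_SIZE - 1)) + PAGE_SIZE;
        let chunk_len = end.min(page_end) - cur;
        chunks.push((cur, chunk_len));
        cur += chunk_len;
    }
    chunks
}

fn main() {
    // A 2-byte access straddling a page boundary yields two chunks, whose
    // backing physical pages need not be contiguous.
    assert_eq!(split_by_page(0x1fff, 2), vec![(0x1fff, 1), (0x2000, 1)]);
}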

joergroedel commented 3 months ago

The consideration is to make this instruction decoder general enough for both the SVSM kernel and the guest (SEV-SNP or TDP). If the instruction comes from the guest, the decoded linear address is mapped in the guest's page table and cannot be used directly by the SVSM kernel. On the other hand, the guest memory region represented by the linear address may cross pages, which may not be physically contiguous. Based on these considerations, the guest memory region is translated to a set of physical addresses and then re-mapped.

I understand and agree with the general need for these abstractions. You are right that for decoding and emulating instructions from the guest they are absolutely necessary. But for emulation in VE/VC handlers there can be a much simpler abstraction which accesses the virtual addresses directly (protected by an exception guard).

I just realized another thing: the guest memory is not statically mapped in the SVSM kernel's page table, right? Then it seems we need the re-map anyway, because guest memory doesn't have an accessible virtual address in the SVSM kernel.

Yes, guest-memory accesses always need the re-map. Note that this also means guest page-table traversals.
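For illustration, a minimal sketch of the simpler, direct-access abstraction suggested here; the type below is a stand-in, not the SVSM's actual GuestPtr, and in the real kernel the raw accesses would run under an exception guard so that a fault becomes an error instead of a panic:

#[derive(Debug)]
struct MemFault;

// Stand-in for a GuestPtr-like wrapper around an already-mapped virtual address.
struct DirectPtr<T> {
    ptr: *mut T,
}

impl<T: Copy> DirectPtr<T> {
    fn new(vaddr: usize) -> Self {
        Self { ptr: vaddr as *mut T }
    }

    // Real code would register the access with an exception table so a #PF
    // during the read/write is turned into Err(MemFault).
    fn read(&self) -> Result<T, MemFault> {
        Ok(unsafe { core::ptr::read(self.ptr) })
    }

    fn write(&self, val: T) -> Result<(), MemFault> {
        unsafe { core::ptr::write(self.ptr, val) };
        Ok(())
    }
}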

cxdong commented 3 months ago

I understand and agree with the general need for these abstractions. You are right that for decoding and emulating instructions from the guest they are absolutely necessary. But for emulation in VE/VC handlers there can be a much simpler abstraction which accesses the virtual addresses directly (protected by an exception guard).

Just a quick question: will there be VE/VC exceptions caused by SVSM user mode in the future? I ask because, for VE/VC generated in user mode, it looks like we would still need the user page-table traversal. And since user-mode memory is not guaranteed to be physically contiguous, a user-mode virtual address may cross pages, so re-mapping would also be needed.

Since there is currently no VE/VC caused by SVSM user mode, and no need yet to decode/emulate for the guest, I can remove the page-table traversal and re-mapping code from this PR and use GuestPtr to emulate only the VE/VC generated by the SVSM kernel. Please let me know whether this understanding is correct.

joergroedel commented 3 months ago

Just a quick question: will there be VE/VC exceptions caused by SVSM user mode in the future? I ask because, for VE/VC generated in user mode, it looks like we would still need the user page-table traversal. And since user-mode memory is not guaranteed to be physically contiguous, a user-mode virtual address may cross pages, so re-mapping would also be needed.

User-mode will cause VE/VC exceptions, but there is no need to support IOIO/MMIO from user-space for now. If that ever changes, the user-mode virtual addresses can still be accessed directly with a guard; there just need to be some extra checks (e.g. whether the decoded instruction only targets user memory and whether the exception reason matches the instruction).

Since there is currently no VE/VC caused by SVSM user mode, and no need yet to decode/emulate for the guest, I can remove the page-table traversal and re-mapping code from this PR and use GuestPtr to emulate only the VE/VC generated by the SVSM kernel. Please let me know whether this understanding is correct.

You can leave it there for now, it will be needed for decoding guest instructions in the not too distant future. Just use another abstraction for VE/VC handlers.

cxdong commented 3 months ago

You can leave it there for now, it will be needed for decoding guest instructions in the not too distant future. Just use another abstraction for VE/VC handlers.

Sure. Thanks for the suggestion! I will refactor the code for the VE/VC handlers.

cxdong commented 3 months ago

Hi @joergroedel, this PR has been rebased and updated to add a simpler abstraction for the VC/VE handlers. The PTE attribute checking for the page-table traversal and the guard for memory re-mapping are kept, while the virtual address translation in PageTable is removed from this PR. The considerations are:

  1. For the guest, we need guest page-table traversal, but it seems we cannot simply reuse PageTable to represent a guest page table.
  2. To emulate MMIO VC/VE generated by the COCONUT-SVSM kernel, the decoder needs to translate an MMIO virtual address to the MMIO physical address in order to perform the emulation. But if MMIO is mapped in the page table like memory, the translation is probably just a virt_to_phys() for kernel MMIO VC/VE.

So the virtual address translation code in PageTable is removed in this update. Please let me know if you have any concerns. Thanks!