Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/

Coreboot, SMM (System Management Mode) and performance differences #82

Open zirblazer opened 2 years ago

zirblazer commented 2 years ago

I have been trying to understand what makes it technically possible for coreboot derivatives to be faster than standard proprietary firmware. This assumes comparisons are performed at the same clock speeds and power limits, so that you don't get into cheating-with-higher-clocks territory. That is very important, because otherwise the firmware itself isn't "faster"; it is just slightly overclocking the rest of the platform by default.

What I found out is that SMM (System Management Mode) could play perhaps the biggest role. Code executed in SMM consumes clock cycles that the OS has no idea about, which means you're losing performance without even being aware of it. It can also cause inconsistent latency, since SMIs (System Management Interrupts) have the highest priority and interrupt the OS transparently, which is perhaps even more important than sustained throughput. So far, the only feature I know of that relies on SMM is emulation of PS/2 keyboard and mouse devices, which was important for installing Windows 7 without built-in USB xHCI controller drivers on platforms with no EHCI, like Intel consumer platforms since Skylake. I have no idea whether just leaving it enabled costs performance. Also, ACPI implementations can certainly abuse SMM (although I recall a few talks about implementing ACPI without SMM, I have no idea what the current status of that is in coreboot), which means performance may be lost to expensive context switches.
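One way to observe the "invisible" cycles described above is a gap detector: spin reading a monotonic clock and flag places where two consecutive readings jump apart. Linux's ftrace `hwlat` tracer works on this principle inside the kernel with interrupts disabled. The Python sketch below only illustrates the idea from user space, so a gap it reports may come from an interrupt, preemption or a VM exit rather than an SMI. The 1 s duration and 50 µs threshold are arbitrary assumptions.

```python
import time


def detect_gaps(read_clock_ns, duration_ns, threshold_ns):
    """Spin on a clock and report gaps between consecutive reads.

    Any stretch where this loop could not run (an SMI, but also an interrupt,
    preemption or a VM exit) shows up as a jump between two readings.
    Returns (number of gaps above threshold_ns, largest gap seen in ns).
    """
    start = prev = read_clock_ns()
    count = 0
    max_gap = 0
    while prev - start < duration_ns:
        now = read_clock_ns()
        gap = now - prev
        if gap > threshold_ns:
            count += 1
        max_gap = max(max_gap, gap)
        prev = now
    return count, max_gap


if __name__ == "__main__":
    # Assumed parameters: sample for 1 s, flag gaps longer than 50 us.
    # Python's own jitter is large, so the threshold has to stay generous.
    gaps, worst = detect_gaps(time.perf_counter_ns, 1_000_000_000, 50_000)
    print(f"gaps > 50 us: {gaps}, worst gap: {worst / 1000:.1f} us")
```

Passing the clock in as a function keeps the detection logic testable with a fake clock; correlating the gaps with the SMI counter is what would tie them to SMM specifically.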

In summary, poorly implemented or unoptimized features that rely on SMM may cost latency and performance. Do we know whether there are any specific coreboot functions that rely on SMM, how often proprietary firmware uses it, or anything else that could explain benchmark differences?

pietrushnic commented 2 years ago

There is nothing in coreboot that relies specifically on SMM. Some features may be easier to implement using SMM, which is why silicon vendors describe SMM as a space where OEMs can add value. AFAIK there are some SMI handlers installed by FSP, so depending on microarchitecture it may vary.

I agree this is an interesting space to hunt for performance gains. We will try to allocate budget for researching the topic. The key thing is most probably tooling for measuring lost CPU cycles on vendor BIOS vs. open-source firmware.

@miczyg1 @krystian-hebel any thoughts?

bdelgado1995 commented 1 year ago

One place to start is reading MSR 0x34 (the SMI counter) periodically over time. If the counter doesn't increase, then there is no SMI time loss to be concerned about during that interval. On some platforms I've seen, there are zero or very few SMIs at OS runtime, while other platforms can have SMIs occurring a few times a second. Total SMM time consumed is the number of SMIs multiplied by the SMI duration, and the duration depends on the specific SMI type: some SMIs are longer, some shorter. (Note: the MSR counter doesn't increment if an STM is enabled.)
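A minimal sketch of the counter-sampling approach described above, assuming Linux with the `msr` kernel module loaded (`modprobe msr`) and root access to `/dev/cpu/N/msr`. Per Intel's SDM, only bits 31:0 of MSR 0x34 hold the count, hence the 32-bit mask. The 100 µs average SMI duration used for the estimate is a placeholder assumption, not a measured value.

```python
import os
import struct
import time

MSR_SMI_COUNT = 0x34  # SMI counter MSR mentioned above; count in bits 31:0


def read_msr(cpu, msr):
    """Read a 64-bit MSR through Linux's msr driver (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)  # the file offset selects the MSR index
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def smi_delta(before, after):
    """SMIs between two readings; the 32-bit mask also handles counter wraparound."""
    return ((after & 0xFFFFFFFF) - (before & 0xFFFFFFFF)) & 0xFFFFFFFF


def smm_time_fraction(smi_count, avg_smi_us, interval_s):
    """Estimated share of the interval spent in SMM: SMI count x average SMI duration."""
    return smi_count * avg_smi_us / (interval_s * 1_000_000)


if __name__ == "__main__":
    interval_s = 10
    assumed_avg_smi_us = 100  # placeholder; real durations depend on the SMI type
    first = read_msr(0, MSR_SMI_COUNT)
    time.sleep(interval_s)
    count = smi_delta(first, read_msr(0, MSR_SMI_COUNT))
    share = smm_time_fraction(count, assumed_avg_smi_us, interval_s)
    print(f"{count} SMIs in {interval_s} s, ~{share:.4%} of time if each took {assumed_avg_smi_us} us")
```

As a worked example of the arithmetic: 3 SMIs per second at an assumed 100 µs each is 300 µs out of every 1,000,000 µs, i.e. 0.03% of the time.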

pietrushnic commented 1 year ago

@bdelgado1995 thanks for the comment and welcome to the Dasharo community. Your experience is very interesting. What was your motivation for checking SMIs? We are very interested in workloads where Dasharo could be useful, since that can help grow the community and, over time, build a better business model.

miczyg1 commented 1 year ago

> AFAIK there are some SMI handlers installed by FSP, so depending on microarchitecture it may vary.

FSP does not install SMM handlers (yet). coreboot is fully responsible for setting up SMI handlers.

Dasharo uses the APMC SMI handler for SMMSTORE (writing to flash, used for UEFI variables) and for switching ACPI mode on/off.
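To illustrate what an APMC-triggered SMI looks like, here is a hypothetical Python model of the dispatch: software writes a command byte to the APM control port (0xB2 on Intel chipsets), the chipset raises an SMI, and the handler branches on that byte. The command values are as I recall them from coreboot's `include/cpu/x86/smm.h` and should be treated as assumptions to verify against the tree; `PlatformState` and the handler body are a simplified model, not coreboot's code. One consequence for performance: these paths only run when something writes the port, e.g. a UEFI variable write or a switch into ACPI mode, so they cost nothing while idle.

```python
from dataclasses import dataclass, field

APM_CNT = 0xB2               # APM control I/O port; a write to it raises the SMI
APM_CNT_ACPI_DISABLE = 0x1E  # assumed value, check coreboot's smm.h
APM_CNT_ACPI_ENABLE = 0xE1   # assumed value, check coreboot's smm.h
APM_CNT_SMMSTORE = 0xED      # assumed value, check coreboot's smm.h


@dataclass
class PlatformState:
    """Hypothetical slice of platform state the handler touches."""
    acpi_mode: bool = False                           # power events routed to the OS via SCI
    flash_writes: list = field(default_factory=list)  # stand-in for SPI flash writes


def apmc_smi_handler(cmd, state, payload=None):
    """Hypothetical APMC dispatch: branch on the byte software wrote to APM_CNT."""
    if cmd == APM_CNT_ACPI_ENABLE:
        state.acpi_mode = True
        return "acpi-enabled"
    if cmd == APM_CNT_ACPI_DISABLE:
        state.acpi_mode = False
        return "acpi-disabled"
    if cmd == APM_CNT_SMMSTORE:
        # A real handler validates the request and writes SPI flash from SMM.
        state.flash_writes.append(payload)
        return "smmstore"
    return "ignored"  # unknown commands fall through without side effects


if __name__ == "__main__":
    state = PlatformState()
    # Switching to ACPI mode: one SMI.
    print(apmc_smi_handler(APM_CNT_ACPI_ENABLE, state), state.acpi_mode)
    # A UEFI variable write goes through SMMSTORE requests like this one.
    print(apmc_smi_handler(APM_CNT_SMMSTORE, state, b"BootOrder"), state.flash_writes)
```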

There are also some standard handlers responsible for power:

  1. Sleep SMI: executed only when software writes to ACPI space to shut down the machine.
  2. PM1 SMI: executed when the power button is pressed before the OS is loaded, to shut down the machine.
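The Sleep SMI path can be illustrated by decoding the value software writes to the ACPI PM1 control register: per the ACPI specification, SLP_TYP occupies bits 12:10 and SLP_EN is bit 13. The mapping of SLP_TYP values to sleep states is chipset-specific; the table below assumes the encoding used by Intel PCHs (5 = S3, 6 = S4, 7 = S5). `decode_sleep_request` is a hypothetical helper, not coreboot's handler.

```python
# PM1_CNT bit layout from the ACPI specification.
SCI_EN = 1 << 0
SLP_TYP_SHIFT = 10
SLP_TYP_MASK = 0b111 << SLP_TYP_SHIFT
SLP_EN = 1 << 13

# Assumed Intel PCH encoding of SLP_TYP; other chipsets use different values.
SLP_TYP_STATES = {0: "S0", 1: "S1", 5: "S3", 6: "S4", 7: "S5"}


def decode_sleep_request(pm1_cnt):
    """Return the requested sleep state for a PM1_CNT write, or None if SLP_EN is clear."""
    if not pm1_cnt & SLP_EN:
        return None  # not a sleep request
    slp_typ = (pm1_cnt & SLP_TYP_MASK) >> SLP_TYP_SHIFT
    return SLP_TYP_STATES.get(slp_typ, f"reserved({slp_typ})")


if __name__ == "__main__":
    # An OS shutting down writes SLP_TYP=S5 together with SLP_EN (SCI_EN stays set).
    print(decode_sleep_request(SLP_EN | (7 << SLP_TYP_SHIFT) | SCI_EN))
```

Since this trap only matters when SLP_EN is written, it fits the description above: the handler does nothing during normal runtime.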

And for the MSI board, since v1.1.0:

  1. Periodic SMI is also used to reload the watchdog.
  2. TCO SMI is used for SMM BIOS Write Protection.

These are the active SMI handlers. Other handlers (GPE0_STS, GPIO, ESPI, MONITOR_STS, MCSMI aka EC) are most likely not triggered (no GPIO is configured to raise an SMI, and no EC or eSPI SMI is configured).

bdelgado1995 commented 1 year ago

> @bdelgado1995 thanks for the comment and welcome to the Dasharo community. Your experience is very interesting. What was your motivation for checking SMIs? We are very interested in workloads where Dasharo could be useful, since that can help grow the community and, over time, build a better business model.

I've worked at Intel on some SMM/STM-related projects previously and also did my graduate studies in the area of SMM-based dynamic detection of Linux/Xen rootkits. I co-authored the paper mentioned in the first posting in this thread ("Performance Implications of System Management Mode"). I got interested in SMM performance because our university project was using SMM to detect OS/VMM rootkits, but the inspection time comes at the cost of OS/VMM performance, so we came up with ways to mitigate that. Roughly, the approach decomposes large hash operations / measurements into smaller components that better fit within guidelines for the maximum amount of time spent in a single SMI (e.g., ~150 microseconds). Previous academic approaches were taking 20-40 milliseconds, which would very likely hurt the user experience.
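The decomposition idea can be sketched as a measurement whose state persists between SMIs, with each SMI hashing only as many bytes as fit in the per-SMI time budget. This is a generic illustration, not EPA-RIMM's implementation: SHA-256 via Python's `hashlib`, the 500 MB/s hashing throughput and the 20 µs SMI entry/exit overhead are all assumptions, and `ChunkedMeasurement` is a hypothetical name. A real design also has to handle the measured region changing between SMIs.

```python
import hashlib


def bytes_per_smi(budget_us, throughput_mb_s, overhead_us=0.0):
    """Bytes that can be hashed in one SMI (1 MB/s == 1 byte per microsecond)."""
    usable_us = max(budget_us - overhead_us, 0.0)
    return int(usable_us * throughput_mb_s)


class ChunkedMeasurement:
    """Hash a memory region across several short SMIs instead of one long one."""

    def __init__(self, region, chunk_bytes):
        if chunk_bytes <= 0:
            raise ValueError("chunk_bytes must be positive")
        self.region = region
        self.chunk_bytes = chunk_bytes
        self.offset = 0                 # progress kept between SMIs (would live in SMRAM)
        self.hasher = hashlib.sha256()  # incremental hash state kept between SMIs

    def on_smi(self):
        """Do one SMI's worth of hashing; return the hex digest once the region is done."""
        end = min(self.offset + self.chunk_bytes, len(self.region))
        self.hasher.update(self.region[self.offset:end])
        self.offset = end
        if self.offset == len(self.region):
            return self.hasher.hexdigest()
        return None


if __name__ == "__main__":
    region = bytes(range(256)) * 4096  # 1 MiB stand-in for the code being measured
    chunk = bytes_per_smi(150, 500, overhead_us=20)  # assumed numbers, see lead-in
    measurement = ChunkedMeasurement(region, chunk)
    smis, digest = 0, None
    while digest is None:
        digest = measurement.on_smi()
        smis += 1
    assert digest == hashlib.sha256(region).hexdigest()  # same result as one long SMI
    print(f"{smis} SMIs of at most {chunk} bytes each")
```

For scale: 20 ms of work split into 150 µs slices needs at least ⌈20000 / 150⌉ = 134 SMIs, trading one long stall for many short ones.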

For me, the primary interest in what you're doing is helping solve the issue of getting reasonably-priced modern boards with user-modifiable firmware out in the market. I think the effort would be useful for university prototypes and those who want to prototype/build-on open source firmware. For a while, I was using Intel Minnowboards which had the following useful properties: open-source firmware, easy to flash with Dediprog SPI header, serial port output, and optional JTAG support. But, the Minnowboard was limited to older Atom CPUs, 2 GB of RAM, and lacked easy use of PCI-E cards so your efforts in enabling the Z690 are very helpful for projects with higher h/w requirements.

miczyg1 commented 1 year ago

> But, the Minnowboard was limited to older Atom CPUs, 2 GB of RAM, and lacked easy use of PCI-E cards so your efforts in enabling the Z690 are very helpful for projects with higher h/w requirements.

Glad to hear. @bdelgado1995 if you see any use-cases or features that could help with academic research, please let us know. We may consider implementing it in one of the future releases or add it to our backlog of feature requests.

pietrushnic commented 1 year ago

> I've worked at Intel on some SMM/STM-related projects previously and also did my graduate studies in the area of SMM-based dynamic detection of Linux/Xen rootkits. I co-authored the paper mentioned in the first posting in this thread ("Performance Implications of System Management Mode"). I got interested in SMM performance because our university project was using SMM to detect OS/VMM rootkits, but the inspection time comes at the cost of OS/VMM performance, so we came up with ways to mitigate that. Roughly, the approach decomposes large hash operations / measurements into smaller components that better fit within guidelines for the maximum amount of time spent in a single SMI (e.g., ~150 microseconds). Previous academic approaches were taking 20-40 milliseconds, which would very likely hurt the user experience.

IIUC this PSEC presentation is yours. Congratulations. We are also huge fans of PSEC and attended in 2019.

> For me, the primary interest in what you're doing is helping solve the issue of getting reasonably-priced modern boards with user-modifiable firmware out in the market. I think the effort would be useful for university prototypes and those who want to prototype/build-on open source firmware. For a while, I was using Intel Minnowboards which had the following useful properties: open-source firmware, easy to flash with Dediprog SPI header, serial port output, and optional JTAG support. But, the Minnowboard was limited to older Atom CPUs, 2 GB of RAM, and lacked easy use of PCI-E cards so your efforts in enabling the Z690 are very helpful for projects with higher h/w requirements.

Please feel invited to join the Dasharo community. We also plan Dasharo Community Calls and virtual hackathons/live coding in 2023, so it would be great if you could join and we could discuss SMM/STM/PPAM-related topics. I guess the Qubes OS community could also be interested in this; we plan a Qubes OS summit in 2023, most probably in Berlin. We are also working on OST2 courses, where we definitely want to educate the security community about SMM and potentially other "TEEs". So there are many ways we could cooperate to bring you and your organization more value; just let us know what would work best for you.

Also, if there is anything we can do to improve Dasharo and the supported hardware to make it more useful for prototyping, please let us know. I wonder what the selection process for prototyping hardware looks like at a typical university. Maybe we can do something to make it easier?

bdelgado1995 commented 1 year ago

Platform Security Summit 2019 was excellent; I would love to attend another one. Yes, this presentation is the one I gave there for EPA-RIMM, the academic project that looked for rootkits in the OS/VMM from SMM. It's great that you were there too. Thanks for the invite, I'll see how I can participate. I am also quite involved in UEFI fuzzing, so it would be good to talk about that sometime as well; I saw an open issue on that.

I'll definitely let you know if I see something that would make the hardware more useful for prototyping. I got one of the boards and was able to follow your procedure to build/flash the firmware on it. Very smooth process! I am hoping to use the board for a future paper and also try out the STM on it.

I think one common model of university purchases is a professor getting a grant for a given project, identifying the set of equipment required (systems, software, debug equipment, licenses, etc.) and getting it funded. It could be helpful to list the customization capabilities that open firmware enables, so that projects focusing on certain aspects can see what's available and map it to their needs.