Open kiram9 opened 1 year ago
Thanks for this. Yeah, I have been worried about exactly this happening.
I've got a couple of ideas and notes about what we could and should do to work around it.
`M001` et al (insufficient)
Proposal: perhaps the driver could call `EC0.M001` through `M005` to transact with the EC.
Unfortunately, we need locking around the entire packet. The individual read/write primitives lock around the individual operation. Since a packet is composed of up to 256 individual reads or writes, that leaves 256 sequence points between which other ACPI functions could take the lock and corrupt an in-flight exchange.
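To make the race concrete, here is a toy Python model of the two locking granularities. It is only a sketch: `SharedWindow`, `acpi_poll`, and both `send_*` helpers are hypothetical names, the "other ACPI method" is simulated by a callback at each sequence point, and a real contender would block on the mutex rather than skip.

```python
from contextlib import contextmanager


class SharedWindow:
    """Toy stand-in for the EC data window guarded by one mutex."""

    def __init__(self, size=8):
        self.data = bytearray(size)
        self.locked = False

    @contextmanager
    def lock(self):
        assert not self.locked, "mutex is already held"
        self.locked = True
        try:
            yield
        finally:
            self.locked = False


def acpi_poll(window):
    """Simulated other ACPI method (say, a temperature read).

    It only touches the window if it finds the mutex free.
    """
    if window.locked:
        return False  # a real method would wait here; the packet stays intact
    with window.lock():
        window.data[0] = 0xEE  # scribble over the in-flight packet
    return True


def send_per_byte_lock(window, packet, preempt):
    """Lock around each individual write, like the M00x primitives."""
    for i, byte in enumerate(packet):
        with window.lock():
            window.data[i] = byte
        preempt(window)  # sequence point: the mutex is free here


def send_packet_lock(window, packet, preempt):
    """Hold the mutex for the whole packet."""
    with window.lock():
        for i, byte in enumerate(packet):
            window.data[i] = byte
            preempt(window)  # the contender finds the mutex held


packet = bytes([1, 2, 3, 4])

racy = SharedWindow()
send_per_byte_lock(racy, packet, acpi_poll)

safe = SharedWindow()
send_packet_lock(safe, packet, acpi_poll)
```

In this model the per-byte variant ends up with a corrupted first byte, while the packet-wide lock leaves the four bytes intact.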
Proposal: Add a new node with a new `_DSM` that allows us to lock/unlock `ECMT` from the OS.
I actually tried this one!
I introduced a new sub-node of `EC0` named `ECPR` ("EC protocol")¹; it was given a `_HID` of `FRMW0003` and support for a new `_DSM`.
DSM UUID: 8829106f-5320-44e4-8f20-fa67326a71eb
Function IDs
| ID | Description |
|---|---|
| 0 | Validity mask |
| 1 | Lock ECMT |
| 2 | Unlock ECMT |
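As a small illustration of how a caller might address this DSM, the sketch below builds the 16-byte UUID buffer and the function-0 mask in Python. It assumes the "validity mask" follows the usual `_DSM` convention (bit N set means function N is supported) and that the UUID is passed in the mixed-endian layout that ASL's `ToUUID` produces; the constant and helper names are made up for the example.

```python
import uuid

# UUID from the table above; the names below are illustrative only.
DSM_UUID = uuid.UUID("8829106f-5320-44e4-8f20-fa67326a71eb")
FUNC_VALIDITY_MASK = 0
FUNC_LOCK_ECMT = 1
FUNC_UNLOCK_ECMT = 2


def validity_mask(function_ids):
    """Build a bit field with bit N set for each supported function N."""
    mask = 0
    for function_id in function_ids:
        mask |= 1 << function_id
    return mask


# bytes_le stores the first three UUID fields little-endian, which matches
# the byte order ACPI uses for UUID buffers.
uuid_buffer = DSM_UUID.bytes_le

# Assumed: all three functions from the table are implemented.
mask = validity_mask([FUNC_VALIDITY_MASK, FUNC_LOCK_ECMT, FUNC_UNLOCK_ECMT])
```

Under those assumptions function 0 would report `0x07`.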
A lock/unlock DSM like this won't work: according to the ACPI spec 6.4, section 19.6.88, "ownership of a Mutex must be relinquished before completion of any invocation," so the lock function cannot return to the OS while still holding `ECMT`.
I believe it would be possible to move the transmission/receipt of an entire packet down into a DSM. I have not yet tested this, but plan to do so this weekend (1-14 - 1-16 as of writing).
We could send a buffer of up to 256 bytes in the fourth argument package to that new DSM, and it could return a package containing an integer length and an up-to-256-byte buffer in response. Everything could happen under `ECMT`.
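The shape of that interface can be modelled in a few lines of Python. This is a sketch of the idea only, not the ASL that would live in the DSDT: `packet_dsm` and `ec_exchange` are hypothetical names, and a `threading.Lock` stands in for the ACPI mutex.

```python
import threading

MAX_PACKET = 256          # largest request or response, per the proposal
ECMT = threading.Lock()   # stand-in for the ACPI mutex of the same name


def packet_dsm(request, ec_exchange):
    """Run one whole EC exchange under the mutex.

    `ec_exchange` is a hypothetical callable that pushes the request bytes
    to the EC and returns the raw response bytes.
    Returns (length, buffer), mirroring the proposed return package.
    """
    if len(request) > MAX_PACKET:
        raise ValueError("request exceeds 256 bytes")
    with ECMT:
        # The mutex is held from the first request byte to the last
        # response byte, and released before the call returns.
        response = bytes(ec_exchange(bytes(request)))
    if len(response) > MAX_PACKET:
        raise ValueError("response exceeds 256 bytes")
    return len(response), response


# Usage with a fake EC that simply reverses the request.
length, payload = packet_dsm(b"\x01\x02", lambda req: req[::-1])
```

Because the mutex is acquired and released inside a single invocation, this shape does not run into the rule that sank the lock/unlock DSM.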
It might still be valuable to expose a separate node, see footnote 1.
This would make this driver effectively Framework-specific, and would of course require a firmware revision. :smile:
I just noticed that as of TGL 3.17, the UCSI methods in SSDT 8 also lack sufficient locking around mutex `ECMT`!
¹ I think it would be beneficial to do this anyway, as it would remove the need for creating a virtual device `ROOT\CrosEC` and the driver INF file could just bind the ACPI HID directly. It allows for automatic installation and it removes complexity from the driver wherein it needs to look up its constituent ACPI nodes from the root.
Miraculously, it worked! There's a lot of TODOs left.
It seems a lot slower, but the safety might be worth it.
Admittedly, a much quicker solution would be to retry when you get a checksum error. It's not ideal, but for idempotent operations it might not be terrible.
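A retry loop along those lines might look like the Python sketch below. It assumes a checksum rule where all bytes of a valid packet sum to 0 modulo 256, and it only detects a corrupted response; `transact_with_retry`, `ChecksumError`, and the flaky fake transport are hypothetical names for the example.

```python
class ChecksumError(Exception):
    """Raised when every attempt returned a packet with a bad checksum."""


def checksum_ok(packet):
    # Assumed rule: all bytes of a valid packet sum to 0 modulo 256.
    return sum(packet) % 256 == 0


def transact_with_retry(transact, request, attempts=3):
    """Repeat an exchange until the response checksum verifies.

    Only safe for idempotent commands, since the EC may already have
    acted on a request whose response was corrupted.
    """
    for _ in range(attempts):
        response = transact(request)
        if checksum_ok(response):
            return response
    raise ChecksumError(f"bad checksum after {attempts} attempts")


# Fake transport: the first response is corrupted, the second is clean.
responses = [bytes([0x10, 0x21, 0xD0]), bytes([0x10, 0x20, 0xD0])]
calls = []


def flaky_transact(request):
    calls.append(request)
    return responses[len(calls) - 1]


result = transact_with_retry(flaky_transact, b"\x00")
```

The attempt limit keeps a persistently failing EC from hanging the caller.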
Alright, this more than proves out the concept.
DSDT patch, not cleaned up (it's missing some buffer length checks on the misaligned read/write check)
This is all alpha-quality, and I should state for the record that while it is fun/interesting I don't know if it's exactly the right thing to do. I took it on as a challenge, after all. :)
It would be much "easier" to revisit Solution 2 by offering a primitive for locking and letting the driver hold it for a little bit; it would also keep the driver relatively clean of system-specific concerns. The original design (which is still present in the diff above) abstracted locking out to optionally include evaluating a DSM. :smile:
This actually works well, for Linux. A user named jubnut on the community forum has been working on a patch to do this.
Unfortunately, the Windows driver model does not offer the ability to just take a named ACPI mutex. ☹️
We were testing out the EC driver, and we think there may be a data consistency issue around access to the EC from your driver and other BIOS functions. When we do a lot of flash access from the OS, the data is sometimes not what we expect to be in the flash. @JohnAZoidberg
I think this may be because ACPI access to the EC is interleaved with this driver's access. Looking at your driver, it seems to be doing direct port I/O.
If you look at our EC interface in ACPI, it acquires a lock around any EC transaction. Some of these ACPI methods are invoked every second or so to get temperature readings for various drivers; for example, the Intel DTT driver reads temperature sensors. We think these may corrupt access from this tool.
See, for example, EC0 method M005 in framework_dsdt.txt