genodelabs / genode

Genode OS Framework
https://genode.org/
Other
1.08k stars 254 forks source link

ahci_drv: devices not reported #5088

Open cproc opened 10 months ago

cproc commented 10 months ago

When I tried to run Sculpt on a PC with an ASUS Q170T main board, the AHCI driver did not report the connected SATA SSDs, but showed them as off in the log:

[init -> drivers -> dynamic -> ahci_drv] --- Starting AHCI driver ---
[init -> drivers -> dynamic -> ahci_drv] version: major=0x1 minor=0x301
[init -> drivers -> dynamic -> ahci_drv] command slots: 32
[init -> drivers -> dynamic -> ahci_drv] native command queuing: yes
[init -> drivers -> dynamic -> ahci_drv] 64-bit support: yes
[init -> drivers -> dynamic -> ahci_drv] number of ports: 6
[init -> drivers -> dynamic -> ahci_drv]        #0: off (ATA)
[init -> drivers -> dynamic -> ahci_drv]        #1:  off (unknown device signature)
[init -> drivers -> dynamic -> ahci_drv]        #2:  off (unknown device signature)
[init -> drivers -> dynamic -> ahci_drv]        #3: off (ATA)
[init -> drivers -> dynamic -> ahci_drv]        #4:  off (unknown device signature)
[init -> drivers -> dynamic -> ahci_drv]        #5:  off (unknown device signature)

The error symptom in the driver is that setting the Cmd::Fre bit to 0 does not cause the Cmd::Fr bit to change to 0 as well:

https://github.com/genodelabs/genode/blob/c4679e7af644f2ff8d07b9b48bc2c79eb57a79cc/repos/os/src/drivers/ahci/ahci.h#L880-L885

The problem appears to be related to the port reset at the beginning of the Port() constructor:

https://github.com/genodelabs/genode/blob/c4679e7af644f2ff8d07b9b48bc2c79eb57a79cc/repos/os/src/drivers/ahci/ahci.h#L549-L555

When I tried the Cmd::Fre bit change before this reset() call, the Cmd::Fr bit changed to 0 as expected.

The AHCI specification mentions that "software should write all 1s to the PxSERR register to clear any bits that were set as part of the port reset.", which is done after the second port reset in the init() function, but not after the first port reset in the Port() constructor:

https://github.com/genodelabs/genode/blob/c4679e7af644f2ff8d07b9b48bc2c79eb57a79cc/repos/os/src/drivers/ahci/ahci.h#L856-L860

When I added a clear_serr() call in the Port() constructor, the Cmd::Fr bit change was successful, but an IOMMU error message appeared during clear_serr():

[ 0] IOMMU:0xffffffff817ea0b0 FRR:0 FR:0x5 BDF:0:17:0 FI:0x0009d000 (0)

This error message does not occur during the second clear_serr() call in the init() function and apparently got "fixed" by the setup_memory() call before the second port reset.

Removing the first reset() call from the Port() constructor appears to solve the problem without an IOMMU error, but I'm not sure if it is the correct fix.

chelmuth commented 8 months ago

Have the changes in #5101 positive impact on this issue?

cproc commented 8 months ago

Have the changes in #5101 positive impact on this issue?

No, the problem still exists. The problematic first port reset has now moved to Port::reinit().

chelmuth commented 1 week ago

This discussion in the user forum hints we should address this issue...