genodelabs / genode

Genode OS Framework
https://genode.org/
Other
1.06k stars 250 forks source link

register/mmio framework: add upper-bounds checks #4081

Closed nfeske closed 7 months ago

nfeske commented 3 years ago

The Register respectively Mmio API has become predominant in Genode's drivers and the type-safe access to hardware registers has become a second nature. However, one point of unsafety remains. The Mmio constructor merely takes the base address of the locally mapped memory-mapped I/O range as argument, but not an upper bound. All Register definitions are assumed to comply with the size of the MMIO region. I'd argue that this assumption is too optimistic. As a safeguard we should better add a range check against the upper bound whenever a register is accessed.

There are two principle ways of implementing such a check.

The open question is: What should happen whenever a range violation occurs? I think that an error message, followed by a Mmio::Range_violation exception is the way to go. That is, I would not encourage programmers to handle such errors defensively but instead see the violation as an implementation bug (like the one I recently re-introduced but spotted by @alex-ab in https://github.com/genodelabs/genode/issues/2726#issuecomment-820227605). I'd take an error-message + abort over silent memory corruption any day.

While we are at it, how about changing Mmio constructor argument from addr_t to void * such that users of the class don't need to cast the return value of dataspace.local_addr<char>() during construction?

@m-stein Would you be interested in implementing this idea, along with a test?

chelmuth commented 7 months ago
x86_64.nova: [2024-02-04 08:58:22] [init -> ahci_drv] Error: Uncaught exception of type 'Genode::Mmio<40ul>::Range_violation'

@m-stein please have a look into this. can be reproduced by make -C build/x86_64/ run/ahci_block KERNEL=nova in Qemu.

m-stein commented 7 months ago

@chelmuth Sorry, I forgot to mention: I'm already at it.

ssumpf commented 7 months ago

@m-stein: Here _genode-rpi/src/drivers/platform/rpi/rpi_platformdrv is not building, is there a commit already, or shall I look into it?

alex-ab commented 7 months ago

I looking into the x86_32 acpi issue on our test machine

m-stein commented 7 months ago

@ssumpf I just tried make clean; ccache --clear; make drivers/platform/rpi on my re-based merge_to_staging and it works. What's the error?

m-stein commented 7 months ago

Thanks @alex-ab !

ssumpf commented 7 months ago

@ssumpf I just tried make clean; ccache --clear; make drivers/platform/rpi on my re-based merge_to_staging and it works. What's the error?

@m-stein: Thanks for checking, it works now, had to do make clean.

m-stein commented 7 months ago

@ssumpf Nice :)

alex-ab commented 7 months ago

I added a fixup for the 32bit test machine. The former code checked the size and avoided to use registers outside the range. With this fixed template size argument we have to split up the MMIO in two instances. Before using the second mmio range, we have to check explicitly - getting an exception in acpi driver is not a good thing for Sculpt OS boots.

m-stein commented 7 months ago

The AHCI driver scanned ports up to a static maximum instead of the HBA port count, which caused the port offsets to reach the end of the backing MMIO at some point, which led to constructing a Port_base with MMIO range size 0. Commit bf33bddfd2 fixes the test but I'd appreciate someone who knows more about the driver to review it.

chelmuth commented 7 months ago

Thanks, @alex-ab and @m-stein! Merged both commits to staging.

ssumpf commented 7 months ago

@chelmuth: Regarding AHCI see my comment: https://github.com/genodelabs/genode/commit/bf33bddfd2ce85137e8f2b39ce2b219cd2f03548#commitcomment-138278462

ssumpf commented 7 months ago

@chelmuth: af75ece implements the stop "port scan if port count is reached" version

chelmuth commented 7 months ago

Merged.

chelmuth commented 7 months ago

Unfortunately, ahci_drv still fails in x86_64 hardware. Log with additional debug info.

[init -> ahci_drv] number of ports: 6
[init -> ahci_drv] 82: index=0
[init -> ahci_drv] 85: index=0
[init -> ahci_drv] 89: index=0
[init -> ahci_drv]              #0: ATA
[init -> ahci_drv] 82: index=1
[init -> ahci_drv] 85: index=1
[init -> ahci_drv] 82: index=2
[init -> ahci_drv] 85: index=2
[init -> ahci_drv] 82: index=3
[init -> ahci_drv] 85: index=3
[init -> ahci_drv] 82: index=4
[init -> ahci_drv] 85: index=4
[init -> ahci_drv] 82: index=5
[init -> ahci_drv] 85: index=5
[init -> ahci_drv] 82: index=6
[init -> ahci_drv] 85: index=6
[init -> ahci_drv] 82: index=7
[init -> ahci_drv] 85: index=7
[init -> ahci_drv] 82: index=8
[init -> ahci_drv] 85: index=8
[init -> ahci_drv] 82: index=9
[init -> ahci_drv] 85: index=9
[init -> ahci_drv] 82: index=10
[init -> ahci_drv] 85: index=10
[init -> ahci_drv] 82: index=11
[init -> ahci_drv] 85: index=11
[init -> ahci_drv] 82: index=12
[init -> ahci_drv] 85: index=12
[init -> ahci_drv] 82: index=13
[init -> ahci_drv] 85: index=13
[init -> ahci_drv] 82: index=14
[init -> ahci_drv] Error: MMIO range is unexpectedly too small
[init -> ahci_drv] Error: Uncaught exception of type 'Genode::Mmio<40ul>::Range_violation'

Maybe we should actually consider @m-stein's suggestions.

Alternatively, the driver could check whether there's enough HBA space left and break from the loop if not. That would cover hardware that reports a wrong port count.

@ssumpf could you please implement a robust solution?

ssumpf commented 7 months ago

@chelmuth: 8322697 implements port scanning in a correct way by only checking the port implemented bits of the host controller. Only if a port is implemented it's MMIO will be used in order to determine the type of port. If the test still fails then we might have a case of BIOS reporting wrong data or the MMIO range could be wrong. In that case we could still implement @m-stein's approach. But I would like to try without it first.

chelmuth commented 7 months ago

Does the patch address #2587 resp. implement Linux best practice as in https://elixir.bootlin.com/linux/latest/source/drivers/ata/libahci_platform.c#L713 ?

ssumpf commented 7 months ago

Does the patch address #2587 resp. implement Linux best practice as in https://elixir.bootlin.com/linux/latest/source/drivers/ata/libahci_platform.c#L713 ?

Of course, port_implemented checks the pi register. Great that you have found this issue!

chelmuth commented 7 months ago

I successfully tested the patch on hardware and Qemu 4.2. As we now only adhere to PI (not CAP.NP), shouldn't we remove port_count() and log("number of ports: ", _hba.port_count()); for correctness?

ssumpf commented 7 months ago

I successfully tested the patch on hardware and Qemu 4.2. As we now only adhere to PI (not CAP.NP), shouldn't we remove port_count() and log("number of ports: ", _hba.port_count()); for correctness?

Very good. I would like to keep port_count to see if anything looks suspicious because PI is supposed to be filled out by BIOS/UEFI, but I can also remove it.

chelmuth commented 7 months ago

I feel I got your point. Could you add the check you have in mind into the constructor and log a warning in case of suspicious values? In the current form, the log suggests we always check CAP.NP+1 ports, but this is not the case if PI contradicts.

ssumpf commented 7 months ago

@chelmuth: 4b93384 is what I had in mind.

chelmuth commented 7 months ago

@ssumpf I merged bc15203c888b3ff07ae358a562dca63b9900625b which shows the following diagnostic message on macho.

[init -> ahci_drv] port scan:
[init -> ahci_drv]              #0: ATA
[init -> ahci_drv] controller port count differs from detected ports (CAP.NP=0x5,PI=0x1)