The Linux AER driver (drivers/pci/pcie/aer) handles AER errors reported downstream of a Root Port; AER Uncorrectable Fatal/NonFatal errors are processed in the "do_recovery" method. For AER Fatal errors in particular, the recovery sequence performs the following steps:
1) Call the "error_detected" err_handler entry point for all downstream devices, which should put the devices into a state where they can tolerate a reset of the upstream link.
2) Call the "reset_link" method to reset the upstream port (generally a secondary bus reset is issued).
3) Call the "mmio_enabled" err_handler entry point for all downstream devices as part of re-enabling the devices after the link reset.
4) Call the "slot_reset" err_handler entry point for all downstream devices as part of re-enabling the devices after the link reset.
5) Call the "resume" err_handler entry point for all downstream devices as part of re-enabling the devices after the link reset.
The switchtec driver does not currently implement the AER err_handler entry points "error_detected", "mmio_enabled", "slot_reset", and "resume". Thus, if an AER Fatal error originates from the switchtec upstream or management device, the AER driver receives the error status "PCI_ERS_RESULT_NO_AER_DRIVER" for the device bound to the switchtec-kernel driver. This status in turn causes the AER recovery sequence to quit, leaving the other devices downstream of the PCI switch in a disabled state.
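To illustrate the failure mode, here is a minimal user-space sketch in C. It is a simplified model of the behavior described above, not the kernel's actual code: the names `ers_result`, `model_dev`, and `model_fatal_recovery` are hypothetical, and the reset and re-enable steps are reduced to a comment. It only shows how one device without err_handler entry points can cause recovery to be abandoned for every device under the port.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few of the kernel's pci_ers_result_t values. */
enum ers_result {
    ERS_CAN_RECOVER,
    ERS_NO_AER_DRIVER   /* driver provides no err_handler entry points */
};

/* Hypothetical model of one device below the port being recovered. */
struct model_dev {
    const char *name;
    int has_err_handlers;  /* 0 models a driver lacking the entry points */
    int resumed;           /* set once the "resume" step reaches the device */
};

/* Step 1: poll every device; one missing handler decides the merged result. */
static enum ers_result broadcast_error_detected(const struct model_dev *devs,
                                                size_t n)
{
    enum ers_result merged = ERS_CAN_RECOVER;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].has_err_handlers)
            merged = ERS_NO_AER_DRIVER;
    }
    return merged;
}

/* Returns 0 if the modeled sequence completes, -1 if it is abandoned. */
int model_fatal_recovery(struct model_dev *devs, size_t n)
{
    assert(devs != NULL);

    if (broadcast_error_detected(devs, n) == ERS_NO_AER_DRIVER)
        return -1;  /* recovery quits; no device is resumed */

    /* Link reset, "mmio_enabled" and "slot_reset" are not modeled here. */

    /* Final step: every device is told to resume. */
    for (size_t i = 0; i < n; i++)
        devs[i].resumed = 1;
    return 0;
}
```

In this model, calling `model_fatal_recovery` on a set of devices that all have handlers returns 0 and marks each one resumed. Adding a single entry with `has_err_handlers` set to 0 (standing in for the switchtec device) makes it return -1 and leaves every other device un-resumed, which mirrors the disabled state described above.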
Note: I have code changes that I can submit as a pull request; they provide a suggested implementation of the entry points the AER recovery sequence requires.