Microsemi / switchtec-kernel

A kernel module for the Microsemi PCIe switch
GNU General Public License v2.0

Need support for Linux AER driver's recovery sequence for fatal AERs #74

Open dinderieden opened 5 years ago

dinderieden commented 5 years ago

The Linux AER driver (drivers/pci/pcie/aer) handles AERs reported downstream of a Root Port; AER Uncorrectable Fatal/NonFatal errors are processed in its "do_recovery" function. For AER Fatal errors in particular, the recovery sequence performs the following steps:

1) Call the "error_detected" err_handler entry point for all downstream devices, which should put each device into a state where it can handle a reset of the upstream link.
2) Call the "reset_link" method to reset the upstream port (generally a secondary bus reset is issued).
3) Call the "mmio_enabled" err_handler entry point for all downstream devices as part of re-enabling the devices after the link reset.
4) Call the "slot_reset" err_handler entry point for all downstream devices as part of re-enabling the devices after the link reset.
5) Call the "resume" err_handler entry point for all downstream devices as part of re-enabling the devices after the link reset.
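To make the ordering concrete, here is a minimal userspace model of the sequence described above. It is not kernel code: `struct model_dev`, `model_fatal_recovery`, and `model_trace` are hypothetical names invented for this illustration, and the model deliberately ignores the status values that the real AER driver inspects between steps. It only records which callback would be invoked on which device, and in what order.

```c
#include <stdio.h>
#include <stddef.h>

#define TRACE_MAX 512

/* Hypothetical stand-in for a downstream PCI device (not a kernel type). */
struct model_dev {
    const char *name;
};

static char trace[TRACE_MAX];
static size_t trace_len;

/* Append "step(who) " to the trace buffer; stop silently once it is full. */
static void record(const char *step, const char *who)
{
    int n;

    if (trace_len >= TRACE_MAX)
        return;
    n = snprintf(trace + trace_len, TRACE_MAX - trace_len,
                 "%s(%s) ", step, who);
    if (n > 0)
        trace_len += (size_t)n;
}

/* Walk the five fatal-error steps in the order listed in the issue. */
void model_fatal_recovery(const struct model_dev *devs, size_t n)
{
    static const char *const after_reset[] = {
        "mmio_enabled", "slot_reset", "resume"
    };
    size_t i, s;

    trace_len = 0;
    trace[0] = '\0';

    /* Step 1: every downstream device is told about the error first. */
    for (i = 0; i < n; i++)
        record("error_detected", devs[i].name);

    /* Step 2: the upstream link is reset once. */
    record("reset_link", "upstream");

    /* Steps 3 to 5: each phase completes on all devices before the next. */
    for (s = 0; s < sizeof(after_reset) / sizeof(after_reset[0]); s++)
        for (i = 0; i < n; i++)
            record(after_reset[s], devs[i].name);
}

/* Return the recorded call order as a space-separated string. */
const char *model_trace(void)
{
    return trace;
}
```

Calling `model_fatal_recovery()` on an array of two devices and then printing `model_trace()` shows each phase being applied to both devices before the next phase starts, which is why a single driver without handlers can stall recovery for its siblings.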

The switchtec driver does not currently implement the AER err_handler entry points "error_detected", "mmio_enabled", "slot_reset", and "resume". As a result, if an AER Fatal error originates from the switchtec upstream or management device, the AER driver receives the error status "PCI_ERS_RESULT_NO_AER_DRIVER" for the switchtec-kernel driver. That status in turn causes the AER driver's recovery sequence to quit, leaving the other devices downstream of the PCI switch in a disabled state.
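As a rough illustration of what filling in those entry points could look like, the sketch below defines a handler table shaped like the kernel's `struct pci_error_handlers` and a small check that models the "no handler" outcome. Everything here is an assumption made for illustration: the types are userspace stand-ins so the code compiles outside the kernel tree (a real driver would get them from `<linux/pci.h>`, and the enum values here are placeholders), the `switchtec_*` handler names and their return choices are hypothetical and are not taken from the proposed pull request, and `model_report_error_detected` is an invented helper, not a kernel function.

```c
#include <stddef.h>

/* Userspace stand-ins; in a real driver these come from <linux/pci.h>.
 * The numeric values are placeholders, not the kernel's. */
typedef enum {
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED,
    PCI_ERS_RESULT_NO_AER_DRIVER
} pci_ers_result_t;

typedef enum {
    pci_channel_io_normal,
    pci_channel_io_frozen,
    pci_channel_io_perm_failure
} pci_channel_state_t;

struct pci_dev; /* opaque in this sketch */

struct pci_error_handlers {
    pci_ers_result_t (*error_detected)(struct pci_dev *dev,
                                       pci_channel_state_t state);
    pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
    pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
    void (*resume)(struct pci_dev *dev);
};

struct pci_driver {
    const struct pci_error_handlers *err_handler; /* NULL: no AER support */
};

/* Hypothetical handlers: a real driver would quiesce and restore hardware. */
static pci_ers_result_t switchtec_error_detected(struct pci_dev *pdev,
                                                 pci_channel_state_t state)
{
    (void)pdev;
    if (state == pci_channel_io_perm_failure)
        return PCI_ERS_RESULT_DISCONNECT;
    return PCI_ERS_RESULT_NEED_RESET; /* ask for the link reset */
}

static pci_ers_result_t switchtec_mmio_enabled(struct pci_dev *pdev)
{
    (void)pdev;
    return PCI_ERS_RESULT_RECOVERED;
}

static pci_ers_result_t switchtec_slot_reset(struct pci_dev *pdev)
{
    (void)pdev;
    return PCI_ERS_RESULT_RECOVERED;
}

static void switchtec_resume(struct pci_dev *pdev)
{
    (void)pdev;
}

static const struct pci_error_handlers switchtec_err_handler = {
    .error_detected = switchtec_error_detected,
    .mmio_enabled   = switchtec_mmio_enabled,
    .slot_reset     = switchtec_slot_reset,
    .resume         = switchtec_resume,
};

/* Model drivers with and without the handler table. */
const struct pci_driver model_driver_with_aer = {
    .err_handler = &switchtec_err_handler,
};
const struct pci_driver model_driver_without_aer = {
    .err_handler = NULL,
};

/* Invented helper modelling the outcome when a driver lacks a handler. */
pci_ers_result_t model_report_error_detected(const struct pci_driver *drv,
                                             struct pci_dev *pdev,
                                             pci_channel_state_t state)
{
    if (!drv || !drv->err_handler || !drv->err_handler->error_detected)
        return PCI_ERS_RESULT_NO_AER_DRIVER;
    return drv->err_handler->error_detected(pdev, state);
}
```

With the table registered, the model returns `PCI_ERS_RESULT_NEED_RESET` for a frozen channel; without it, the result is `PCI_ERS_RESULT_NO_AER_DRIVER`, which is the status that ends the recovery sequence described in this issue.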

Note: I have some code changes that I can submit as a pull request, providing a suggested implementation of the entry points the AER driver's recovery sequence requires.