OFS / opae-sdk

Open Programmable Acceleration Engine
https://ofs.github.io
BSD 3-Clause "New" or "Revised" License

[Fix] - Check memory calibration status before running host_exerciser #3094

Closed anandhv closed 7 months ago

anandhv commented 7 months ago

Description

This change addresses this bug/case.

The bug is that the host_exerciser utility (which pushes traffic between the host and FPGA-attached DDR) fails ungracefully if DDR calibration has failed. Fortunately, the dfl-emif driver reports the calibration status via sysfs entries (one per memory channel). This change reads those sysfs entries and, if calibration has failed, exits gracefully with a useful error message.
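To illustrate the kind of check described above, here is a minimal Python sketch, not the code from this PR. It assumes per-channel attribute files named `inf<N>_init_done` and `inf<N>_cal_fail` that contain `0` or `1`, and at most 8 channels; these names and the limit are assumptions to verify against the dfl-emif driver on the target kernel. The helper names `read_cal_status` and `check_calibration` are hypothetical.

```python
import tempfile
from pathlib import Path


def read_cal_status(device_dir, max_channels=8):
    """Return {channel: (init_done, cal_fail)} for channels that have both files.

    ASSUMPTION: the attribute names inf<N>_init_done / inf<N>_cal_fail and the
    max_channels default are illustrative, not taken from the PR.
    """
    status = {}
    for ch in range(max_channels):
        done = Path(device_dir) / f"inf{ch}_init_done"
        fail = Path(device_dir) / f"inf{ch}_cal_fail"
        if not (done.is_file() and fail.is_file()):
            continue  # channel not present on this board
        init_done = int(done.read_text().strip(), 0) != 0
        cal_fail = int(fail.read_text().strip(), 0) != 0
        status[ch] = (init_done, cal_fail)
    return status


def check_calibration(device_dir):
    """Return a list of readable problems; an empty list means all is well."""
    status = read_cal_status(device_dir)
    if not status:
        return [f"no calibration entries found under {device_dir}"]
    problems = []
    for ch, (init_done, cal_fail) in sorted(status.items()):
        if cal_fail:
            problems.append(f"memory channel {ch}: calibration failed")
        elif not init_done:
            problems.append(f"memory channel {ch}: calibration not complete")
    return problems


if __name__ == "__main__":
    # Fake sysfs directory so the sketch runs without FPGA hardware.
    with tempfile.TemporaryDirectory() as fake_dev:
        Path(fake_dev, "inf0_init_done").write_text("1\n")
        Path(fake_dev, "inf0_cal_fail").write_text("0\n")
        Path(fake_dev, "inf1_init_done").write_text("0\n")
        Path(fake_dev, "inf1_cal_fail").write_text("1\n")
        for problem in check_calibration(fake_dev):
            print("error:", problem)
```

A caller would run the test only when `check_calibration` returns an empty list, and otherwise print the problems and exit with a non-zero status.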

The host_exerciser utility is built on top of the afu_test framework. The corresponding AFU is exposed as a PCIe Virtual Function (VF). As described in the docs, the user must first bind the VF endpoint to the vfio-pci driver. This also seems to make OPAE use the opae-v (VFIO-specific) plugin rather than the xfpga plugin. The opae-v plugin exposes only a subset of the OPAE API and notably omits the functions related to sysobjects (which are used to access sysfs entries). The afu_test framework has therefore also been updated to grab a handle to the FPGA_DEVICE (i.e. the FIM) so that we can access sysobjects.

I've noted in the code comments that the change is a bit kludgy, because we don't know how many memory channels are present on a given board and therefore don't know how many sysfs entries we need to poll. While xfpga's sysobject implementation supports wildcarded searches for sysfs entries (via glob) and can return arrays of such objects, it oddly and specifically disables this behaviour if the glob string uses a recursive wildcard ("/**/"). I think such a wildcard is necessary, since we shouldn't assume a fixed sysfs folder hierarchy. Future work is therefore to remove this likely unnecessary restriction in the sysobject implementation. I just don't have time for this, given that the release deadline is next week.
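The search behaviour wished for here can be sketched in Python, whose `glob` module supports a recursive `**` wildcard. This only illustrates the semantics (one pattern, any directory depth, a list of matches) and says nothing about how xfpga's sysobject code works. The file name pattern `inf*_cal_fail` and the helper `find_entries` are assumptions made for the example.

```python
import glob
import os
import tempfile


def find_entries(root, name_pattern):
    """Return sorted paths matching name_pattern at any depth under root.

    With recursive=True, '**' matches zero or more directory levels, so the
    caller does not need to know the folder hierarchy in advance.
    """
    pattern = os.path.join(root, "**", name_pattern)
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Build a fake tree with entries at different depths (hypothetical layout).
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "dev0", "emif"))
        os.makedirs(os.path.join(root, "dev1"))
        for parts in (("dev0", "emif", "inf0_cal_fail"), ("dev1", "inf1_cal_fail")):
            open(os.path.join(root, *parts), "w").close()
        for path in find_entries(root, "inf*_cal_fail"):
            print(os.path.relpath(path, root))
```

A single-level pattern such as `root/*/inf*_cal_fail` would find only the entry that sits exactly one directory deep, which is why a fixed hierarchy would otherwise have to be assumed.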

Collateral (docs, reports, design examples, case IDs):

Tests added:

Tests run: