Open ns-intusurg opened 1 month ago
Hello! Thanks for reaching out with your question. I assume you've found mention of FLR in the documentation here: https://github.com/HFTrader/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md#function-level-reset-flr
Linux platforms exposes access to the FLR with /sys/bus/pci/devices/$BDF/reset
where $BDF
is the bus device function number of the targeted function. To trigger an FLR, you can try the following commands:
echo 1 > /sys/bus/pci/devices/$BDF/reset
OR
echo 1 | sudo tee -a /sys/bus/pci/devices/$BDF/reset
I don't see "reset " listed under the PCI device directory and I'm getting a "No such file or directory" error.
I'm targeting the following device path which is used during the test: /sys/devices/pci0000:00/0000:00:1d.0
Here are the results of "ls -la" under that path:
.. uevent . vendor subsystem -> ../../../bus/pci xdma driver -> ../../../bus/pci/drivers/xdma subsystem_vendor subsystem_device device resource4_wc resource4 resource2_wc resource2 resource1 resource0 revision resource rescan remove power_state power numa_node msi_irqs msi_bus modalias max_link_width max_link_speed local_cpus local_cpulist link irq firmware_node -> ../../LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:ea enable driver_override dma_mask_bits d3cold_allowed current_link_width current_link_speed consistent_dma_mask_bits config class broken_parity_status ari_enabled
That's unexpected, can you share what instance size and AMI you're using? What's the result of lspci -d 1d0f: -vv
?
I'll have to ask IT about the instance size and AMI, as they set everything up and I don't have access to the amazon admin account.
sudo lspci -d 1d0f: -vv
00:03.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) Physical Slot: 3 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
SERR- <PERR- INTx- Latency: 0 Region 0: Memory at 85610000 (32-bit, non-prefetchable) [size=16K] Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed unknown (ok), Width x0 (ok) TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [b0] MSI-X: Enable+ Count=9 Masked- Vector table: BAR=0 offset=00002000 PBA: BAR=0 offset=00003000 Kernel driver in use: ena Kernel modules: ena 00:1c.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller (prog-if 02 [NVM Express]) Physical Slot: 28 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 35 NUMA node: 0 Region 0: Memory at 85614000 (64-bit, non-prefetchable) [size=16K] Region 2: Memory at 85620000 (64-bit, prefetchable) [size=8K] Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed unknown (ok), Width x0 (ok) TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [b0] MSI-X: Enable+ Count=32 Masked- Vector table: BAR=0 offset=00002000 PBA: BAR=0 offset=00003000 Kernel driver in use: nvme Kernel modules: nvme 00:1d.0 Memory controller: Amazon.com, Inc. Device f000 Subsystem: Device fedd:1d51 Physical Slot: 29 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
SERR- <PERR- INTx- Latency: 0 Region 0: Memory at 82000000 (32-bit, non-prefetchable) [size=32M] Region 1: Memory at 85400000 (32-bit, non-prefetchable) [size=2M] Region 2: Memory at 85600000 (64-bit, prefetchable) [size=64K] Region 4: Memory at 2000000000 (64-bit, prefetchable) [size=128G] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [60] MSI-X: Enable+ Count=33 Masked- Vector table: BAR=2 offset=00008000 PBA: BAR=2 offset=00008fe0 Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (ok), Width x16 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Kernel driver in use: xdma Kernel modules: xdma 00:1e.0 Memory controller: Amazon.com, Inc. Device 1041 Subsystem: Xilinx Corporation Device 0007 Physical Slot: 30 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
SERR- <PERR- INTx- Region 0: Memory at 85618000 (64-bit, prefetchable) [size=16K] Region 2: Memory at 8561c000 (64-bit, prefetchable) [size=16K] Region 4: Memory at 85000000 (64-bit, prefetchable) [size=4M] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (ok), Width x16 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported
Great, thank you. A uname -a
would also help in place of the full AMI ID (unless the kernel data contains sensitive information).
Waiting for the reply from IT. Here's the command minus the network node hostname:
uname -a
Linux ######## 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Mar 14 14:20:09 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
I'm currently investigating this behavior internally, I hope to have a response by the end of this week. Thank you for your patience!
Just in case you still needed this info about our setup:
instance size - f1.2xlarge ami - RHEL-8.7.0_HVM-20230330-x86_64-56-Hourly2-GP2
Hi,
What is the runtime procedure to issue a PCIe Function Level Reset (sh_cl_flr_assert) to the CL?
I found the HDK's tb.issue_flr() command used for simulation, but I couldn't find any SDK runtime equivalent C function in the repo or in the documentation.
Thanks