aws / aws-fpga

Official repository of the AWS EC2 FPGA Hardware and Software Development Kit

EC2 F1: AWS FPGA DRAM DMA Example fails during execution #584

Closed jasmisbvb closed 1 year ago

jasmisbvb commented 1 year ago

We are currently evaluating AWS's EC2 F1 instances as a solution for a multitude of FPGA projects. As part of that, we are running the example code provided by AWS in https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples

Observed behaviour:

Expected behaviour:

Performed steps for each of the examples:

  1. Set up the HDK on an EC2 instance with the AWS FPGA Developer AMI, then build the FPGA CL of the example by following the steps described at https://github.com/aws/aws-fpga/blob/master/hdk/README.md. Result of step: the AFI is indicated as available in AFI storage.
  2. Set up the SDK on an EC2 F1 instance with the AWS Linux Base AMI according to https://github.com/aws/aws-fpga/blob/master/sdk/README.md. We also tried the latest FPGA Developer AMI (Amazon Linux 2).
  3. Build the XDMA driver and load it into the kernel according to https://github.com/aws/aws-fpga/blob/master/sdk/linux_kernel_drivers/xdma/xdma_install.md
  4. Load the AFI as described in the HDK and SDK READMEs.
  5. Build the C host application of the example and execute it as described in the same READMEs.
  6. The observed behaviour occurs.

It is important for us that the DRAM DMA example works correctly, as performant memory access through XDMA is essential for our FPGA projects.

Please support us here. Is it possible that the DRAM DMA example or the XDMA driver is faulty?

kyyalama2 commented 1 year ago

Dear customer, thanks for your interest in AWS F1. Just to make sure, can you confirm whether you see Busmaster enabled when you run lspci? Thanks

rokil commented 1 year ago

I am a colleague of @jasmisbvb. Here is the output. It looks like busmaster is missing?

    # lspci
    00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
    00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
    00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
    00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
    00:02.0 VGA compatible controller: Cirrus Logic GD 5446
    00:03.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
    00:1c.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
    00:1d.0 Memory controller: Amazon.com, Inc. Device 1042
    00:1e.0 Memory controller: Amazon.com, Inc. Device 1041
    00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)

lspci gives identical output on both F1 instances with:

kyyalama2 commented 1 year ago

Dear customer,

Can you please post the output of lspci -vv for the 00:1d.0 Memory controller: Amazon.com, Inc. Device 1042 and the 00:1e.0 Memory controller: Amazon.com, Inc. Device 1041? This should show whether Busmaster is enabled or not.

Thanks

rokil commented 1 year ago

For 00:1d.0 Memory controller: Amazon.com, Inc. Device 1042 it says BusMaster+
For 00:1e.0 Memory controller: Amazon.com, Inc. Device 1041 it says BusMaster-

`00:1d.0 Memory controller: Amazon.com, Inc. Device 1042`
` Subsystem: Xilinx Corporation Device 0007`
` Physical Slot: 29`
` Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-`
` Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-`
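The BusMaster check discussed here can also be scripted. The following is only a minimal sketch: it assumes the text comes from something like `lspci -vv -s 00:1d.0`, and the helper name `parse_control_flags` is made up for illustration. It reads the `Control:` line and reports each flag as set (`+`) or clear (`-`). The sample text is the output posted above for device 1042.

```python
def parse_control_flags(lspci_vv_text):
    """Return {flag_name: bool} parsed from the 'Control:' line of `lspci -vv` output.

    Hypothetical helper for illustration; returns an empty dict if no
    'Control:' line is present.
    """
    for line in lspci_vv_text.splitlines():
        line = line.strip()
        if line.startswith("Control:"):
            flags = {}
            for token in line[len("Control:"):].split():
                # Each token is a flag name followed by '+' (set) or '-' (clear).
                if token[-1] in "+-":
                    flags[token[:-1]] = token.endswith("+")
            return flags
    return {}


# Sample taken from the output posted in this thread for 00:1d.0.
sample = """00:1d.0 Memory controller: Amazon.com, Inc. Device 1042
    Subsystem: Xilinx Corporation Device 0007
    Physical Slot: 29
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
"""

flags = parse_control_flags(sample)
print("BusMaster enabled:", flags["BusMaster"])
print("Memory space enabled:", flags["Mem"])
```

Running the same helper on the `lspci -vv` text for 00:1e.0 would be expected to report `BusMaster` as disabled, matching the `BusMaster-` reported for that function.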

@kyyalama2

kyyalama2 commented 1 year ago

Dear customer,

can you please try out the following:

1.) Enable busmaster and mem space on both the PFs using setpci -s 0x4.l=0x00000006
2.) Do you see this failure on other instances too? If you haven't tried yet, can you launch another instance and check whether you can load the same AFI and run the test successfully, or whether it still fails?
3.) In addition, just to make sure, did you make any changes to the RTL/CL example design? Did it pass timing?
4.) Do you see any hangs or protocol errors in the shell metrics? Ref: https://github.com/aws/aws-fpga/blob/master/hdk/docs/HOWTO_detect_shell_timeout.md#how-to-detect-a-shell-timeout-has-occurred
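For readers wondering where the value in item 1 comes from: assuming `0x4` refers to the standard PCI command register at configuration offset 0x04, the value `0x00000006` sets exactly the Memory Space and Bus Master bits. The sketch below only decodes that value; the function name is hypothetical and it does not touch any hardware. Note that the command as quoted shows no device address after `-s`, so an address such as one of the two functions listed by lspci would presumably have to be supplied.

```python
# Bit positions in the PCI command register (config offset 0x04).
IO_SPACE = 1 << 0      # shown by lspci as "I/O"
MEMORY_SPACE = 1 << 1  # shown by lspci as "Mem"
BUS_MASTER = 1 << 2    # shown by lspci as "BusMaster"


def describe_command(value):
    """Decode the three low bits of a PCI command register value (illustrative helper)."""
    return {
        "I/O": bool(value & IO_SPACE),
        "Mem": bool(value & MEMORY_SPACE),
        "BusMaster": bool(value & BUS_MASTER),
    }


# Memory Space plus Bus Master gives the value suggested in item 1.
value = MEMORY_SPACE | BUS_MASTER
print(hex(value))                    # 0x6
print(describe_command(0x00000006))  # {'I/O': False, 'Mem': True, 'BusMaster': True}
```

After such a write, `lspci -vv` on the same function should show `Mem+ BusMaster+` on its `Control:` line if the setting took effect.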

Thanks

kyyalama2 commented 1 year ago

Dear customer,

Can you please confirm whether you are still seeing this issue, or whether the recommendations resolved the errors you were seeing?

Thanks

kyyalama2 commented 1 year ago

Closing this ticket, since we are not sure whether you still require support on this. Please feel free to comment if you need help. Thanks