FPGA-Research-Manchester / fos

FOS - FPGA Operating System

Trouble porting the project to PYNQ Z2 #8

Open alexisduhamel opened 3 years ago

alexisduhamel commented 3 years ago

Hello,

I've started working with FOS again on a PYNQ Z2 board, but so far I haven't managed to get my own accelerators to communicate with the CPU via Ponq.

The tools I used are: Vivado 2018.2 + SDK, the current version of FOS, and PYNQ v2.3 (v2.3 being the version used in your compilation guide, proof here, line 325).

As I mentioned in Issue #7, I first had some trouble working with the library and the PCAP manager. Eventually, thanks to khoapham, I seemingly got it to work: I was able to write to the PCAP port via the CLI and use Ponq's acc.load(accel_name) function to load the accelerator. I wasn't able to properly check my IP, however, as FOS is based on PYNQ v2.3 and the DMA seems to behave oddly there compared with v2.6, which I'm more used to.

Since I'm more experienced with VHDL than HLS, I wrote my own AXI IP to instantiate alongside the PS; it has been verified with the PYNQ v2.6 Overlay library. The IP has a slave and a master AXIS interface and is connected to the PS via a DMA. I did it this way because, as introduced in your Vivado tutorial, you are using UltraScale boards, which have both slave and master HP AXI ports, whereas non-UltraScale boards only have slave HP AXI ports and master GP AXI ports. Is this an issue?

I then used the DMA IP to communicate with my IP. The DMA registers, as described in the SDK (hardware description file), are listed in my accelerator JSON file:

{
  "name": "static_full_add",
  "address": "0x40400000",
  "bitfiles": [
    {"name": "static_full_vadd.bit", "region": "full"}
  ],
  "registers":[
    {"name":"MM2S_DMACR", "offset":"0x0"},
    {"name":"MM2S_DMASR", "offset":"0x4"},
    {"name":"MM2S_CURDESC", "offset":"0x8"},
    {"name":"MM2S_CURDESC_MSB", "offset":"0xC"},
    {"name":"MM2S_TAILDESC", "offset":"0x10"},
    {"name":"MM2S_TAILDESC_MSB", "offset":"0x14"},
    {"name":"SG_CTL", "offset":"0x2C"},
    {"name":"S2MM_DMACR", "offset":"0x30"},
    {"name":"S2MM_DMASR", "offset":"0x34"},
    {"name":"S2MM_CURDESC", "offset":"0x38"},
    {"name":"S2MM_CURDESC_MSB", "offset":"0x3C"},
    {"name":"S2MM_TAILDESC", "offset":"0x40"},
    {"name":"S2MM_TAILDESC_MSB", "offset":"0x44"}
  ]
}
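To make the register map above easier to check against the Vivado Address Editor, here is a minimal sketch that turns such a description into absolute addresses. It assumes each "offset" is simply added to the top-level "address"; that is my reading of the file format, not something confirmed from the FOS sources. The helper name absolute_register_map is hypothetical and the register list is shortened.

```python
import json

# Accelerator description copied from the post (register list shortened).
ACCEL_JSON = """
{
  "name": "static_full_add",
  "address": "0x40400000",
  "registers": [
    {"name": "MM2S_DMACR", "offset": "0x0"},
    {"name": "MM2S_DMASR", "offset": "0x4"},
    {"name": "S2MM_DMACR", "offset": "0x30"},
    {"name": "S2MM_DMASR", "offset": "0x34"}
  ]
}
"""


def absolute_register_map(description):
    """Map register name -> absolute address, ASSUMING address = base + offset."""
    base = int(description["address"], 16)
    table = {}
    for reg in description["registers"]:
        offset = int(reg["offset"], 16)
        # 32-bit registers should sit on 4-byte boundaries.
        if offset % 4 != 0:
            raise ValueError(f"register {reg['name']} is not word aligned")
        table[reg["name"]] = base + offset
    return table


reg_map = absolute_register_map(json.loads(ACCEL_JSON))
for name, addr in reg_map.items():
    print(f"{name:12s} 0x{addr:08X}")
```

Comparing the printed addresses with the Address Editor is a quick way to rule out a wrong base address or a mistyped offset before touching the hardware.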

This caused me a lot of trouble: I couldn't figure out how to properly initiate communication with the accelerator without making the Jupyter notebook stop responding. As I was basically starting to write a whole DMA driver in Python, I figured I was doing things wrong, since your examples are much more straightforward.

I then backtracked and started over. Below is my workflow for a static design using a Vivado AXIS FIFO IP, connected to the slave HP port with an AXI-S to memory-mapped adapter. I carefully mapped everything to a 32-bit bus width, as the readme mentions this might be an issue. My workflow is based on the tutorial about creating HLS IPs and using a static design. [Images: barebone_axi_fifo, address_editor]

I generated the bin file manually within the SDK by running bootgen -image output.bif -arch zynq -process_bitstream bin, after having created the output.bif file as described in the tutorial.
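For reference, the bootgen step can be sketched as follows. The .bif layout shown here (a single "all" section listing only the bitstream) is an assumption on my side, not copied from the tutorial; the command line is the one quoted above. The helper names bif_text and bootgen_command are hypothetical, and the script only prints the text and the command, it does not run bootgen.

```python
from pathlib import Path


def bif_text(bitstream_name):
    """Return a minimal .bif listing one bitstream (ASSUMED layout)."""
    return "all:\n{\n    " + bitstream_name + "\n}\n"


def bootgen_command(bif_path):
    """Argument list for the bootgen call quoted in the post."""
    return [
        "bootgen",
        "-image", str(bif_path),
        "-arch", "zynq",
        "-process_bitstream", "bin",
    ]


bif = bif_text("design_1_wrapper.bit")
cmd = bootgen_command(Path("output.bif"))

print(bif)
print(" ".join(cmd))
```

To actually run it, the printed text would go into output.bif next to the bit file, and the argument list could be passed to subprocess.run on a machine where bootgen is on the PATH.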

Finally, after having reset the board, I uploaded the bit file, bin file, and the following json files to configure the shell and accelerator:

repo.json

{
  "shell": "PYNQ_Z2",
  "accelerators": [
    "design_1_wrapper"
  ]
}

PYNQ_Z2.json

{
  "name": "PYNQ_Z2",
  "bitfile": "design_1_wrapper.bin",
  "regions": [
    {"name": "full", "blank": "design_1_wrapper.bin", "bridge": "0x00000000", "addr": "0x60010000"}
  ]
}

design_1_wrapper.json

{
  "name": "design_1_wrapper",
  "address": "0x60010000",
  "bitfiles": [
    {"name": "design_1_wrapper.bin", "region": "full"}
  ],
  "registers":[
    {"name":"data", "offset":"0x0"},
    {"name":"control", "offset":"0x4"}
  ]
}
The register map is that of the AXI memory map to stream IP. According to the Vivado Address Editor, I should expect the output data in the DDR memory space.

At this point I was able to load the bin files: [Image: output]

But as soon as I try to write to the registers, the kernel dies and restarts:

acc.writeReg("data", 0x00010001) # Data to store
acc.writeReg("control", 0xFFFFFFFF) # WSTRB = FF, TLAST need to be set to 1, other bytes can be left as 1 

I'm sure there is something I'm missing here, or maybe non-UltraScale boards aren't compatible, as all your tests seem to have been conducted on UltraScale boards. Could you please tell me what I am doing wrong?

Kind regards, Alexis