Xilinx / DPU-PYNQ

DPU on PYNQ

One DPU on ZCU104 not working #104

Open jjsuperpower opened 1 year ago

jjsuperpower commented 1 year ago

I am having an issue modifying the ZCU104 example to use only one DPU core instead of two. I am using Vivado and Vitis 2022.2, with XRT build version 2.12.0. For testing, I have been using the code from the dpu_mnist_classifier.ipynb notebook along with the provided xmodel. I am able to successfully compile and run the stock DPU build example; however, when I modify the prj_config file to reduce the DPU cores from two to one, I hit an error at runtime. I have been struggling with this error for a few weeks, so any help or suggestions would be much appreciated.

Modified prj_config

[clock]

id=1:DPUCZDX8G_1.aclk
id=6:DPUCZDX8G_1.ap_clk_2

[connectivity]

sp=DPUCZDX8G_1.M_AXI_GP0:HP0
sp=DPUCZDX8G_1.M_AXI_HP0:HP1
sp=DPUCZDX8G_1.M_AXI_HP2:HP2

nk=DPUCZDX8G:1

[advanced]
misc=:solution_name=link

#param=compiler.addOutputTypes=sd_card
#param=compiler.skipTimingCheckAndFrequencyScaling=1

[vivado]
prop=run.impl_1.strategy=Performance_ExploreWithRemap
#prop=run.impl_1.strategy=Congestion_SpreadLogic_low
#prop=run.impl_1.strategy=Performance_Explore

#param=place.runPartPlacer=0

Runtime Error

xilinx@pynq:~$ [ 9169.417464] Internal error: synchronous external abort: 96000010 [#1] SMP
[ 9169.424248] Modules linked in: zocl(O) uio_pdrv_genirq
[ 9169.429386] CPU: 3 PID: 1683 Comm: python3 Tainted: G           O      5.15.19-xilinx-v2022.1 #1
[ 9169.438159] Hardware name: ZynqMP ZCU104 RevC (DT)
[ 9169.442934] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 9169.449886] pc : xrt_cu_hls_init+0xc4/0x120 [zocl]
[ 9169.454695] lr : xrt_cu_hls_init+0x64/0x120 [zocl]
[ 9169.459494] sp : ffff80000a1d3600
[ 9169.462793] x29: ffff80000a1d3600 x28: ffff000007610080 x27: ffff000004b95010
[ 9169.469919] x26: ffff000004b95010 x25: ffff8000092ec318 x24: ffff00002ec74000
[ 9169.477045] x23: ffff0000199d3800 x22: ffff000007610340 x21: ffff8000099c9000
[ 9169.484172] x20: ffff00002ec74000 x19: ffff000009658300 x18: ffffffffffffbf20
[ 9169.491298] x17: 75612e322e55432f x16: ffff8000099ff000 x15: ffff0000029f8190
[ 9169.498424] x14: 0000000000000000 x13: ffff8000099f8000 x12: ffff80000940cba8
[ 9169.505550] x11: ffff80000940cba8 x10: ffff800009698558 x9 : 0000800076646000
[ 9169.512677] x8 : ffff8000099cb000 x7 : ffff000001971e58 x6 : 0000000000000000
[ 9169.519803] x5 : 00000002ffffffff x4 : 0000000000000005 x3 : 0000000000000000
[ 9169.526929] x2 : ffff800000f87000 x1 : ffff8000099cb000 x0 : 0000000000000000
[ 9169.534056] Call trace:
[ 9169.536487]  xrt_cu_hls_init+0xc4/0x120 [zocl]
[ 9169.540939]  cu_probe+0x190/0x27c [zocl]
[ 9169.544862]  platform_probe+0x68/0xe0
[ 9169.548517]  really_probe.part.0+0x9c/0x30c
[ 9169.552692]  __driver_probe_device+0x98/0x144
[ 9169.557040]  driver_probe_device+0x44/0x11c
[ 9169.561215]  __device_attach_driver+0xb4/0x120
[ 9169.565651]  bus_for_each_drv+0x78/0xd0
[ 9169.569479]  __device_attach+0xdc/0x184
[ 9169.573306]  device_initial_probe+0x14/0x20
[ 9169.577482]  bus_probe_device+0x9c/0xa4
[ 9169.581310]  device_add+0x36c/0x860
[ 9169.584790]  platform_device_add+0x114/0x234
[ 9169.589052]  subdev_create_cu+0xec/0x1b0 [zocl]
[ 9169.593592]  zocl_xclbin_read_axlf+0x8a8/0xc0c [zocl]
[ 9169.598644]  zocl_read_axlf_ioctl+0x18/0x24 [zocl]
[ 9169.603435]  drm_ioctl_kernel+0xc4/0x11c
[ 9169.607350]  drm_ioctl+0x214/0x44c
[ 9169.610743]  __arm64_sys_ioctl+0xb8/0xe0
[ 9169.614658]  invoke_syscall+0x54/0x124
[ 9169.618399]  el0_svc_common.constprop.0+0x44/0xfc
[ 9169.623095]  do_el0_svc+0x48/0xb0
[ 9169.626402]  el0_svc+0x28/0x80
[ 9169.629449]  el0t_64_sync_handler+0xa4/0x130
[ 9169.633711]  el0t_64_sync+0x1a0/0x1a4
[ 9169.637370] Code: b9022a84 f9015282 7100087f 54000120 (b9400021) 
[ 9169.643453] ---[ end trace 53c1473b5c2422a2 ]---
[ 9169.703179] zocl-drm axi:zyxclmm_drm: zocl_destroy_client: client exits pid(1683)
[ 9169.711056] zocl-drm axi:zyxclmm_drm: zocl_destroy_client: client exits pid(1683)
skalade commented 1 year ago

Hi @jjsuperpower,

I gave it a shot on my side and modified the prj_config in the boards/zcu104 folder as follows:

[clock]

freqHz=300000000:DPUCZDX8G_1.aclk
freqHz=600000000:DPUCZDX8G_1.ap_clk_2
#freqHz=300000000:DPUCZDX8G_2.aclk
#freqHz=600000000:DPUCZDX8G_2.ap_clk_2

[connectivity]

sp=DPUCZDX8G_1.M_AXI_GP0:HPC0
sp=DPUCZDX8G_1.M_AXI_HP0:HP0
sp=DPUCZDX8G_1.M_AXI_HP2:HP1
#sp=DPUCZDX8G_2.M_AXI_GP0:HPC0
#sp=DPUCZDX8G_2.M_AXI_HP0:HP2
#sp=DPUCZDX8G_2.M_AXI_HP2:HP3

nk=DPUCZDX8G:1

[advanced]
misc=:solution_name=link

#param=compiler.addOutputTypes=sd_card
#param=compiler.skipTimingCheckAndFrequencyScaling=1

[vivado]
prop=run.impl_1.strategy=Performance_Explore
#param=place.runPartPlacer=0

This looks very much like your config. After replacing the dpu.bit, dpu.hwh, and dpu.xclbin files in /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq_dpu with the new single-core versions, I was able to run the MNIST example notebook without issue.
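
For reference, that file swap can be scripted. A minimal sketch, assuming the freshly built outputs sit in the current directory and that you have write access to the pynq_dpu package directory (adjust paths to your setup):

```python
import shutil

# Hypothetical destination -- the pynq_dpu package directory on the board
PYNQ_DPU_DIR = "/usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq_dpu/"

# Overwrite the shipped two-core files with the new single-core build outputs
for f in ("dpu.bit", "dpu.hwh", "dpu.xclbin"):
    shutil.copy(f, PYNQ_DPU_DIR)
```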

I'm using Vitis 2022.1, however. Maybe switching Vitis versions is an option? I believe 2022.1 is better supported for Vitis AI 2.5.

jjsuperpower commented 1 year ago

Thank you for looking into it. I will go ahead and try out your suggestion and see if downgrading Vitis makes a difference.

jjsuperpower commented 1 year ago

I ran the DPU build with Vitis 2022.1 and unfortunately I still get the same error as before. Do you have any other ideas about what could be causing it? I'm not sure if this is helpful, but here is the dpu.xclbin.info file generated by the build. Also, here is the block design generated by Vitis (located in DPU-PYNQ/boards/zcu104/binary_container_1/link/vivado/vpl/prj).

[block design image]

skalade commented 9 months ago

Hi @jjsuperpower, did you find a solution? If so, please feel free to share what it was and/or close out this issue. Thanks.

jjsuperpower commented 9 months ago

Yes, I was able to solve the problem; apologies for forgetting to reply. The issue was that my .xclbin file was not being copied to /usr/lib/dpu.xclbin.

Here is why I missed this small but very important detail. The download method of DpuOverlay calls the copy_xclbin method to copy the overlay's .xclbin (e.g. custom_overlay.xclbin) to /usr/lib/dpu.xclbin. My assumption was that the download method is meant to be called when swapping out overlays (or reloading an overlay), but is not needed when loading a new overlay. So what I was doing wrong was initializing the DpuOverlay class and then calling the load_model method directly. This meant the .xclbin file was never copied, so the VART/XRT drivers were not able to properly communicate with the hardware, causing the kernel to crash.
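
For anyone else hitting this, here is a minimal sketch of the failing versus working call sequences. It assumes the DpuOverlay class from pynq_dpu; the custom_overlay filenames are placeholders, and the model filename is the one from the MNIST example notebook:

```python
from pynq_dpu import DpuOverlay

# Failing sequence (as described above): __init__ loads the bitstream,
# but copy_xclbin() never runs, so VART opens a stale /usr/lib/dpu.xclbin
# that no longer matches the programmed hardware -> zocl kernel oops.
overlay = DpuOverlay("custom_overlay.bit")
overlay.load_model("dpu_mnist_classifier.xmodel")

# Working sequence: download() also runs copy_xclbin(), putting the
# matching xclbin in /usr/lib/dpu.xclbin before VART needs it.
overlay = DpuOverlay("custom_overlay.bit")
overlay.download()
overlay.load_model("dpu_mnist_classifier.xmodel")
```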

Is there a reason the __init__ method of DpuOverlay does not call copy_xclbin? Would it cause problems to add it? To me, the current implementation seems a bit unintuitive, as the __init__ method loads the bitstream but does not copy the .xclbin file. At the very least, could documentation be added to prevent someone else from having the same issue?

Thanks, jj

skalade commented 9 months ago

@jjsuperpower, thanks a bunch, that's really useful to know. I made a PR that should address this issue in the future: https://github.com/Xilinx/DPU-PYNQ/pull/111

Long story short: I believe your xclbin was getting downloaded to /usr/lib; however, it wasn't replacing dpu.xclbin, it was saved as /usr/lib/custom_overlay.xclbin. The way VART knows where to look for this firmware is via the /etc/vart.conf file, which is hardcoded to dpu.xclbin (or whatever default the Vitis AI PetaLinux flow leaves it at). The change in the PR will overwrite that config file as part of the xclbin download.
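
As a quick sanity check, you can inspect (and, if needed, rewrite) that file yourself. A sketch, assuming the usual one-line vart.conf format (firmware: <path>) and root permissions:

```python
# See which xclbin VART will open; a stock image typically contains a
# single line such as: firmware: /usr/lib/dpu.xclbin
with open("/etc/vart.conf") as f:
    print(f.read())

# Point VART at a differently named xclbin instead (run as root);
# the custom_overlay path here is a placeholder
with open("/etc/vart.conf", "w") as f:
    f.write("firmware: /usr/lib/custom_overlay.xclbin\n")
```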