Ed-Yang / xilinx-ethash

Run ethash opencl kernel on Xilinx's Alveo U50

bad_alloc error when set epoch to 417 #3

Open lukey936a opened 3 years ago

lukey936a commented 3 years ago

Hi, when I set the epoch to 417 I get a lot of errors. It seems that a bad_alloc error happens when the DAG size is bigger than 4 GB. Do you know how to fix this?

./build/xleth 417 Xilinx ./xleth/xclbin/ethash.hw.xclbin
.......
Trying to program device xilinx_u50_gen3x16_xdma_201920_3
INFO: Reading ./xleth/xclbin/ethash.hw.xclbin
Loading: './xleth/xclbin/ethash.hw.xclbin'
Device program successful.
DEV: jinlili Global mem size 0 GB
KNL: L_WORKSIZE 128
KNL: MULTIPLIER 65536
KNL: G_WORKSIZE 16384
KNL: FASTEXIT 0
DEV: Global mem size 0 GB
DEV: Max alloc size 4096 MB
DEV: Max W Group size 4294967295
DEV: Max W Item size 4294967295/4294967295/4294967295
DEV: Max compute unit 2

Generating DAG ...

DAG: generating for epoch 417 ...
XRT build version: 2.9.317
Build hash: b0230e59e22351fb957dc46a6e68d7560e5f630c
Build date: 2021-03-13 05:10:45
Git branch: 2020.2_PU1
PID: 1940
UID: 1000
[Mon May 31 04:29:43 2021 GMT]
HOST: jin-HP-Z220-SFF-Workstation
EXE: /home/jin/work/ethash/xilinx-ethash/build/xleth
[XRT] ERROR: std::bad_alloc
[XRT] ERROR: std::bad_alloc
DAG: epoch 417 lightSize 71432512 dagSize 4571790208
[XRT] ERROR: Kernel arg '_DAG0' is not set
DAG: item 0 chunk 1280000, took 0.00s
[XRT] ERROR: Kernel arg '_DAG0' is not set
DAG: item 1280000 chunk 1280000, took 0.00s
......
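
For readers following along, here is a minimal Python sketch of the size arithmetic behind the report. It uses the sizing rule and constants from the public Ethash specification and compares the result with the "Max alloc size 4096 MB" line in the log. Reading that figure as 4096 MiB is an assumption, and the helper names (`dataset_size`, `cache_size`, `fits_single_alloc`) are illustrative only and are not part of xleth. Under these assumptions the DAG for epoch 417 comes out about 264 MiB larger than a single maximum-size allocation, which matches the reporter's guess that the failure starts once the DAG passes 4 GB.

```python
# Ethash sizing constants, taken from the public Ethash specification.
DATASET_BYTES_INIT = 2**30
DATASET_BYTES_GROWTH = 2**23
CACHE_BYTES_INIT = 2**24
CACHE_BYTES_GROWTH = 2**17
MIX_BYTES = 128
HASH_BYTES = 64

# "Max alloc size 4096 MB" from the log, read as MiB (an assumption).
MAX_ALLOC_BYTES = 4096 * 2**20


def is_prime(n):
    """Trial division; fast enough for the values involved here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def cache_size(epoch):
    """Light cache size in bytes for an epoch (lightSize in the log)."""
    size = CACHE_BYTES_INIT + CACHE_BYTES_GROWTH * epoch - HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= 2 * HASH_BYTES
    return size


def dataset_size(epoch):
    """Full DAG size in bytes for an epoch (dagSize in the log)."""
    size = DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch - MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= 2 * MIX_BYTES
    return size


def fits_single_alloc(epoch, max_alloc=MAX_ALLOC_BYTES):
    """True if the whole DAG fits in one buffer of max_alloc bytes."""
    return dataset_size(epoch) <= max_alloc


if __name__ == "__main__":
    for epoch in (384, 385, 417):
        verdict = "fits" if fits_single_alloc(epoch) else "exceeds max alloc"
        print(epoch, cache_size(epoch), dataset_size(epoch), verdict)
```

With these constants, epoch 384 is the last epoch whose DAG stays under 4 GiB, so any later epoch would hit the same limit if the DAG is placed in one buffer.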

Ed-Yang commented 3 years ago

I am completely new to the Xilinx toolchain and I do not have a U50 on hand. Maybe you could try to extend the DAG size in:

https://github.com/Ed-Yang/xilinx-ethash/blob/main/xleth/config/connectivity_u50.ini

RezaAhmadi0117 commented 3 years ago

You can do this in two ways:
1. Pass the data to the host when the program creates the context (no HBM), or
2. Use more HBM banks by increasing the range number in the config, as Ed said. Each bank has 2 Gb (256 MB) of space and 32 banks are available in total, so you need to assign 20 HBM banks to the m_dag port. (I tested this on epoch 430 and it works for me.)
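
As a rough check of the bank count, the sketch below computes how many 256 MB banks a DAG of a given size needs and formats a v++ connectivity line of the form `sp=<compute unit>.<argument>:HBM[first:last]`. The compute unit name `ethash_1` is hypothetical, and the actual names and existing range should be taken from connectivity_u50.ini. Only the port name `m_dag` comes from the comment above.

```python
HBM_BANK_BYTES = 256 * 2**20  # 2 Gb per bank, as stated in the comment above
HBM_BANK_COUNT = 32           # total banks, as stated in the comment above


def hbm_banks_needed(dag_bytes, bank_bytes=HBM_BANK_BYTES):
    """Smallest number of banks whose combined capacity holds dag_bytes."""
    return -(-dag_bytes // bank_bytes)  # ceiling division


def connectivity_line(cu_name, arg_name, banks):
    """Format an sp= line mapping one kernel argument to HBM banks 0..banks-1.

    cu_name and arg_name are placeholders; use the names from the real
    connectivity file.
    """
    if not 1 <= banks <= HBM_BANK_COUNT:
        raise ValueError("bank count must be between 1 and 32")
    bank_range = "0" if banks == 1 else f"0:{banks - 1}"
    return f"sp={cu_name}.{arg_name}:HBM[{bank_range}]"


if __name__ == "__main__":
    dag_bytes = 4571790208  # dagSize reported for epoch 417
    print(hbm_banks_needed(dag_bytes))
    print(connectivity_line("ethash_1", "m_dag", 20))
```

For the epoch 417 dagSize this gives a minimum of 18 banks, so the 20 banks suggested above leave some headroom for later epochs.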

lukey936a commented 3 years ago

This problem was solved by enlarging the number of HBM banks for the DAG in the ini file, but I am confused about why running on HW is slower than on SW_EMU.

RezaAhmadi0117 commented 3 years ago

In SW_EMU you are using the CPU to process DAG creation, but on real hardware you implement the logic with HLS code. The PL parts of an FPGA run at a lower clock frequency than a CPU and also need to be optimized with Xilinx attributes (or pragmas). You can search for this problem in HLS tools; many researchers are working on it now. To be faster than the CPU you need to optimize this part, but another way is to create the dataset on the CPU. I posted a file in another issue that you can test. (It is not complete because of the buffer copy, but DAG creation is OK.)

lukey936a commented 3 years ago

Hi, jackwatson01234. I downloaded your file and ran it on my hardware, but I get errors:

./build/xleth 0 4 ./xleth/kernel/ethash.cl ./xleth/xclbin/ethash.hw.xclbin
.......
Found Platform
Platform Name: Xilinx
platform intel not found, kernel is not loaded

I think DAG generation is not the major problem. The DAG file can be generated on the host CPU with the 'geth makedag blockheight' command and then migrated to the global memory of the U50.
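
A small sketch of how the block height for that command could be derived from an epoch number, assuming the Ethash epoch length of 30000 blocks and a geth build that still ships `makedag` with the `<block number> <output directory>` argument form. The output directory is a placeholder, and this only builds the command; it does not run geth or copy anything to the U50.

```python
EPOCH_LENGTH = 30000  # blocks per Ethash epoch


def epoch_start_block(epoch):
    """First block height that belongs to the given epoch."""
    return epoch * EPOCH_LENGTH


def makedag_command(epoch, out_dir):
    """Argument list for generating the DAG of an epoch on the host CPU.

    out_dir is a placeholder path chosen by the caller.
    """
    return ["geth", "makedag", str(epoch_start_block(epoch)), out_dir]


if __name__ == "__main__":
    # Prints the command for epoch 417; pass the list to subprocess.run
    # to actually execute it.
    print(" ".join(makedag_command(417, "./dag")))
```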

RezaAhmadi0117 commented 3 years ago

Did you install the Intel OpenCL driver? Your Intel platform was not found, so you need to install it. Check the link below: https://software.intel.com/content/www/us/en/develop/articles/opencl-drivers.html#cpu-section And yes, there is no difference there. If you have one of these devices (U50 or U280), please send me a message on Discord (J_Watson#4036). I'm working on porting Ethminer to the U50 but I don't have that device. (I have one with a lot of problems :) )