UCLA-VAST / minimap2-acceleration

Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: Open Source Repository
http://vast.cs.ucla.edu/sites/default/files/publications/minimap2-acc-approved.pdf
MIT License

Reproducing FPGA work #6

Closed kisarur closed 3 years ago

kisarur commented 3 years ago

Hello,

I have been trying to reproduce the FPGA implementation of your work on an Amazon AWS EC2 instance, but I have not been able to get it working. I believe this is due to one or more issues in my environment configuration.

The steps I followed are mentioned below.

  1. Created an Amazon AWS EC2 instance with "FPGA Developer AMI v1.6.1" (since v1.6.1 has Xilinx 2018.3 tool set)
  2. Downloaded "AWS EC2 FPGA Development Kit v1.4.15a" (since v1.4.15a is the latest version that supports the Xilinx 2018.3 tool set) and set it up according to the instructions available at https://github.com/aws/aws-fpga/tree/v1.4.15a/SDAccel
  3. Downloaded your source code and compiled it with "cd kernel/hls/ && make csim-target"
  4. Since there was an error with compiling (related to initializing variable-sized arrays), I upgraded gcc/g++ versions from 4.8.5 to 7.3.1

With the new gcc/g++ versions, the error from the last step above went away, but the compilation then stopped with the following error.

ERROR: [XOCC 60-1258] No valid platform was found that matches 'xilinx_vcu1525_xdma_201830_1'. Please make sure that the platform is specified correctly, and the platform has the right version number. The platform repo paths are:
The valid platforms found from the above repo paths are:

ERROR: [XOCC 60-587] Failed to add a platform: specified platform xilinx_vcu1525_xdma_201830_1 is not found or is not valid
ERROR: [XOCC 60-600] Kernel compile setup failed to complete
ERROR: [XOCC 60-592] Failed to finish compilation

However, to test whether the environment had been set up properly, I ran an SDAccel example (available in the $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl directory), and it compiled and ran successfully with the following commands.

    $ cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/
    $ make clean
    $ make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all

When I examined the Makefile of the example above, I could see that the target device is taken from the DEVICES variable passed at compile time. I tried setting XDEVICE, which your Makefile uses, to the same $AWS_PLATFORM, but it again gave a similar error saying the platform is not correctly set.

Could you please check and let me know if there's any issue in the steps I've followed / the tool versions I'm using or if I have missed any steps.

Thank you!

Best, Kisaru

dotkrnl commented 3 years ago

Hi Kisaru,

Thank you so much for your interest in our work!

You are heading in the right direction. For execution on AWS, you would want to set XDEVICE to $AWS_PLATFORM. At the same time, please make sure that you have installed the AWS_PLATFORM by running:

    $ git clone https://github.com/aws/aws-fpga.git $AWS_FPGA_REPO_DIR  
    $ cd $AWS_FPGA_REPO_DIR                                         
    $ source sdaccel_setup.sh

Please make sure that $AWS_PLATFORM is set after the setup. Then set XDEVICE to $AWS_PLATFORM by running a command like XDEVICE=$AWS_PLATFORM make csim-target. If this does not solve the problem, could you please post the "similar error saying the platform is not correctly set"?
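The check described above can be scripted before the build is started. The following is a minimal sketch, assuming only that $AWS_PLATFORM is expected to hold the path of a platform file once sdaccel_setup.sh has been sourced; `check_platform` is a hypothetical helper name and is not part of this repository or of aws-fpga.

```shell
# check_platform: hypothetical helper (not part of the repository or aws-fpga).
# Succeeds only when its argument is non-empty and names an existing file.
check_platform() {
    platform="$1"
    if [ -z "$platform" ]; then
        echo "AWS_PLATFORM is empty; source sdaccel_setup.sh first" >&2
        return 1
    fi
    if [ ! -f "$platform" ]; then
        echo "platform file not found: $platform" >&2
        return 1
    fi
    echo "platform OK: $platform"
}
```

For example, `check_platform "$AWS_PLATFORM" && XDEVICE=$AWS_PLATFORM make csim-target` would start the build only when the platform file is actually present.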

Please let me know if you have any other questions in setting up the environment or running our kernel. We hope this work would be useful for your research :-).

Thanks, Jason

kisarur commented 3 years ago

Hi Jason,

Thank you very much for your quick reply!

I have already installed and set up the AWS_PLATFORM as you described. After the setup, AWS_PLATFORM is set to the following.

/home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1-04261818_dynamic_5_0/xilinx_aws-vu9p-f1-04261818_dynamic_5_0.xpfm

The error I get when I compile your code with this value passed as XDEVICE (command used: make XDEVICE=$AWS_PLATFORM csim-target) is this.

ERROR: [XOCC 60-1258] No valid platform was found that matches '/home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1-04261818_dynamic_5_0/xilinx_aws-vu9p-f1-04261818_dynamic_5_0_xpfm'. Please make sure that the platform is specified correctly, and the platform has the right version number. The platform repo paths are:
The valid platforms found from the above repo paths are:

ERROR: [XOCC 60-587] Failed to add a platform: specified platform /home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1-04261818_dynamic_5_0/xilinx_aws-vu9p-f1-04261818_dynamic_5_0_xpfm is not found or is not valid
ERROR: [XOCC 60-600] Kernel compile setup failed to complete
ERROR: [XOCC 60-592] Failed to finish compilation

Thank you very much again for helping me configure this!

Best, Kisaru

dotkrnl commented 3 years ago

Thank you so much for providing this information. There is a compatibility issue in our Makefile that causes it to fail when the platform is specified using a path. I have updated the Makefile. Please run git pull and then make XDEVICE=$AWS_PLATFORM csim-target again to see if the issue is fixed.
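One detail in the error output earlier in the thread is consistent with this: the platform was passed as a path ending in `.xpfm`, while xocc reported a name ending in `_xpfm`. The snippet below only illustrates how a dot-to-underscore substitution would produce exactly that string. It is an assumption made for illustration and does not reproduce the logic of the repository's Makefile, before or after the fix.

```shell
# Illustration only: this is NOT the repository's Makefile logic.
# The path is the AWS_PLATFORM value quoted earlier in the thread.
platform_path="/home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1-04261818_dynamic_5_0/xilinx_aws-vu9p-f1-04261818_dynamic_5_0.xpfm"

# Replacing every '.' with '_' turns the file extension into part of the name.
mangled=$(printf '%s' "$platform_path" | tr '.' '_')

echo "requested: $platform_path"
echo "mangled:   $mangled"
```

The mangled value no longer names an existing file, which would explain why no valid platform matched it.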

kisarur commented 3 years ago

Hi Jason,

Thank you very much for the updated Makefile! With this updated Makefile, I could successfully compile and run it for software simulation (make csim-target). I'm sorry I couldn't reply to you earlier; it took some time for me to put everything together and compile it on an AWS F1 instance targeting onboard execution (make bitstream).

I came across a few issues during the compilation process for onboard execution and could luckily resolve them myself :). Some of these issues might be due to the fact that I'm compiling in a fresh environment with only default configurations. I'm posting the issues and how I solved them here so that it'd be beneficial for anyone who tries to recreate your great work in the future.

First, I had to set the AWS_BUCKET environment variable to point to an AWS S3 bucket I created myself to get the compilation for hardware execution working. Next, I had to change the Makefile so that the path to create_sdaccel_afi.sh is set correctly (I changed create-sdaccel-afi in the Makefile to $(SDACCEL_DIR)/tools/create_sdaccel_afi.sh). Lastly, for onboard execution, instead of "kernel.xclbin" (the one given in the sample command in the README), I had to use "kernel.awsxclbin".
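The first two points can be checked before starting the long `make bitstream` run. The following is a minimal sketch, assuming the variable names and script location given above; `onboard_preflight` is a hypothetical helper name and does not exist in the repository.

```shell
# onboard_preflight: hypothetical helper (not part of the repository).
# Reports every missing prerequisite, then returns non-zero if any was missing.
onboard_preflight() {
    rc=0
    # AWS_BUCKET must name the S3 bucket used during the hardware build.
    if [ -z "${AWS_BUCKET:-}" ]; then
        echo "AWS_BUCKET is not set" >&2
        rc=1
    fi
    # The AFI creation script is expected under $SDACCEL_DIR/tools.
    if [ ! -f "${SDACCEL_DIR:-}/tools/create_sdaccel_afi.sh" ]; then
        echo "create_sdaccel_afi.sh not found under SDACCEL_DIR/tools" >&2
        rc=1
    fi
    return "$rc"
}
```

For example, `onboard_preflight && make XDEVICE=$AWS_PLATFORM bitstream` would skip the build when either prerequisite is missing. Passing XDEVICE to the bitstream target is an assumption carried over from the csim-target command used earlier in the thread.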

After the successful compilation, I ran it on the FPGA available on the AWS F1 instance with the first 30000 reads of the c_elegans40x dataset (dumped using your testbed with --chain-dump-limit=30000). I would like to know if there's any limitation on the number or size of reads that can be sent to the FPGA, as I'm hoping to use your work to accelerate processing of a human genome. Related to this, I'd like to know whether the timing/performance analysis in your paper (https://ieeexplore.ieee.org/abstract/document/8735515) was done using the full c_elegans40x dataset (which is about 8.9 GB in FASTQ format) or a subset of it.

Thank you very much for the continued support!

Best, Kisaru

dotkrnl commented 3 years ago

Hi Kisaru,

Thank you so much for the information. It would benefit others a lot.

The hardware is fully pipelined and you can easily convert it into a streaming design, with which you can feed unlimited input continuously. There is no limitation on the size of the input on the hardware side. However, there is indeed a limitation in the testbench that we used for the experiments in the paper. Unfortunately, we haven't implemented a streaming software driver or any integration of this acceleration. Therefore, using the proof-of-concept testbench, you can only run experiments on a limited amount of data, which depends on the memory available on your host and your device.

For the experiments in the paper, we used a subset that covers most of the full c_elegans40x dataset. Therefore, the time shown in the paper is roughly the full execution time. We performed the experiments by invoking the testbench for different parts of the computation.

For your information, for a real-world application you would need to implement a new software driver to avoid the bottleneck introduced by the proof-of-concept testbench, which isn't optimized at all and has unsatisfactory performance. If you are only experimenting with the hardware instead of the end-to-end flow, you could measure the device time alone.

Thanks, Jason

kisarur commented 3 years ago

I see, thanks for the information and for letting me know the limitations! I will test it further on my dataset and check the performance gains compared to the software run. I'll look for future directions based on the results.

Thanks again for all the support!