apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0
11.58k stars 3.43k forks source link

[RFC][VTA] Support for Cloud Devices (OpenCL-compatible) #5840

Closed zhanghaohit closed 2 years ago

zhanghaohit commented 4 years ago

Motivation

Cloud devices are more powerful than Edge devices, which provides higher computation capabilities for deep learning workloads. For example, for the VTA core, with Cloud devices, we have more resources to support larger GEMM cores (e.g., 32*32 or even 64*64) and device buffers, thus making it possible to boost the performance to great extent. Therefore, it is worthwhile to provide a generic framework to support cloud devices under TVM/VTA architecture.

However, it is non-trivial to extend VTA to Cloud devices. Because the original Xilinx HLS VTA core only works on Xilinx Edge FPGA devices, and Cloud devices exposes different communication models (i.e., shared memory between ARM cores and FPGA device for Edge, vs., PCIe between host and FPGA device for Cloud), and different programming models. In this work, we propose to design a unified framework that can be adapted to any OpenCL-compatible hardware accelerators, e.g., FPGA, ASICs, to seamlessly work with the TVM-VTA architecture. Meanwhile, we provide an example of OpenCL-based VTA implementation that has been tested on the Intel's high-end FPGAs.

Proposal

We would like to extend VTA to OpenCL-compatible devices (e.g. Intel Programmable Acceleration Card). In particular, we provide a framework where any OpenCL-compatible devices can be easily integrated. The reason we choose OpenCL-compatible devices are:

In addition to the generic OpenCL framework, as a first attempt for the hardware implementation, we would like to base on Intel Cloud FPGA (e.g. Intel Programmable Acceleration Card) using Intel® FPGA SDK for OpenCL, which has proven portability and scalability for both Intel® Programmable Acceleration (PAC) cards and other custom Intel-FPGA-based acceleration cards. But the overall framework is generic, meaning that any OpenCL-compatible devices can be plugged in with only little extra hardware-specific implementation.

Major works

<img src="https://raw.githubusercontent.com/4paradigm/incubator-tvm/feature/images/docs/resnet50.png" alt="Resnet50" width=300 style="float: centre; margin-left: 50px;" />

Limitations

The RFC has been discussed in https://discuss.tvm.ai/t/rfc-vta-support-for-cloud-devices-opencl-compatible/6676

masahi commented 2 years ago

I assume this is no longer active. Feel free to reopen.