cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Insider Backend in HCL #292

Open hecmay opened 3 years ago

hecmay commented 3 years ago

Programming model in Insider

The input data is written to a virtual file descriptor. The user only needs to issue a vread() call in the host program to read the output results (processed and returned by the FPGA) from the same descriptor. Here is an example host program for KNN in Insider. Insider's ISC driver and firmware automatically intercept the input file information.


void prediction(void) {
  const char *virt_path = reg_virt_file("/mnt/centos/knn_data.txt");
  int fd = vopen(virt_path, O_RDONLY);
  char buf[BUF_LEN];
  ssize_t cur;
  while ((cur = vread(fd, buf, BUF_LEN)) > 0) {
    // process the output returned by the FPGA here...
  }
  vclose(fd);
}

That is, the input is moved to the storage unit (SU) explicitly by the user, and then moved from the SU to the FPGA automatically under the hood.

On the FPGA side, the FPGA kernel interface in Insider has three types of channels: 1) one for reading large-size input from the SU, 2) one for writing output to the host, and 3) one for receiving small-size params from the host (as shown in the code snippet below). These three arguments are required by Insider and are automatically connected to the SU and the host by the compiler. In other words, when using Insider, we only have two channels to send data into the FPGA, and one channel for output.

void app_knn(ST_Queue<APP_Data> &app_input_data,          // 1) large-size input streamed from the SU
             ST_Queue<APP_Data> &app_output_data,         // 2) output streamed back to the host
             ST_Queue<unsigned int> &app_input_params) {  // 3) small-size params from the host
    // main body...
}

HCL Interface for Insider Backend

To accommodate the Insider backend, we need to provide an appropriate abstraction. I would prefer to use the fluent programming style (i.e. the chained .to() APIs described in the 1st bullet point in the snippet below) to express the logic. For the host-device interface, we have to pack multiple inputs into a single struct, and unpack it on the FPGA.

Here is an example of KNN to show the proposed interface. We can add a new option (i.e. vfile=/path/) to .to() to specify the path of the virtual file, and then declare that the input will be consumed by the FPGA (i.e. the ISC controller, using PIM's terminology). The rest is basically the same as what we already have.

def knn(training_data, input_feat):
    # algorithms defined here
    # return prediction
    return pred

s = hcl.create_schedule([training_data, input_feat], knn)
p = hcl.platform.insider

# 1. Input data is moved to the storage unit
# instead of the FPGA, and automatically streamed into the FPGA at runtime
s.to(training_data, p.drive, vfile="/mnt/user/training.dat").to(p.FPGA)

# 2. Input features (small-size params)
# are sent from the CPU to the FPGA through FIFOs
s.to(input_feat, p.FPGA)

# 3. Prediction result streamed back to host
s.to(knn.pred, p.host)
zhangzhiru commented 3 years ago

@Hecmay how do we simulate this code? We won't be able to put data under /mnt/user on the CPU server that runs the HCL simulation. Does Insider have a CPU simulation mode?

hecmay commented 3 years ago

Yeah. Insider has CSIM and co-SIM modes. We can provide an interface to invoke Insider's simulation tools.

p = hcl.platform.insider
p.config(mode="csim|cosim")
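
As a rough sketch of the flow (assuming the usual HCL build/run APIs; `s` is the schedule from the proposal above, and the buffer names and shapes below are placeholders):

import numpy as np
import heterocl as hcl

f = hcl.build(s, target=p)   # build the schedule for the Insider platform configured above
hcl_pred = hcl.asarray(np.zeros((10,), dtype=np.int32))  # hypothetical output buffer
f(hcl_training_data, hcl_input_feat, hcl_pred)           # run the kernel under CSIM
pred = hcl_pred.asnumpy()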
zhangzhiru commented 3 years ago

s.to(training_data, p.drive, vfile="/mnt/user/training.dat").to(p.FPGA)

We need to think critically about what is the right programming interface here. What is the first .to() doing? Isn't the file already stored on disk?

Also, how do we specify the data type stored in training.dat? I assume we have a fixed-width channel between the drive and the accelerator. In that case, does Insider do the type conversion automatically?

hecmay commented 3 years ago

On the HCL user end, the input training data is in memory (e.g. as numpy arrays); we need to dump it into a file (stored on disk) first, and then move it to the FPGA. If the data is already in a file, I think we can also define it as

s.to(training_data, p.FPGA, vfile="/mnt/user/training.dat")
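
As a rough sketch of that dump step (assuming the in-memory data is a numpy array of unsigned integers; the helper name and dtype are just placeholders):

import numpy as np

# Hypothetical helper: write the in-memory training data to the virtual file
# before the accelerator run; the 32-bit unsigned dtype is an assumption.
def dump_to_vfile(training_data, path="/mnt/user/training.dat"):
    np.asarray(training_data, dtype=np.uint32).tofile(path)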

About the data type, this is the required interface data type in Insider:

struct APP_Data {
  ap_uint<512> data;    // 512-bit data payload
  unsigned short len;   // length of the valid data in this payload
  bool eop;             // end-of-packet flag
};

Basically, multiple (unsigned integer) values in training.dat are packed into 512 bits and passed to the FPGA automatically by the Insider runtime. Users need to do the unpacking manually in the kernel code. In the provided examples, I did not see any floating-point or fixed-point data types used for the interface.
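
As a rough sketch of what that manual unpacking would look like (assuming the 512-bit payload holds 32-bit unsigned integers; the word width and helper name are assumptions, not part of Insider's API):

#include <ap_int.h>

// Same interface struct as above.
struct APP_Data {
  ap_uint<512> data;
  unsigned short len;
  bool eop;
};

// Hypothetical helper: split one 512-bit payload into sixteen 32-bit words.
// A real kernel would also consult `len` and `eop` to know how much is valid.
void unpack_words(APP_Data in, unsigned int out[16]) {
  ap_uint<512> d = in.data;
  for (int i = 0; i < 16; i++) {
    out[i] = d.range(32 * i + 31, 32 * i);  // bits [32*i+31 : 32*i]
  }
}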

hecmay commented 3 years ago

Made some changes to the previous proposal:

  1. Use a placeholder to represent the file descriptor, so that the .to() primitive looks more concise
  2. Add an interface to configure the bandwidth and interconnect latency of Insider.

def knn(training_data, input_feat):
    # algorithms defined here
    # return prediction
    return pred

# 1. Define placeholders. The in-storage data should be stored in the file beforehand
training_data = hcl.placeholder((size,), source="/home/users/input.txt")
params = hcl.placeholder((size, ))

# 2. Configure the simulation parameters 
p = hcl.platform.insider
p.config(bandwidth=12, latency=300)

# 3. Move data. Large data is moved from disk to FPGA,
# while smaller params are moved from host to FPGA
s = hcl.create_schedule([training_data, params], knn)
s.to(training_data, p.drive).to(p.FPGA)
s.to(params, p.FPGA)
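
Presumably the prediction result is still streamed back to the host as in the first proposal:

s.to(knn.pred, p.host)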