dicecco1 / fpga_caffe


no winograd_pe.cl file #4

Open haisenbao opened 7 years ago

haisenbao commented 7 years ago

Hi @dicecco1 ,

I want to rebuild winograd_pe.xclbin, but I can't find winograd_pe.cl in src/caffe/ocl_caffe/convolution/winograd/. Can you add it?

dicecco1 commented 7 years ago

Hi @haisenbao,

Is there a particular reason for wanting a .cl file? When I was initially building this system on SDAccel 2015.3, a lot of the pragmas in HLS C/C++ were not supported in OpenCL, so I stuck with building in C/C++. Now that SDAccel has matured a bit more, most pragmas should be supported, though.

If you absolutely need an OpenCL version rather than C, the easiest way would be to take winograd_pe.c and convert it to OpenCL. This shouldn't be too complicated; the main changes would be to change the arrays at the interface to global pointers, change the on-chip buffers to local memory, change memcpy to async_work_group_copy, and change the HLS C pragmas to their OpenCL equivalents.
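As a rough illustration of those four changes (the buffer name and size below are hypothetical, not the actual winograd_pe.c interface):

```c
// HLS C style (sketch):
//   void winograd_pe(float *input, float *weights, float *output) {
//     float inbuf[256];
//     memcpy(inbuf, input, 256 * sizeof(float));
//     ...
//   }

// OpenCL C equivalent (sketch): interface arrays become __global
// pointers, on-chip buffers become __local arrays, and memcpy
// becomes async_work_group_copy.
__kernel void winograd_pe(__global const float *input,
                          __global const float *weights,
                          __global float *output) {
  __local float inbuf[256];

  // Bulk copy from global to local memory, then wait for completion.
  event_t e = async_work_group_copy(inbuf, input, 256, 0);
  wait_group_events(1, &e);

  // ... compute on inbuf, write results back to output ...
}
```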

haisenbao commented 7 years ago

Hi @dicecco1 ,

Thanks for replying. I want the OpenCL version because I have no SDAccel build environment, only aoc on my machine.

I have another question: I want to test AlexNet, or one of the other networks shown in your paper. How can I do that? So far I have only run test_lrn_layer.testbin or test_convolution_layer.testbin to test the LRN, convolution, and similar layers.

dicecco1 commented 7 years ago

Hi @haisenbao,

Okay, that makes more sense. Some additional effort will be required then, because I'm not sure there's a one-to-one mapping of the pragmas between aoc and SDAccel, and some of the host code may need to be changed as well (I'm thinking mostly the host code in the XCLProgram layer and some in common.cpp).

To test AlexNet or the other benchmarks from the paper, you can use the models found in https://github.com/dicecco1/fpga_caffe/blob/master/models/benchmarks, or https://github.com/dicecco1/fpga_caffe/blob/master/models/bvlc_alexnet/deploy_winograd.prototxt for AlexNet testing.

In each case a program layer is used before the 3x3 layers, since at the time of the paper only 3x3 convolutions were supported. You can use the standard Caffe tool to benchmark the convolution timings or to test the model; you just need to add --ocl=1 to the end of the call and use the FPGA-specific models.
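For example, a benchmarking call might look like this (this follows the standard Caffe `time` invocation, with the --ocl=1 flag described above appended; the tool path assumes a default build layout):

```shell
# Benchmark the FPGA-specific AlexNet model; --ocl=1 enables the
# OpenCL/FPGA path for layers that have it.
./build/tools/caffe time \
    --model=models/bvlc_alexnet/deploy_winograd.prototxt \
    --ocl=1
```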

haisenbao commented 7 years ago

Hi @dicecco1 ,

Thanks very much, the aoc version works now. I ran AlexNet with models/bvlc_alexnet/deploy_winograd.prototxt, but I find that only the conv3, conv4, and conv5 layers run in OCL mode. How can I run the ReLU and pooling layers in OCL mode as well?

dicecco1 commented 7 years ago

Hi @haisenbao

Sorry for the delay. This version of the build doesn't currently support ReLU/pooling, so the options would be either to add the support yourself or to wait ~3 more weeks for me to release a slew of updates to the package that include ReLU, max pooling, and fully connected layers.

dicecco1 commented 7 years ago

Actually, sorry, I should rephrase: you can use ReLU/pooling in the current iteration, but it requires reprogramming the FPGA between layers, which isn't really high performance, so it should be avoided by using the pipeline layer implementations (which I think might not work in later versions of SDAccel, and I'm not sure how portable they are to aocl). The basic idea behind them is to put several kernels (e.g. conv, relu, pool) in one reconfigurable region to avoid reprogramming as often.

haisenbao commented 7 years ago

Hi @dicecco1 ,

Got it, thanks very much.

haisenbao commented 7 years ago

Hi @dicecco1 ,

What format should deploy_winograd.prototxt have if I want to support the ReLU and pooling layers?

dicecco1 commented 7 years ago

Hi @haisenbao,

The format will be similar, but you'll need to insert XCLProgramLayers between the Conv, ReLU, and Pooling layers to reprogram along the way; each layer that you want to run with OpenCL also needs ocl_enable to be set. So a model might look like this (assuming each layer is implemented in a separate bitstream):

XCLProgramLayer->Conv->XCLProgramLayer->ReLU->XCLProgramLayer->Pool
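A hedged prototxt sketch of one such segment (the layer names, the `type` strings, and how the bitstream is referenced are illustrative guesses here, not the exact fields used by this repo; only XCLProgramLayer and ocl_enable come from the discussion above):

```
# Hypothetical sketch: a program layer reprograms the FPGA with the
# next layer's bitstream, then the layer itself runs with ocl_enable.
layer {
  name: "program_conv1"
  type: "XCLProgram"      # illustrative; check the repo for the real type string
  # ... bitstream/kernel parameters go here ...
}
layer {
  name: "conv1"
  type: "Convolution"
  ocl_enable: true        # required for layers that should use OpenCL
  # ... usual convolution parameters ...
}
# repeat: program_relu1 -> relu1, program_pool1 -> pool1, ...
```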

dicecco1 commented 7 years ago

The program_layer branch also has some examples of using the pipelined kernels (conv and pool in the same bitstream), though that branch hasn't been updated as frequently.