In the execution phase, HCL creates a compiled function that serves as a handle to the FPGA accelerator. Users pass input data as NumPy arrays to this compiled function, which invokes the host program (generated by HCL in the build phase); the host program then launches the FPGA kernel function.
s = hcl.create_schedule()
f = hcl.build(s, target)
# invoke OpenCL host program and FPGA kernel
f(input_a, input_b)
Currently, to pass the NumPy arrays to the host program, the HCL runtime creates an inter-process shared memory region between itself and the host program and writes the input data into it.
# HCL runtime gets input from users
input_a, input_b = get_inputs()
# generate shared mem ids
id_a, id_b = create_shared_mem()
# update the IDs in the host program
write_id("host.cpp", [id_a, id_b])
# save the input data to shared memory
write_into_shared_mem({id_a: input_a, id_b: input_b})
# compile and invoke the host program
compile_and_run("host.cpp")
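When the host program is executed, it reads the data from shared memory and then passes it to the FPGA kernel:
// size_a and size_b are placeholders for the element counts of the inputs
int* input_a = new int[size_a];
read_from_shared_mem(input_a, ID_a);
int* input_b = new int[size_b];
read_from_shared_mem(input_b, ID_b);
FPGA_kernel(input_a, input_b);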
However, this workflow relies on OS memory management, so HCL does not have full control of the shared buffer. The shared memory may already be invalidated when the host program reads from it, which causes a system error, as mentioned in #241.
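The failure mode can be reproduced in miniature. The sketch below is only an analogy: it assumes Python's `multiprocessing.shared_memory` as a stand-in for whatever shared memory mechanism the HCL runtime actually uses, and the helper names `write_to_shared_mem` and `read_from_shared_mem` are illustrative, not HCL APIs. It shows that a reader can only attach while the segment still exists.

```python
import numpy as np
from multiprocessing import shared_memory


def write_to_shared_mem(array):
    """Copy a NumPy array into a new shared memory segment and return the segment."""
    shm = shared_memory.SharedMemory(create=True, size=array.nbytes)
    view = np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)
    view[:] = array
    del view  # release the buffer view so the segment can be closed later
    return shm


def read_from_shared_mem(name, shape, dtype):
    """Attach to a segment by name and return a private copy of its contents."""
    # Raises FileNotFoundError on POSIX systems if the segment was already unlinked.
    shm = shared_memory.SharedMemory(name=name)
    try:
        view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        data = view.copy()
        del view  # release the buffer view before closing the segment
        return data
    finally:
        shm.close()
```

If the writer closes and unlinks the segment before the reader attaches, the reader fails on a POSIX system even though it holds a valid name. That is the kind of lifetime problem the runtime cannot fully control when the OS owns the buffer.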
Solution
As Hongzheng mentioned, we should simply write these inputs into a header file and let the host program read that header file instead of using shared memory. This approach has the following advantages:
The HCL runtime has full control of the data transfer process
Users can edit the header file manually without involving the HCL runtime (useful for debugging)
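A minimal sketch of what the header generation could look like, assuming the inputs are NumPy arrays keyed by the variable name the host program expects. The function name `write_input_header`, the `C_TYPES` mapping, and the emitted layout (one flat static array per input) are hypothetical choices for illustration, not the actual HCL implementation.

```python
import numpy as np

# Hypothetical mapping from NumPy dtype names to C element types; extend as needed.
C_TYPES = {"int32": "int", "float32": "float", "float64": "double"}


def write_input_header(path, inputs):
    """Write each named NumPy array as a flat C array definition in a header file."""
    lines = ["#pragma once", ""]
    for name, array in inputs.items():
        array = np.asarray(array)
        c_type = C_TYPES[array.dtype.name]
        # Flatten in row-major order; the original shape is kept as a comment.
        values = ", ".join(repr(v) for v in array.ravel().tolist())
        lines.append(f"// shape: {array.shape}")
        lines.append(f"static {c_type} {name}[{array.size}] = {{{values}}};")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `write_input_header("inputs.h", {"input_a": np.array([1, 2, 3, 4], dtype=np.int32)})` emits the line `static int input_a[4] = {1, 2, 3, 4};`, which the host program could pick up with `#include "inputs.h"`. The sketch does not handle empty arrays or non-finite floats, and very large inputs would make the header slow to compile.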