Support Pipelined design

Consider the following scenario:

I have a neural network, say AlexNet. I split the model into two sub-networks: one with the convolutional layers and the other with the fully connected layers. I save each sub-network in its own ONNX file. I have an SBC (such as the Odroid N2+) with both an ARM CPU and a GPU.

The question is: can I use your framework to run the first sub-network on the CPU and the second on the GPU, copying the intermediate output from one device to the other in memory?

Example:

input = input.device(cpu)
out1 = run(subnet1, input).device(cpu)
temp_out1 = copy(out1).device(gpu)
out2 = run(subnet2, temp_out1).device(gpu)
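
To make clearer what I mean by "pipelined", here is a minimal sketch in plain Python. It is not DNNLibrary code and uses no real device API: `run_stage`, `run_pipeline`, `subnet1_on_cpu` and `subnet2_on_gpu` are hypothetical names, and the two stage functions are simple stand-ins for the two ONNX sub-networks. Each stage runs in its own thread and hands its result to the next through a queue, which plays the role of the memory copy above.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the input stream


def run_stage(fn, inbox, outbox):
    """Apply fn to each item taken from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # pass the shutdown signal to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(stage1, stage2, inputs):
    """Run two stages in separate threads so their work can overlap."""
    q_in = queue.Queue()
    q_mid = queue.Queue(maxsize=2)  # bounded hand-off buffer between the stages
    q_out = queue.Queue()
    workers = [
        threading.Thread(target=run_stage, args=(stage1, q_in, q_mid)),
        threading.Thread(target=run_stage, args=(stage2, q_mid, q_out)),
    ]
    for w in workers:
        w.start()
    for x in inputs:
        q_in.put(x)
    q_in.put(_STOP)

    results = []
    while True:
        out = q_out.get()
        if out is _STOP:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results


# Hypothetical stand-ins for the two sub-networks (assumptions, not real models).
def subnet1_on_cpu(x):
    return [v * 2 for v in x]  # placeholder for the convolutional part


def subnet2_on_gpu(features):
    return sum(features)  # placeholder for the fully connected part


if __name__ == "__main__":
    print(run_pipeline(subnet1_on_cpu, subnet2_on_gpu, [[1, 2], [3, 4]]))
```

With this structure, the second stage can work on one input while the first stage is already processing the next one, which is the behaviour I would like to get from the CPU and the GPU.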

JDAI-CV / DNNLibrary