Summary of Changes
Closes #55
Motivation
Some models require multiple input tensors, so we should support those use cases. This change also brings the server into alignment with how the inference API is used in KServe.
Implementation
Originally, multiple input tensors in a request were used as a form of pseudo-batching by the user: each tensor was treated as a separate inference request to a model that was assumed to have only one input. This conflicts with the true purpose of multiple input tensors, which is to support models that genuinely require more than one input. Since all the current tests use models with a single input tensor, their requests now each contain a single input tensor. The sketch below illustrates the new semantics.
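The following is a minimal, illustrative sketch of the new request semantics. The InputTensor and InferenceRequest types here are toy stand-ins, not the server's actual classes; the point is only that several input tensors now belong to one request aimed at a multi-input model, rather than being fanned out as separate single-input requests.

```cpp
// Illustrative only: toy types standing in for the server's request objects.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct InputTensor {
  std::string name;
  std::vector<int64_t> shape;
  std::vector<float> data;
};

struct InferenceRequest {
  std::vector<InputTensor> inputs;  // all tensors belong to ONE request
};

int main() {
  InferenceRequest request;
  // Two input tensors for a model that genuinely takes two inputs.
  request.inputs.push_back({"input_a", {1, 4}, {1.0F, 2.0F, 3.0F, 4.0F}});
  request.inputs.push_back({"input_b", {1, 4}, {5.0F, 6.0F, 7.0F, 8.0F}});

  // Under the new semantics this is a single inference on a two-input model,
  // not two inferences on a one-input model.
  std::cout << "one request, " << request.inputs.size() << " input tensors\n";
  return 0;
}
```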
The modelInfer API is blocking, so to still allow batching under the new semantics, I've added an asynchronous counterpart to KServe's modelInfer API that returns a future in C++. This may also work in Python with Pybind11, but for now the Python examples use the multiprocessing library to run inferences in parallel.
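As a rough sketch of the future-based pattern this enables: the blockingModelInfer and modelInferAsync functions below are placeholders (not the server's real API), but they show how an asynchronous counterpart that returns a std::future lets a client keep several requests in flight and collect the results later.

```cpp
// Illustrative only: a stand-in blocking inference call wrapped so that it
// returns a std::future, mirroring the idea of an asynchronous modelInfer
// counterpart.
#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Placeholder for a blocking inference call against a named model.
std::string blockingModelInfer(const std::string& model, int request_id) {
  std::this_thread::sleep_for(std::chrono::milliseconds(100));  // simulated latency
  return model + " -> response " + std::to_string(request_id);
}

// Asynchronous counterpart: launch the blocking call and hand back a future.
std::future<std::string> modelInferAsync(const std::string& model, int request_id) {
  return std::async(std::launch::async, blockingModelInfer, model, request_id);
}

int main() {
  // Issue several requests without waiting on each one, then collect results.
  std::vector<std::future<std::string>> pending;
  for (int i = 0; i < 4; ++i) {
    pending.push_back(modelInferAsync("my_model", i));
  }
  for (auto& response : pending) {
    std::cout << response.get() << '\n';  // blocks only when the result is needed
  }
  return 0;
}
```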