Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Support model chaining in modelLoad #178

Closed: varunsh-xilinx closed this pull request 1 year ago

varunsh-xilinx commented 1 year ago

Summary of Changes

Closes #177

Motivation

#173 added model chaining through the C++ API. This PR exposes the same functionality through the modelLoad API, which is used in the deployment examples and with KServe.

Implementation

I switched to TOML as the primary format for model configuration because it accommodates different definition layouts more easily. With a protobuf-based format, every new variant would need its own schema, whereas a single TOML format can directly express both a simplified single-model case and a more complex ensemble case.

A full ensemble definition in TOML looks like this:

[[models]]
name = "invert_image"
platform = "amdinfer_cpp"
id = "base64_decode.so"

[[models.inputs]]
name = "image_in"
datatype = "STRING"
shape = [1048576]
id = ""

[[models.outputs]]
name = "image_out"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "preprocessed_image"

[[models]]
name = "execute"
platform = "amdinfer_cpp"
id = "invert_image.so"

[[models.inputs]]
name = "image_in"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "preprocessed_image"

[[models.outputs]]
name = "image_out"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "inverted_image"

[[models]]
name = "invert_image_postprocess"
platform = "amdinfer_cpp"
id = "base64_encode.so"

[[models.inputs]]
name = "image_in"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "inverted_image"

[[models.outputs]]
name = "image_out"
datatype = "STRING"
shape = [1048576]
id = "postprocessed_image"

Multiple models are specified with multiple [[models]] tags; similarly, each model can declare multiple input and output tensors with the corresponding [[models.inputs]] and [[models.outputs]] tags. Each output tensor should have a unique ID that maps to one input tensor. This syntax is meant to support true ensembles in the future, but only linear chains are supported for now, so the tensor ID field is effectively unused: all the output tensors of one model are fed to the next.

The model itself also gains an ID field. Because multiple model files now live in the same directory, each model's ID must identify which model file belongs to it. The name field still determines the endpoint name, as before. For consistency, the name of the first model should match the name of the repository directory.

In the simple case of a single model, you can omit the ID fields, the [[models]] tag, and the "models." prefix on the tensor tags (so [[models.inputs]] becomes [[inputs]]); the file then reads like the old pbtxt format converted to TOML. The pbtxt format remains supported for backwards compatibility, though it may be dropped in the future.

Notes

I updated the documentation to use the simpler (non-ensemble) TOML format. There are currently no ensemble examples and no test case for ensembles.

gbuildx commented 1 year ago

Build successful!

gbuildx commented 1 year ago

Build failed!

gbuildx commented 1 year ago

Build failed!

varunsh-xilinx commented 1 year ago

Passed internally after a rerun. The failure was caused by a port that was already in use.