Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0
43 stars 13 forks

Add model chaining #176

Closed varunsh-xilinx closed 1 year ago

varunsh-xilinx commented 1 year ago

Summary of Changes

Closes #173

Motivation

Chains are simpler versions of model ensembles: a linear graph of workers with one input and one output. Enabling chaining allows users to define pipelines in the server. These may be used for server-side pre- and post-processing steps, for example.

Implementation

This PR implements static chaining, where the chain is defined at load-time. The chain is defined through a special load-time parameter, next, whose value names the next worker in the chain. If this parameter is not present at load-time, the server automatically inserts the Responder as the next node. This information, along with the next worker's allocators, is set on construction. Because the name of the next worker must be known at load-time, workers in a chain must be loaded in reverse order so that the endpoint of each worker's successor is already known. Calling the new loadEnsemble method with a single worker is equivalent to calling workerLoad.
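A minimal sketch of the reverse-order loading described above. The function name load_ensemble and the parameter handling here are illustrative, not the server's actual API; the point is only that iterating the chain in reverse lets each worker's next parameter reference an already-known endpoint, with the Responder as the terminal node.

```python
def load_ensemble(workers):
    """Sketch of loading a linear chain of workers in reverse order.

    workers: list of worker names, in pipeline order (first to last).
    Returns a dict mapping each worker name to its load-time parameters.
    """
    endpoints = {}
    # The last worker has no explicit successor, so the server would
    # insert the Responder as its next node.
    next_endpoint = "Responder"
    # Iterate in reverse: by the time a worker is loaded, the endpoint
    # of its successor is already known.
    for name in reversed(workers):
        endpoints[name] = {"next": next_endpoint}
        next_endpoint = name
    return endpoints

chain = load_ensemble(["preprocess", "resnet50", "postprocess"])
# With a single worker, this reduces to a plain worker load whose
# next node is the Responder.
single = load_ensemble(["resnet50"])
```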

Each worker (except for streaming) has been updated to work with chaining. The primary change is that each worker now receives one batch and must produce a new batch, rather than receiving the batch queue pointer directly. Responding to the client is now handled entirely by the Responder, which serves as the end node for all workers. Common logic (propagating batch and request metadata, extracting batches from the queue, etc.) has been moved into shared functions or the base Worker class.
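The batch-in, batch-out contract can be sketched as follows. The class names Worker, Responder, and Doubler and the run/process split are hypothetical stand-ins for the actual C++ classes; the sketch only shows the shape of the change: each worker transforms one batch into a new batch and hands it to its next node, terminating at a Responder.

```python
class Worker:
    """Base worker: receives one batch, produces a new batch."""

    def __init__(self, next_worker=None):
        # The next node in the chain; a real server would insert the
        # Responder here automatically when none is given.
        self.next_worker = next_worker

    def run(self, batch):
        out = self.process(batch)
        # Forward the new batch down the chain, if there is more of it.
        return self.next_worker.run(out) if self.next_worker else out

    def process(self, batch):
        raise NotImplementedError


class Responder(Worker):
    """End node for all chains: would send results back to the client."""

    def process(self, batch):
        return batch


class Doubler(Worker):
    """Toy worker standing in for a real model or pre/post-processor."""

    def process(self, batch):
        return [2 * x for x in batch]


# A one-worker chain ending at the Responder.
chain = Doubler(next_worker=Responder())
result = chain.run([1, 2, 3])
```

The design point is that workers no longer touch the batch queue or the client connection directly; both concerns live in shared infrastructure, which is what makes the workers composable into a chain.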

Notes

gbuildx commented 1 year ago

Build failed!


gbuildx commented 1 year ago

Build successful!