Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Fix making parallel requests using the HTTP client #66

Closed varunsh-xilinx closed 2 years ago

varunsh-xilinx commented 2 years ago

Summary of Changes

Closes #65

Motivation

Allowing parallel requests from HTTP clients allows for higher throughput.

Implementation

The problem with the HTTP client was in Drogon: we need multiple Drogon clients to make parallel requests. Here, I'm using a 16:1 ratio of clients to event loops, chosen somewhat arbitrarily based on a question I asked the Drogon developers. The level of parallelism is exposed at the API level so clients can change the default.
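The client-pooling scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual inference-server code: `EventLoop`, `HttpClient`, and `ClientPool` here are hypothetical stand-ins for the Drogon/trantor types, kept dependency-free so only the 16:1 ratio and round-robin assignment are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for trantor::EventLoop and drogon::HttpClient;
// the names are illustrative, not the real inference-server types.
struct EventLoop { int id = 0; };
struct HttpClient { EventLoop* loop = nullptr; };

// A pool of clients spread round-robin across a smaller set of event
// loops, using the 16:1 clients-to-loops ratio mentioned in the PR.
struct ClientPool {
  static constexpr std::size_t kClientsPerLoop = 16;
  std::vector<EventLoop> loops;
  std::vector<HttpClient> clients;
  std::size_t next = 0;

  explicit ClientPool(std::size_t parallelism) {
    // One event loop per 16 clients, rounded up, and at least one loop.
    std::size_t numLoops =
        (parallelism + kClientsPerLoop - 1) / kClientsPerLoop;
    if (numLoops == 0) numLoops = 1;
    loops.resize(numLoops);
    for (std::size_t i = 0; i < numLoops; ++i) {
      loops[i].id = static_cast<int>(i);
    }
    clients.reserve(parallelism);
    for (std::size_t i = 0; i < parallelism; ++i) {
      clients.push_back(HttpClient{&loops[i % numLoops]});
    }
  }

  // Round-robin selection so concurrent requests land on distinct
  // clients instead of serializing on a single one.
  HttpClient& acquire() {
    HttpClient& client = clients[next];
    next = (next + 1) % clients.size();
    return client;
  }
};
```

With `parallelism = 32`, for example, this pool would create 32 clients over 2 event loops, which is the kind of knob the PR exposes at the API level.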

I added a new test to compare the sync and async APIs for making requests to resnet50. There's a problem with the tfzendnn worker that will be raised as a separate issue. The test also exposed a different issue with our test framework: Drogon cannot be restarted in the same process (and this will not be supported), so the HTTP server is now started per test suite instead of per test.
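The shape of that sync-vs-async comparison can be sketched as below. This is a hedged, self-contained illustration: `inferSync` is a hypothetical stand-in for a blocking inference request (the real test talks to the server's resnet50 worker), and the async path just overlaps the same calls with `std::async`.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for one blocking inference request; the real
// test issues HTTP requests to a resnet50 worker on the server.
int inferSync(int requestId) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // fake I/O
  return requestId;  // echo the id as a stand-in for a response
}

// Sync path: issue requests one at a time, each waiting for the last.
std::vector<int> runSync(int numRequests) {
  std::vector<int> results;
  for (int i = 0; i < numRequests; ++i) {
    results.push_back(inferSync(i));
  }
  return results;
}

// Async path: launch all requests first, then gather the responses.
// With enough clients in the pool, these overlap instead of serializing,
// which is the throughput win the PR is after.
std::vector<int> runAsync(int numRequests) {
  std::vector<std::future<int>> futures;
  for (int i = 0; i < numRequests; ++i) {
    futures.push_back(std::async(std::launch::async, inferSync, i));
  }
  std::vector<int> results;
  for (auto& f : futures) {
    results.push_back(f.get());
  }
  return results;
}
```

A comparison test would assert that both paths return the same responses, with the async path finishing in roughly the time of the slowest single request rather than the sum of all of them.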

The benchmark script had also not been updated since the Python bindings changes, so it is updated here as well.

gbuildx commented 2 years ago

Build Failed! :(

gbuildx commented 2 years ago

Build Failed! :(

varunsh-xilinx commented 2 years ago

retest this please

gbuildx commented 2 years ago

Build Failed! :(

gbuildx commented 2 years ago

Build Failed! :(

abalasa commented 2 years ago

retest this please

gbuildx commented 2 years ago

Build Failed! :(

varunsh-xilinx commented 2 years ago

retest this please

gbuildx commented 2 years ago

Build Failed! :(

gbuildx commented 2 years ago

Build Passed!