Summary of Changes
Closes #65
Motivation
Allowing parallel requests from HTTP clients allows for higher throughput.
Implementation
The limitation in the HTTP client came from Drogon: making requests in parallel requires multiple Drogon clients. For now, I'm using an arbitrary 16:1 ratio of clients to EventLoops, based on a question I asked the Drogon developers. The level of parallelism is exposed in the API so that callers can override the default.
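As a rough illustration of the idea (not the code in this PR), the pooling can be sketched with the standard library alone. Everything named here is hypothetical: `FakeHttpClient` stands in for a real Drogon client, and `ClientPool`, `kDefaultClientsPerLoop`, and `next()` are made-up names. The sketch only shows how a clients-per-EventLoop ratio turns into a pool size and how requests could be spread over the pool round-robin.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one HTTP client object (e.g. one Drogon client).
struct FakeHttpClient {
  std::string address;
  std::size_t id;
};

// Hypothetical pool that owns several clients and hands them out round-robin
// so that concurrent requests are spread over more than one client.
class ClientPool {
 public:
  // Assumed default: 16 clients per event loop, mirroring the arbitrary ratio
  // described above. Callers can pass a different value.
  static constexpr std::size_t kDefaultClientsPerLoop = 16;

  ClientPool(std::string address, std::size_t event_loops,
             std::size_t clients_per_loop = kDefaultClientsPerLoop) {
    assert(event_loops > 0 && clients_per_loop > 0);
    const std::size_t total = event_loops * clients_per_loop;
    clients_.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
      clients_.push_back(FakeHttpClient{address, i});
    }
  }

  std::size_t size() const { return clients_.size(); }

  // Safe to call from several threads: the atomic counter selects the next
  // client in rotation, wrapping around at the end of the pool.
  FakeHttpClient& next() {
    const std::size_t index = next_.fetch_add(1, std::memory_order_relaxed);
    return clients_[index % clients_.size()];
  }

 private:
  std::vector<FakeHttpClient> clients_;
  std::atomic<std::size_t> next_{0};
};
```

With two event loops and the default ratio, such a pool would hold 32 clients. Passing a smaller `clients_per_loop` reduces the parallelism without touching the rest of the calling code.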
I added a new test to compare the sync and async APIs for making requests to resnet50. There's a problem with the tfzendnn worker that will be raised as a separate issue. The test also exposed an issue with our test framework: Drogon cannot be restarted in the same process (and this will not be supported). To work around this, the HTTP server is now started once per test suite rather than once per test.
I also updated the benchmark script, which hadn't been updated since the Python bindings changed.