NVIDIA / gpu-rest-engine

A REST API for Caffe using Docker and Go
BSD 3-Clause "New" or "Revised" License

ScopedContext time increases significantly under load testing with boom #7

Closed avishayzanbar closed 7 years ago

avishayzanbar commented 7 years ago

Hello,

Thank you for this great solution, it is very convenient and useful. We have started using it with our trained network and it seems to work fine. But when we run a load test with the suggested boom tool, responses become very slow after a few calls. Looking more closely, the time spent in ScopedContext<CaffeContext> context(ctx->pool); increases significantly (up to ~600ms), whereas with a single curl call the same step takes less than 1ms.
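
For reference, here is a rough sketch of the pattern we are describing; the pool class and its members are illustrative and do not come from the actual gpu-rest-engine sources. As far as we understand it, the scoped context is drawn from a fixed-size pool and has to wait whenever every context is already serving another request:

// Illustrative sketch of a blocking context pool; names are hypothetical,
// not taken from the gpu-rest-engine sources.
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename Context>
class ContextPool {
 public:
  // Return a context to the pool and wake up one waiting request.
  void Push(Context* c) {
    std::lock_guard<std::mutex> lock(mutex_);
    contexts_.push(c);
    cond_.notify_one();
  }

  // Take a context out of the pool, blocking until one is available.
  // If all contexts are busy with other requests, this wait is where
  // the time attributed to ScopedContext would grow under load.
  Context* Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return !contexts_.empty(); });
    Context* c = contexts_.front();
    contexts_.pop();
    return c;
  }

 private:
  std::queue<Context*> contexts_;
  std::mutex mutex_;
  std::condition_variable cond_;
};

// RAII wrapper: acquires a context on construction (possibly blocking)
// and returns it to the pool when it goes out of scope.
template <typename Context>
class ScopedContext {
 public:
  explicit ScopedContext(ContextPool<Context>& pool)
      : pool_(pool), context_(pool.Pop()) {}
  ~ScopedContext() { pool_.Push(context_); }
  Context* operator->() const { return context_; }

 private:
  ContextPool<Context>& pool_;
  Context* context_;
};

If that is what is happening, the ~600ms would mostly be time spent waiting for a free context rather than running the network.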

We use a K80 GPU, which reaches a maximum of 89% utilization during the boom load.

Do you know of any reason that might cause this behavior?

avishayzanbar commented 7 years ago

By the way - this is for example what we get for boom -n 20000:

Summary:
  Total:    169.0645 secs
  Slowest:  0.7978 secs
  Fastest:  0.0370 secs
  Average:  0.4214 secs
  Requests/sec: 118.2981

Status code distribution:
  [200] 20000 responses

Response time histogram:
  0.037 [1]     |
  0.113 [83]    |
  0.189 [14]    |
  0.265 [16]    |
  0.341 [39]    |
  0.417 [4588]  |∎∎∎∎∎∎∎∎∎∎∎∎
  0.493 [15146] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.570 [25]    |
  0.646 [8]     |
  0.722 [2]     |
  0.798 [78]    |

flx42 commented 7 years ago

Hello @avishayzanbar,

Maybe boom is sending way too many requests and they start queuing, hence the high overall latency. Could you try with boom -c 4 -n 20000 to limit the number of concurrent requests?
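
For what it's worth, your numbers look consistent with that explanation: assuming boom was left at its default concurrency (50 workers, if I remember correctly), then by Little's law 118.3 requests/sec × 0.42 s average latency ≈ 50 requests in flight at any time, so most of that 0.42 s would be time spent queuing for a free context rather than running inference.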

avishayzanbar commented 7 years ago

Hi @flx42, thank you very much for the response.

You are probably right: after limiting the number of concurrent requests (to 4 or even 8), the average time dropped to ~50ms.