Lightning-Universe / DiffusionWithAutoscaler

Apache License 2.0

Add while loop batching #16

Closed: tchaton closed this 1 year ago

tchaton commented 1 year ago

This PR adds an example of running Stable Diffusion inference as a loop.
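To make the idea of loop-based batching concrete, here is a minimal toy sketch. It is not the code from this PR: `Request`, `denoise_step`, `make_requests`, `grouped_batching`, and `streamed_batching` are hypothetical names, the model call is replaced by a counter decrement, and requests are given different step counts purely so the effect is visible. It contrasts a grouped batch, which runs to completion before the next group starts, with a while loop that refills free slots before every step.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    steps_left: int  # denoising steps still to run (assumed >= 1)


def denoise_step(batch):
    """Hypothetical stand-in for one batched denoising step of the model."""
    for request in batch:
        request.steps_left -= 1


def make_requests(steps):
    """Build a queue of toy requests, one per entry in `steps`."""
    return deque(Request(i, s) for i, s in enumerate(steps))


def grouped_batching(pending, max_batch_size):
    """Run one group to completion, return it, then start the next group."""
    finished_at = {}
    tick = 0
    while pending:
        size = min(max_batch_size, len(pending))
        group = [pending.popleft() for _ in range(size)]
        while any(r.steps_left > 0 for r in group):
            denoise_step([r for r in group if r.steps_left > 0])
            tick += 1
        # Every request in the group is returned together.
        for r in group:
            finished_at[r.request_id] = tick
    return finished_at


def streamed_batching(pending, max_batch_size):
    """Single while loop: refill free slots before every step."""
    finished_at = {}
    in_flight = []
    tick = 0
    while pending or in_flight:
        # Admit waiting requests into any free batch slots.
        while pending and len(in_flight) < max_batch_size:
            in_flight.append(pending.popleft())
        denoise_step(in_flight)
        tick += 1
        still_running = []
        for r in in_flight:
            if r.steps_left <= 0:
                finished_at[r.request_id] = tick  # returned immediately
            else:
                still_running.append(r)
        in_flight = still_running
    return finished_at


grouped = grouped_batching(make_requests([3, 1, 2]), max_batch_size=2)
streamed = streamed_batching(make_requests([3, 1, 2]), max_batch_size=2)
print(grouped)   # request id -> step count at which it was returned
print(streamed)
```

In this toy run the third request waits for the whole first group in the grouped variant, while the loop variant admits it as soon as a slot frees up. Real timings depend on the model and hardware, as the benchmarks in this PR show.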

Benchmarks computed on an RTX 3090:

| Type | Max Batch Size | Number of users (locust) | Average (ms) |
| --- | --- | --- | --- |
| Grouped Batch | 6 | 6 | 8868 |
| Streamed Batch (naive) | 6 | 6 | 9442 |
| Streamed Batch (global state) | 6 | 6 | 8584 |
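For a quick read of the RTX 3090 numbers, the short sketch below expresses each average latency relative to the Grouped Batch row. The values are copied from the table above; the variable names are only illustrative, and a single averaged run like this says nothing about variance.

```python
# Average latencies (ms) copied from the RTX 3090 table above.
average_ms = {
    "Grouped Batch": 8868,
    "Streamed Batch (naive)": 9442,
    "Streamed Batch (global state)": 8584,
}

baseline = average_ms["Grouped Batch"]

# Percentage change versus the grouped baseline (negative means faster).
relative_change = {
    name: round(100 * (ms - baseline) / baseline, 1)
    for name, ms in average_ms.items()
}

for name, pct in relative_change.items():
    print(f"{name}: {pct:+.1f}%")
```

Under these figures the naive streamed variant is about 6.5% slower than grouped batching, and the global-state variant is about 3.2% faster.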

The following benchmarks were computed on a T4:

| Type | Max Batch Size | Number of users (locust) | Average (ms) | Number of requests |
| --- | --- | --- | --- | --- |
| Grouped Batch | 1 | 1 | 5061 | 1353 |
| Grouped Batch | 2 | 2 | 11393 | 1599 |