HumanSignal / label-studio-sdk

Label Studio SDK
https://api.labelstud.io
Apache License 2.0

Add support of async client in sdk #65

Open kaustuk opened 2 years ago

kaustuk commented 2 years ago

The current version of the SDK doesn't support creating an async client. It would be great to have this, since all SDK calls are I/O-bound and an async client would let callers overlap them.

PS: I would like to contribute this feature.

makseq commented 2 years ago

@kaustuk really cool idea! How would you solve it? Threads? How would you retrieve and save the responses?

ScholliYT commented 1 year ago

I was also wondering if that would be possible with aiohttp or similar. However, we noticed that Label Studio itself is the biggest bottleneck for us. Our Label Studio instance does not seem to handle any requests in parallel.

makseq commented 1 year ago

If you use docker-compose from the latest LS version, it should use uWSGI by default, so LS shouldn't be a bottleneck anymore.

What exact SDK calls do you want to call asynchronously?

ScholliYT commented 1 year ago

@makseq You're correct. Apparently, I did my benchmark on an older version of Label Studio 3 weeks ago. I just retried with the latest version from GitHub using uWSGI and the results show a speedup with parallel requests as expected. I attached my benchmark results below.

[image: benchmark results]

As you can see, with 1 connection I get about 20 req/sec (see the second-to-last line). If I increase that to 2 connections I get about 43 req/sec, which is a speedup of about 2x, as expected.

makseq commented 1 year ago

Great findings! However, it's still unclear to me how we can make the SDK async, and why exactly we need it.

ScholliYT commented 1 year ago

In our case, we are requesting metadata for several datasets to show them on a dashboard. That is about 10 requests to /api/projects/X, which are currently handled sequentially, so loading this data takes ~500ms. If the SDK supported asynchronous calls, we could fetch the data for all of them concurrently, reducing the time to ~50ms (in theory). To achieve this without async support in the SDK, we would have to reimplement the specific endpoints with async or use something like multithreading/multiprocessing.
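
To illustrate the threading workaround mentioned above, here is a minimal sketch of fetching several projects concurrently. It makes no real requests: `fetch_project_blocking` is a hypothetical stand-in for a blocking call to /api/projects/X, and the 50 ms sleep is an assumed latency taken from the rough numbers in this comment. It is not part of the SDK.

```python
import asyncio
import time


def fetch_project_blocking(project_id: int) -> dict:
    """Hypothetical stand-in for a blocking GET /api/projects/<id> call.

    The sleep simulates ~50 ms of network latency; no request is sent.
    """
    time.sleep(0.05)
    return {"id": project_id, "title": f"project-{project_id}"}


async def fetch_projects_concurrently(project_ids):
    # asyncio.to_thread (Python 3.9+) runs each blocking call in a worker
    # thread, so the waits overlap instead of adding up.
    tasks = [asyncio.to_thread(fetch_project_blocking, pid) for pid in project_ids]
    # gather returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


def timed_fetch(project_ids):
    """Fetch all projects concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    projects = asyncio.run(fetch_projects_concurrently(project_ids))
    return projects, time.perf_counter() - start


if __name__ == "__main__":
    projects, elapsed = timed_fetch(range(1, 11))
    print(f"fetched {len(projects)} projects in {elapsed * 1000:.0f} ms")
```

With ten simulated calls, the total time should land well below the ~500ms a sequential loop would need. The exact figure depends on how many worker threads the default executor provides. A native async client would avoid the threads entirely, but this pattern works with a blocking client as-is.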