Can you provide your service definition here? (Obviously, remove anything sensitive.) This seems strange.
Hey, ty for the quick follow-up.
Sure:
import bentoml
import torch

@bentoml.service(
    traffic={
        'timeout': 10,
        'concurrency': 15,
    },
    metrics={
        'enabled': True,
    },
    workers=2,
)
class Service4(BentoWrapper):
    def __init__(self) -> None:
        super().__init__()
        # do stuff

    @bentoml.task(
        batchable=True,
        batch_dim=(0, 0),
        max_batch_size=15,
        max_latency_ms=1000)
    def postprocess(self,
                    inputs: list[BatchInput]) -> torch.Tensor:
        ...
Also updated the issue with some more info regarding the task execution from the main service.
What does this BentoWrapper class do?
Ah, I guess the name has to be improved there. It's just a base class with common functionality shared by the services.
When you implement batch endpoints with sync methods, please note that you shouldn't call the endpoint from within another sync API method, because there is a default thread limit of 1. Try increasing that limit by setting threads=N in the @service() decorator and see if it improves performance.
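For example, something along these lines (just a sketch; threads=4 is an arbitrary value, and the rest mirrors the definition posted above):

import bentoml

@bentoml.service(
    traffic={
        'timeout': 10,
        'concurrency': 15,
    },
    workers=2,
    threads=4,  # raise the per-worker thread pool above the default of 1
)
class Service4(BentoWrapper):
    ...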
@frostming ah I see. Well then perhaps the documentation here should be updated accordingly (namely, that this is not recommended). I wanted to speed up the processing of my distributed service, and tasks seemed like a way to do that in bentoml (apart from async services).
Regarding increasing the number of threads, would you mind pointing me to the documentation where this is described, please? I can't find anything of substance here or by a general search in your documentation.
@frostming @aarnphm kind reminder
If you could provide some documentation on threads, it would be great (you said that a worker uses one thread by default; what happens if we assign more than one, and why isn't that recommended?).
Also, if I cannot use tasks, what else would you recommend in this case? Again, I have a composite service that orchestrates up to 4 sub-services. If I manage to run the fourth sub-service asynchronously, I can subtract its execution time from the overall service execution time, which would non-trivially speed up the overall service.
AFAIK bento offers tasks & async services for this purpose. I refrained from using async services because the main service is sync and the other 3 sub-services run in a sync fashion; it's only the fourth service that should run async. Hence I went for tasks, but now it seems that even bento tasks aren't quite appropriate.
If I go for the async approach, can I simply use asyncio.run() for the fourth service from the main service? I'm guessing not. I guess one way, as described in the docs here, would be to make the main service async, call sub-services 1-3 as async after converting them (even though they are really sync), and finally call the fourth async sub-service as normal.
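To make that concrete, here is a rough sketch of the orchestration I have in mind (Service1-Service4, step_one, etc. are placeholder names, and I'm assuming the sync dependency calls can be pushed off the event loop with asyncio.to_thread rather than using asyncio.run()):

import asyncio

import bentoml


@bentoml.service
class MainService:
    # placeholder dependencies, not the real service names
    s1 = bentoml.depends(Service1)
    s2 = bentoml.depends(Service2)
    s3 = bentoml.depends(Service3)
    s4 = bentoml.depends(Service4)

    @bentoml.api
    async def process(self, frames: list) -> dict:
        # start the fourth sub-service immediately so it overlaps with 1-3
        fourth = asyncio.create_task(asyncio.to_thread(self.s4.postprocess, frames))

        # sub-services 1-3 stay sync and run sequentially, just off the event loop
        a = await asyncio.to_thread(self.s1.step_one, frames)
        b = await asyncio.to_thread(self.s2.step_two, a)
        c = await asyncio.to_thread(self.s3.step_three, b)

        # only block on the fourth result once 1-3 are done
        extra = await fourth
        return {'result': c, 'extra': extra}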
Otherwise, any other ideas?
@frostming @aarnphm another kind reminder :)
Regarding increasing the number of threads, would you mind pointing me to the documentation where this is described, please? I can't find anything of substance here or by a general search in your documentation.
We're lacking docs for that part.
The thread limit only applies to sync endpoints, and you shouldn't do concurrent work inside one. Just make it an async method and it will work.
I wanted to refrain from doing that since most of my services work synchronously, but I guess there's no other choice. Thanks for the follow-up anyway!
Describe the bug
Task definition:
Normally a batch of 15 image frames will run in about 0.9s when executed synchronously. From the docs I discovered that bentoml offers running background tasks out of the box, which made me go for this instead of executing the sub-service asynchronously (as this would be the only async sub-service of the main service).
However, the execution gets progressively slower after each batch. Not only that, but it seems to actually block the rest of the synchronous execution (at least for some time). Regarding latency, it starts off executing in milliseconds (as it should), but by the time we get to the final batch these are the execution times:
That is more than 10 seconds for a batch that should take ~0.9s to process, i.e. more than an order of magnitude slower. As a result, the entire execution time more than doubles.
My main service looks something like this:
So by the time the next task starts executing, there is at least 2.5s of execution time in between (excluding overhead), which should be more than enough for the previous task to have finished its execution.
Regarding the execution of the task in the main service, it looks something like this:
and then I access the result (this happens after services 1-3 have finished executing):
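For context, the general submit/get pattern from the tasks docs looks roughly like this (a simplified sketch with placeholder values, shown via an HTTP client rather than my exact code):

import bentoml

# sketch: submitting the batch to the task endpoint of the fourth service
client = bentoml.SyncHTTPClient('http://localhost:3000')
inputs = [...]  # batch of frames

# submit returns a handle immediately instead of blocking on the result
task = client.postprocess.submit(inputs=inputs)

# ... sub-services 1-3 run here ...

# after services 1-3 have finished, block for the task result
result = task.get()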
Am I missing something? Also, other than the above-linked documentation page, is there more documentation I could get on tasks in bentoml?
To reproduce
No response
Expected behavior
The task should execute in the background and not block the main flow, so in the end it should speed up the overall execution instead of slowing it down.
Environment
bentoml: 1.3.3
python: 3.9.0
platform: ubuntu 22.04, 6.5.0-45-generic