saeid93 opened this issue 1 year ago
(copying this from our discussion in Slack)
Hey @saeid93 ,
That’s a great point.
It sounds very similar to something we added recently to pack tensors together. This mainly stems from something that Triton does to optimise gRPC performance. There are a few more details in this issue:
https://github.com/SeldonIO/MLServer/issues/48
The gist of it is that, instead of populating each `data` field separately, we pack together the `data` field of each `input` (or `output`) into a single bytes blob, which then gets added as a top-level field of the protobuf (this is to work around an obscure gRPC performance issue).
You can see most of the logic in the https://github.com/SeldonIO/MLServer/blob/master/mlserver/raw.py file. For tensors, it's actually very similar to what `numpy` does with `tobytes` / `frombuffer`, with the exception that it's implemented without `numpy`.
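For illustration, here is a minimal sketch of that `numpy` analogy (not MLServer's actual `raw.py` implementation, just the `tobytes` / `frombuffer` round-trip it resembles):

```python
import numpy as np

# A tensor we want to ship over gRPC
tensor = np.arange(12, dtype=np.float32).reshape(3, 4)

# "Pack": serialise the tensor's data into a single contiguous bytes blob,
# which could then be placed in a top-level raw contents field
blob = tensor.tobytes()

# "Unpack": recover the tensor from the blob, given its dtype and shape
restored = np.frombuffer(blob, dtype=np.float32).reshape(3, 4)

assert np.array_equal(tensor, restored)
```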
As far as I understand, the codec-friendly way of sending image/audio files in Seldon is to send them as NumPy arrays. Following the community Slack discussion-1 and discussion-2, I ran a benchmark for audio and image datatypes, and this could potentially be improved by adding an interface for sending byte images directly through gRPC. Currently, for sending bytes I do a bit of hardcoding; on the client side I do the following, see:
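Roughly, it looks like this sketch (assuming the V2 `dataplane.proto` stubs generated by MLServer, here imported from `mlserver.grpc`; the model name, input name, and file are just placeholders):

```python
import grpc
from mlserver.grpc import dataplane_pb2, dataplane_pb2_grpc

# Read the raw encoded file (e.g. a JPEG or WAV) without decoding it first
with open("sample.jpg", "rb") as f:
    payload = f.read()

request = dataplane_pb2.ModelInferRequest(
    model_name="my-model",  # placeholder
    inputs=[
        dataplane_pb2.ModelInferRequest.InferInputTensor(
            name="payload",  # placeholder input name
            datatype="BYTES",
            shape=[1],
            # Send the encoded file as a single BYTES element
            contents=dataplane_pb2.InferTensorContents(bytes_contents=[payload]),
        )
    ],
)

channel = grpc.insecure_channel("localhost:8081")  # MLServer's default gRPC port
stub = dataplane_pb2_grpc.GRPCInferenceServiceStub(channel)
response = stub.ModelInfer(request)
```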
And on the server side I do the following, see:
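Roughly, the custom runtime reads the bytes back out of the request and decodes them itself (again just a sketch; the runtime class and output below are placeholders, not the exact code from the benchmark):

```python
import io

from PIL import Image
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class ImageBytesRuntime(MLModel):  # placeholder runtime
    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # For a BYTES input, `data` holds the raw byte strings sent by the client
        raw = payload.inputs[0].data[0]

        # Decode the image straight from the bytes (no NumPy round-trip)
        image = Image.open(io.BytesIO(raw))

        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="size",
                    shape=[2],
                    datatype="INT32",
                    data=list(image.size),
                )
            ],
        )
```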
This showed the best performance among all the combinations discussed there for image and audio datatypes, and it could potentially be added natively to the MLServer + Seldon stack.