bentoml / BentoML

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
https://bentoml.com
Apache License 2.0

feature: Arrow table input/output #4119

Open judahrand opened 1 year ago

judahrand commented 1 year ago

Feature request

I think it would be great to add Arrow Tables as an IO type for BentoML endpoints. This would be particularly beneficial for the gRPC server, where the Arrow IPC format (not Parquet) could be used directly by writing the data into the serialized_bytes field of the Protobuf message.

Motivation

Parquet is currently used to move Pandas DataFrames around in BentoML. It is a great storage format, but because it is designed as an on-disk format, it does not preserve the useful properties of the in-memory Arrow format, such as strict memory alignment. It may reduce on-the-wire data size, but it will almost certainly increase serialization/deserialization time.

I believe that this addition would:

Other

No response

parano commented 11 months ago

Hi @judahrand - we are working on a new iteration of IO Descriptor in BentoML and it will come with Arrow support! cc @frostming

judahrand commented 11 months ago

Does the code that's in development exist somewhere? I'd be interested in having a read.

frostming commented 11 months ago


Sure, #4240

judahrand commented 7 months ago

@parano Did Arrow support ever get added?