[ML] Inference API splitting large bulk requests

elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine

https://www.elastic.co/products/elasticsearch


[ML] Inference API splitting large bulk requests #106184

Open jonathan-buttner opened 3 months ago

jonathan-buttner commented 3 months ago

Description

The inference API supports client-side batching through the `input` array field. External services impose different limits on the size of a batched request; Cohere, for example, limits a request to 96 text items. We need to implement functionality that splits a large request into smaller ones that fit the service's limit, then reassembles the results before returning a single response to the client.
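
To illustrate the split-and-reassemble idea, here is a minimal Python sketch. It is not the Elasticsearch implementation. The names `split_into_batches`, `infer_in_batches`, and `send_batch` are hypothetical, and the sketch assumes the service returns exactly one result per input, in the same order as the inputs. Only the limit of 96 comes from the description above.

```python
from typing import Callable, List, Sequence

# Per-request item limit for Cohere, as stated in this issue.
MAX_BATCH_SIZE = 96


def split_into_batches(inputs: Sequence[str], max_batch_size: int) -> List[List[str]]:
    """Split inputs into consecutive chunks of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(inputs[start:start + max_batch_size])
        for start in range(0, len(inputs), max_batch_size)
    ]


def infer_in_batches(
    inputs: Sequence[str],
    send_batch: Callable[[List[str]], List[object]],  # hypothetical service call
    max_batch_size: int = MAX_BATCH_SIZE,
) -> List[object]:
    """Send each chunk separately and concatenate results in input order."""
    results: List[object] = []
    for batch in split_into_batches(inputs, max_batch_size):
        batch_results = send_batch(batch)
        # Assumption: one result per input item; fail loudly otherwise so
        # results are never silently misaligned with their inputs.
        if len(batch_results) != len(batch):
            raise RuntimeError(
                f"expected {len(batch)} results, got {len(batch_results)}"
            )
        results.extend(batch_results)
    return results
```

With 200 inputs and a limit of 96, this sends three requests of 96, 96, and 8 items and returns 200 results in the original order. The sketch sends the chunks sequentially and aborts on the first failure. A real implementation would also need to decide how to handle partial failures and whether to send chunks concurrently.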

elasticsearchmachine commented 3 months ago

Pinging @elastic/ml-core (Team:ML)