Description
The inference API supports client-side batching through the `input` array field. External services impose different limits on batched requests; Cohere, for example, limits the number of texts in a single request to 96. We need to implement functionality that splits a large request into smaller ones that fit the service's limit and reassembles the partial results into a single response before returning it to the client.
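
The split-and-reassemble step described above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `split_into_batches`, `batched_inference`, and the `send_batch` callback are hypothetical names, and the sketch assumes the external service returns exactly one result per input, in the same order as the inputs it received.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Per-request limit quoted in the description for Cohere.
COHERE_MAX_BATCH = 96


def split_into_batches(inputs: Sequence[str], max_batch_size: int) -> List[List[str]]:
    """Split inputs into consecutive chunks of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(inputs[start:start + max_batch_size])
        for start in range(0, len(inputs), max_batch_size)
    ]


def batched_inference(
    inputs: Sequence[str],
    send_batch: Callable[[List[str]], List[T]],  # hypothetical call to the external service
    max_batch_size: int = COHERE_MAX_BATCH,
) -> List[T]:
    """Send inputs in chunks and reassemble the results in input order."""
    results: List[T] = []
    for batch in split_into_batches(inputs, max_batch_size):
        batch_results = send_batch(batch)
        # Assumption: one result per input; anything else cannot be reassembled safely.
        if len(batch_results) != len(batch):
            raise RuntimeError(
                f"expected {len(batch)} results, got {len(batch_results)}"
            )
        results.extend(batch_results)
    return results
```

With a limit of 96, a request carrying 200 inputs would be sent as three requests of 96, 96, and 8 items, and the client would still receive a single list of 200 results. The sketch issues the requests sequentially and fails the whole request if any chunk fails; whether chunks should be sent concurrently or partial failures reported per item is left open here.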