elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[inference] Add support for request cancelation #200757

Open pgayvallet opened 3 days ago

pgayvallet commented 3 days ago

At the moment, the inference APIs (chatComplete and output) don't provide any way to perform cancelation of a running request / call.

Technically, the genAI stack connectors all support passing an abort signal for their stream sub actions.

E.g. for genAI: https://github.com/elastic/kibana/blob/9372027e6c74f62d8ffc8d7539bdc2d27d1c0e05/x-pack/plugins/stack_connectors/server/connector_types/openai/openai.ts#L200-L201.

So it should be possible to leverage that to perform cancelation.

The main question here is how we want to expose this feature.
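To illustrate one possible shape for the API, here is a minimal sketch of forwarding an `AbortSignal` from an inference call down to a connector sub action. All names (`ChatCompleteOptions`, `callConnectorSubAction`) are hypothetical, not Kibana's actual API:

```typescript
// Hypothetical sketch: the inference API accepts an optional AbortSignal
// and forwards it to the stack connector's sub-action call, which would
// in turn pass it to the underlying HTTP request.
interface ChatCompleteOptions {
  messages: string[];
  abortSignal?: AbortSignal;
}

async function chatComplete(opts: ChatCompleteOptions): Promise<string> {
  return callConnectorSubAction('stream', {
    body: JSON.stringify({ messages: opts.messages }),
    signal: opts.abortSignal,
  });
}

// Stand-in for the connector layer; a real connector would wire the signal
// into its HTTP client so an abort cancels the in-flight request.
async function callConnectorSubAction(
  subAction: string,
  params: { body: string; signal?: AbortSignal }
): Promise<string> {
  if (params.signal?.aborted) {
    throw new Error('Request was aborted');
  }
  return `ran ${subAction}`;
}
```

A caller would then create an `AbortController` and pass `controller.signal` in the options, calling `controller.abort()` to cancel.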

elasticmachine commented 3 days ago

Pinging @elastic/appex-ai-infra (Team:AI Infra)

legrego commented 2 days ago

For normal (non-stream) mode of the APIs, allowing callers to pass an abort controller as a parameter, and passing the controller down to the stack connector call, seems like a good option.

👍 seems reasonable to me.
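For the non-stream mode, a small helper along these lines would make the returned promise reject as soon as the caller's signal fires, even if the underlying call hasn't settled yet. This is an illustrative sketch (`abortable` is a hypothetical name):

```typescript
// Wrap a promise so it rejects when the provided AbortSignal fires.
// The caller owns the AbortController; aborting it cancels the wait.
function abortable<T>(work: Promise<T>, signal?: AbortSignal): Promise<T> {
  if (!signal) return work;
  if (signal.aborted) return Promise.reject(new Error('aborted'));
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(new Error('aborted'));
    signal.addEventListener('abort', onAbort, { once: true });
    work.then(
      (value) => {
        signal.removeEventListener('abort', onAbort);
        resolve(value);
      },
      (err) => {
        signal.removeEventListener('abort', onAbort);
        reject(err);
      }
    );
  });
}
```

Note that rejecting the promise alone doesn't stop the underlying work; the same signal would still need to reach the connector so the HTTP request itself is cancelled.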

For stream mode, it's less obvious. We could follow the same approach, but it's not really the way it's supposed to be done for observables. The obs-friendly way would be to perform cancelation on unsubscription. This would require some work to make the internal observable chain be compatible with that approach (as we're not using a pure observable as a source).
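The obs-friendly approach could look like the following sketch, where the producer owns an `AbortController` and aborts it in the subscription's teardown. A minimal hand-rolled observable is used here to stay dependency-free; in Kibana this would be an RxJS Observable, and the names are illustrative:

```typescript
type Teardown = () => void;
type Observer<T> = { next: (value: T) => void };

// Build a stream whose underlying request is tied to the subscription:
// unsubscribing aborts the internal controller, cancelling the request.
function streamWithCancellation<T>(
  start: (signal: AbortSignal, observer: Observer<T>) => void
) {
  return {
    subscribe(observer: Observer<T>): { unsubscribe: Teardown } {
      const controller = new AbortController();
      // The producer receives the signal and would pass it to the connector.
      start(controller.signal, observer);
      return {
        unsubscribe() {
          // Teardown: cancelling the subscription aborts the in-flight call.
          controller.abort();
        },
      };
    },
  };
}
```

With RxJS the same idea maps to returning a teardown function from the `new Observable((subscriber) => { ... })` constructor.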

@pgayvallet It seems like you have a preferred approach, but it's a bit more effort. Am I misreading, or are there additional considerations such as time pressure or feasibility?

pgayvallet commented 2 days ago

There's no time pressure AFAIK.

Regarding feasibility, I'm not 100% sure without doing some testing, but I think the two approaches (one for stream mode, one for non-stream mode) could cohabit.

So hopefully it's just about some more effort, yes.
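One way the two cancellation paths could cohabit for stream mode is to link them to a single internal controller: it aborts either when the caller's optional `abortSignal` fires or when the subscriber unsubscribes. A hedged sketch, with all names hypothetical:

```typescript
// Link an optional external AbortSignal and a subscription teardown to one
// internal controller; either path cancels the underlying request.
function linkCancellation(external?: AbortSignal): {
  signal: AbortSignal;
  teardown: () => void;
} {
  const internal = new AbortController();
  const onExternalAbort = () => internal.abort();
  external?.addEventListener('abort', onExternalAbort, { once: true });
  // Handle a signal that was already aborted before we attached the listener.
  if (external?.aborted) internal.abort();
  return {
    signal: internal.signal, // what gets passed down to the connector call
    teardown: () => {
      external?.removeEventListener('abort', onExternalAbort);
      internal.abort(); // the unsubscription path
    },
  };
}
```

The connector call only ever sees `internal.signal`, so the connector layer stays unaware of which mechanism triggered the cancellation.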