Description: Cancelling generation in the UI does not cancel it on the backend; the request continues to propagate through the API to the LLM backend, leaving the UI and backend in mismatched states and causing downstream issues.
User Story:
As a developer building a UI on top of the API, I want the ability to cancel the context between the API and the LLM backends, so that users can stop or retry an in-flight generation, and so that resources tied to abandoned requests are freed promptly.
Acceptance Criteria:
[ ] Implement a mechanism in the API to allow cancellation of the context between the API and LLM backends.
[ ] Expose an endpoint or method that accepts a cancellation request from the UI.
[ ] Upon receiving a cancellation request, the API should gracefully terminate the ongoing generation process.
[ ] The API should release any resources allocated for the cancelled request, making them available for subsequent requests.
[ ] Provide clear documentation and usage examples for the context cancellation feature in the API.
Putting this here for future reference: once this lands, this open issue may become more of a concern for the frontend:
https://github.com/vercel/ai/issues/1743
Context Cancellation (API)
Type: Feature