arshad-yaseen / monacopilot

⚡️AI auto-completion plugin for Monaco Editor, inspired by GitHub Copilot.

Add abort controller option to pass in along with streaming requests #83

Open · leftmove opened 5 days ago

leftmove commented 5 days ago

Right now, there is no way to abort LLM requests while they are streaming. I would like to create a system that allows you to do that.

This would have a couple of benefits: a stale completion could be cancelled as soon as a newer one is requested, and you would stop paying for tokens you no longer need.

If implemented properly, this is what I imagine route.ts in the minimal Next.js example would look like.

import {NextRequest, NextResponse} from 'next/server';

import cache from "memory-cache"; // In-memory cache

import {Copilot, type CompletionRequestBody} from 'monacopilot';

const copilot = new Copilot(process.env.ANTHROPIC_API_KEY!);

export async function POST(req: NextRequest) {

  // Tag this request with a unique ID and record it as the latest completion
  const completionId = crypto.randomUUID();
  cache.put('completion', completionId);

  // Abort once a newer completion request has replaced this one
  const abort = () => cache.get('completion') !== completionId;

  const body: CompletionRequestBody = await req.json();
  const {completion, error} = await copilot.complete({
    body,
    options: {
      abort, // Proposed option: polled while streaming to cancel generation
    },
  });

  if (error) {
    // Handle error if needed
    // ...
    return NextResponse.json({completion: null, error}, {status: 500});
  }

  return NextResponse.json({completion}, {status: 200});
}

Alternatively, you could pass in your own AbortController. There is already some code for this in the source; it's just not possible to pass a signal in yet.

const request = async <
  ResponseType,
  BodyType = undefined,
  MethodType extends Method = Method,
>(
  url: string,
  method: MethodType,
  options: RequestOptions<BodyType, MethodType> = {},
): Promise<ResponseType> => {
  const headers = {
    'Content-Type': 'application/json',
    ...options.headers,
  };

  const body =
    method === 'POST' && options.body
      ? JSON.stringify(options.body)
      : undefined;

  const response = await fetch(url, {
    method: method,
    headers,
    body,
    signal: options.signal, // Abort signal
  });

  if (!response.ok) {
    const data = '\n' + (JSON.stringify(await response.json(), null, 2) || '');
    throw new Error(
      `${response.statusText || options.fallbackError || 'Network error'}${data}`,
    );
  }

  return response.json() as Promise<ResponseType>;
};
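
For reference, here is a rough caller-side sketch of how this could look if a signal were exposed. The `signal` option on `copilot.complete` is hypothetical; it is what this issue proposes, mirroring the `options` shape used in the route.ts example above.

import {Copilot, type CompletionRequestBody} from 'monacopilot';

const copilot = new Copilot(process.env.ANTHROPIC_API_KEY!);

let controller: AbortController | null = null;

async function completeWithAbort(body: CompletionRequestBody) {
  // Cancel the previous in-flight completion before starting a new one
  controller?.abort();
  controller = new AbortController();

  try {
    return await copilot.complete({
      body,
      options: {signal: controller.signal}, // Hypothetical option proposed here
    });
  } catch (err) {
    // fetch rejects with an AbortError when the signal fires mid-request
    if ((err as Error).name === 'AbortError') {
      return {completion: null, error: null};
    }
    throw err;
  }
}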

What makes this a major refactor rather than a fairly trivial change is that you must stream responses in order to cancel them. Currently, all LLM responses are returned from a single, non-streaming API call. To stop LLM generation partway and not incur costs for unused tokens, you have to stream responses.

Therefore, you would have to implement a way to stream each completion and join its chunks, instead of the current approach of making a single request and storing the output. Here's some code to outline what I mean.


// Instead of this

const response = await request('https://api.openai.com/v1/chat/completions');

// To avoid token costs, you would have to do something like this (keep in mind
// this is pseudo-code; the real API streams from the same /chat/completions
// endpoint when `stream: true` is set in the request body)

let response = '';

await fetch('https://api.openai.com/v1/chat/completions/stream')
  .then((res) => {
    if (!res.ok) {
      throw new Error(`HTTP error! status: ${res.status}`);
    }
    return res.body.getReader();
  })
  .then((reader) => {
    const decoder = new TextDecoder('utf-8');

    const readChunk = (): Promise<void> =>
      reader.read().then(({done, value}) => {
        if (done) return;

        // Each chunk carries a partial completion; append it to the response.
        // (Real chunks are server-sent events that need line-by-line parsing,
        // and chat completions put the text in choices[0].delta.content.)
        const chunk = decoder.decode(value);
        const completion = JSON.parse(chunk);
        const content = completion.choices[0].text;

        response += content;

        // Abort condition from all the way above: cancel the stream so no
        // further tokens are generated or billed
        if (abort()) {
          reader.cancel();
          return;
        }

        return readChunk();
      });

    return readChunk();
  });

I want to undertake changing the code to make this feature possible, but since it would require a major refactor, I wanted to ask before I start.

Thanks!

arshad-yaseen commented 4 days ago

I understand your point. I will review the implementation to assess how it would work and whether there are any consequences, given that the onTyping real-time completions rely on the completion cache built from previous requests. I will let you know.
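
For illustration, a minimal client-side sketch of how aborts could coexist with that cache, assuming aborts only cancel in-flight requests and only finished completions are ever written to the cache. `completionCache` and `fetchCompletion` below are placeholders, not monacopilot internals.

// Placeholder for whatever performs the actual completion request
declare function fetchCompletion(prompt: string, signal: AbortSignal): Promise<string>;

const completionCache = new Map<string, string>();
let controller: AbortController | null = null;

async function onTypingCompletion(prompt: string): Promise<string | null> {
  const cached = completionCache.get(prompt);
  if (cached) return cached; // Cache hit: no request is made, nothing to abort

  controller?.abort(); // Drop the now-stale in-flight request, if any
  controller = new AbortController();

  try {
    const completion = await fetchCompletion(prompt, controller.signal);
    completionCache.set(prompt, completion); // Only completed responses are cached
    return completion;
  } catch (err) {
    if ((err as Error).name === 'AbortError') return null; // Superseded, not an error
    throw err;
  }
}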