continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.
https://docs.continue.dev/
Apache License 2.0

Non-ASCII response from Gemini is partially broken in VS Code extension #1492

Closed: reosablo closed this issue 4 months ago

reosablo commented 4 months ago


Relevant environment info

- OS: Windows 11 23H2
- Continue: v0.8.40
- IDE: VSCode 1.91.0-insider
- Model: gemini-1.5-pro-latest

Description

A description of the bug

I'm encountering an issue where responses from Gemini with non-ASCII characters are garbled. This doesn't seem to happen with responses from Groq.

What you expected to happen

Non-ASCII character responses should be displayed correctly, without any garbled characters or replacement characters (like "�").

What actually happened

Currently, non-ASCII characters are being replaced with the replacement character "�". This happens consistently.

For example, the following response is affected:

zh-cn.ts���ァイルの一部ですね。 これは中国語の簡体字で������れ��コードで、音声操作に関するUIのテキストのようです。

日本語のメッセージにする場合、どのような文脈で表示されるかを考慮する必要があります。 例えば、以下のように状況を想定して、より自然で適切な日本語���を検討できます。

Screenshots or videos

screenshot

Possible solutions

I suspect this is because the buffer is treated as a string rather than as raw bytes (an ArrayBuffer). Since a Gemini response chunk can end in the middle of a multi-byte UTF-8 sequence, decoding each chunk to a string before buffering corrupts those characters.
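The failure mode can be reproduced in isolation. Here is a minimal sketch, using only TextEncoder/TextDecoder, showing how decoding the two halves of a multi-byte character independently yields replacement characters, while a stream-aware decode does not:

```typescript
// "あ" is three bytes in UTF-8 (0xE3 0x81 0x82). If a chunk boundary
// falls inside those bytes and each chunk is decoded to a string
// independently, the partial sequences become U+FFFD replacement
// characters.
const bytes = new TextEncoder().encode("あ");
const first = bytes.slice(0, 2); // lead byte + one continuation byte
const second = bytes.slice(2);   // final continuation byte

const naive = new TextDecoder();
const broken = naive.decode(first) + naive.decode(second);

// A stream-aware decoder buffers the incomplete sequence instead.
const streaming = new TextDecoder();
const intact =
  streaming.decode(first, { stream: true }) + streaming.decode(second);

console.log(broken); // replacement characters, not "あ"
console.log(intact); // "あ"
```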

solution 1: change the buffer from a string to an ArrayBuffer in the streamChatGemini function

https://github.com/continuedev/continue/blob/aa18568c6096f8f2df7075ff0f1f711f6f75ed01/core/llm/llms/Gemini.ts#L105
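A hedged sketch of what solution 1 could look like. The names below (bufferedDecode, incompleteSuffixLength) are illustrative, not from the Continue codebase, and the real streamChatGemini also parses Gemini's JSON stream, which is omitted here. The idea is to buffer raw bytes and decode only up to the last complete UTF-8 sequence boundary:

```typescript
// Illustrative sketch only: buffer raw bytes instead of a decoded string,
// and never decode a trailing incomplete UTF-8 sequence.

// Returns how many bytes at the end of `buf` form an incomplete
// UTF-8 sequence (0 if the buffer ends on a character boundary).
function incompleteSuffixLength(buf: Uint8Array): number {
  for (let i = 1; i <= 3 && i <= buf.length; i++) {
    const b = buf[buf.length - i];
    if ((b & 0b11000000) === 0b10000000) continue; // continuation byte
    let need = 1; // ASCII lead byte
    if ((b & 0b11100000) === 0b11000000) need = 2;
    else if ((b & 0b11110000) === 0b11100000) need = 3;
    else if ((b & 0b11111000) === 0b11110000) need = 4;
    return need > i ? i : 0; // sequence needs more bytes than we have?
  }
  return 0;
}

// Decode a byte stream, carrying incomplete sequences over to the next chunk.
async function* bufferedDecode(
  chunks: AsyncIterable<Uint8Array>,
): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let pending = new Uint8Array(0);
  for await (const chunk of chunks) {
    const merged = new Uint8Array(pending.length + chunk.length);
    merged.set(pending, 0);
    merged.set(chunk, pending.length);
    const cut = merged.length - incompleteSuffixLength(merged);
    if (cut > 0) {
      yield decoder.decode(merged.slice(0, cut));
    }
    pending = merged.slice(cut);
  }
  if (pending.length > 0) {
    yield decoder.decode(pending); // trailing bytes were truly malformed
  }
}
```

TextDecoder's { stream: true } option does this byte buffering internally (and TextDecoderStream builds on it); the manual version above just makes the boundary handling explicit.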

solution 2: use TextDecoderStream instead of TextDecoder in the streamResponse function

https://github.com/continuedev/continue/blob/aa18568c6096f8f2df7075ff0f1f711f6f75ed01/core/llm/stream.ts#L12-L18

yield* response.body.pipeThrough(new TextDecoderStream());

To reproduce

Ask some questions in the chat panel in Japanese.

Log output

No output during chat response.
reosablo commented 4 months ago

I tried TextDecoderStream and it seems to work fine.

I'll create a PR.

// core/llm/stream.ts
export async function* streamResponse(
  response: Response,
): AsyncGenerator<string> {
  if (response.status !== 200) {
    throw new Error(await response.text());
  }

  if (!response.body) {
    throw new Error("No response body returned.");
  }

  // `response` doesn't seem to be an instance of globalThis.Response, and
  // TypeScript doesn't seem to know that ReadableStream has a static `from` method.
  const stream = (ReadableStream as any).from(response.body);

  // The type of stream is any, not ReadableStream.
  // So we don't need "DOM.AsyncIterable" lib for this line.
  yield* stream.pipeThrough(new TextDecoderStream());
}
sestinj commented 4 months ago

Thanks for the PR @reosablo !