continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
https://docs.continue.dev/
Apache License 2.0

Wrong character encoding in responses #904

Open CosmicMac opened 7 months ago

CosmicMac commented 7 months ago

Relevant environment info

- OS: Windows 10 & 11
- Continue: 0.0.33
- IDE: JetBrains (CLion/PhpStorm/DataGrip, latest versions, new UI)
- Server: docker ollama:latest

Description

Extended characters in responses are badly encoded (e.g. "Ã©" instead of "é"). Encoding is fine in direct responses from Ollama when prompting from a terminal.

To reproduce

1/ Select any Gemma model
2/ Prompt "Translate elegant to french"

Log output

No response

sestinj commented 7 months ago

I tried this in both VS Code and IntelliJ and found that the encoding looked as expected (though Gemma gives interesting answers).

I'm wondering if this might be the model literally outputting "Ã©" due to something it saw in its dataset. If you say something like "repeat after me: 'é'", can you get it to output the correct encoding?

[Screenshots: 2024-02-29 at 2:26:13 PM and 2024-02-29 at 2:22:28 PM]

CosmicMac commented 7 months ago

gemma on acid :)

Unfortunately, same problem with the repeat prompt: [image: continue01]

A quick test in console: [image: continue02]

CosmicMac commented 7 months ago

To me, it looks like double UTF-8 encoding. As I'm using a French OS, maybe some automatic encoding occurs before the forced encoding (or the other way round)? That would explain why you can't reproduce the glitch on your system.
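
To illustrate the hypothesis, here is a minimal Python sketch of how this kind of garbling can arise. It assumes the legacy codec involved is Windows-1252 (the usual ANSI code page on a French Windows install); that is a guess, not something confirmed in this thread, and the helper names `double_encode` and `repair` are made up for the example. It is not taken from the Continue or Ollama code.

```python
# Hypothetical helpers, for illustration only.
# Assumption: the wrong codec is cp1252 (Windows-1252); not confirmed in this issue.

def double_encode(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate the glitch: UTF-8 bytes are misread as a legacy single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Undo the misreading: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


if __name__ == "__main__":
    garbled = double_encode("élégant")
    print(garbled)          # each "é" shows up as two characters
    print(repair(garbled))  # round-trips back to the original word
```

If the text shown in the IDE matches what `double_encode` produces, that would support the idea that somewhere between Ollama and the plugin the UTF-8 bytes are decoded with the platform default charset instead of UTF-8. Note that cp1252 leaves a few byte values undefined, so this round trip can raise an error for some characters.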

sestinj commented 7 months ago

Ah, this makes sense. Is this built into the OS, or might there be a setting that I could change in order to simulate this?

CosmicMac commented 7 months ago

Unfortunately, I have no idea :(