biomejs / biome

A toolchain for web projects that aims to provide the functionality needed to maintain them. Biome offers a formatter and a linter, usable via CLI and LSP.
https://biomejs.dev
Apache License 2.0
13.94k stars 422 forks

🐛 `biome format` breaks emojis when used via stdin #455

Closed chrisgrieser closed 10 months ago

chrisgrieser commented 11 months ago

Environment information

CLI:
  Version:                      1.2.2
  Color support:                true

Platform:
  CPU Architecture:             aarch64
  OS:                           macos

Environment:
  BIOME_LOG_DIR:                unset
  NO_COLOR:                     unset
  TERM:                         "xterm-256color"
  JS_RUNTIME_VERSION:           "v20.7.0"
  JS_RUNTIME_NAME:              "node"
  NODE_PACKAGE_MANAGER:         unset

Biome Configuration:
  Status:                       unset

Workspace:
  Open Documents:               0

Discovering running Biome servers...

Server:
  Status:                       stopped

What happened?

When using `biome format` via stdin, some emojis seem to break. This does not affect all emojis, and it does not affect `biome format --write`.

(The different emoji sizes are a WezTerm font quirk; I checked that the issue persists when opening the file in TextEdit.)

Pasted image 2023-09-30 at 13 14 41@2x

Expected result

Emojis not breaking

ematipico commented 11 months ago

This was also flagged in the Rome repository, and it seems it wasn't fixed.

Here's a possible cause of the bug: https://github.com/rome/tools/issues/3915#issuecomment-1339388916

ideologism commented 10 months ago

I think the problem is in the logic that converts strings to buffers.

Sometimes a Unicode "character" is made up of multiple Unicode scalar values, like the emoji1 in the above example or the "é" character, but in Rust, a `char` can represent only one Unicode scalar value.
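A minimal sketch of that point, using only the Rust standard library. The helper name `scalar_count` is made up for illustration, and the string is written with an escape so that the combining mark is visible in the source:

```rust
/// Hypothetical helper: counts the Unicode scalar values (Rust `char`s) in a string.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT is displayed as one character.
    let e_acute = "e\u{301}";

    // One visible character, but two `char`s and three UTF-8 bytes.
    println!("chars: {}, bytes: {}", scalar_count(e_acute), e_acute.len());
    // prints "chars: 2, bytes: 3"
}
```

Any code that handles such a string one `char` at a time sees the base letter and the combining mark as two unrelated items.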

The conversion logic here turns the second byte into a replacement_character, which causes the problem. I'm not sure why this conversion is needed, but I think the unicode-segmentation crate might help solve the problem.
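To illustrate how a pass that looks at each `char` in isolation can damage a multi-scalar sequence, here is a hypothetical sanitizer. It is not the Biome code linked below: the function names and the short hard-coded list of "zero-width" code points are assumptions, chosen only to keep the sketch free of external crates.

```rust
/// Hypothetical check for code points that have no width on their own.
/// The list is illustrative and far from complete.
fn is_zero_width(c: char) -> bool {
    matches!(
        c,
        '\u{0300}'..='\u{036F}' // combining diacritical marks
            | '\u{200D}' // zero width joiner
            | '\u{FE0F}' // variation selector-16 (emoji presentation)
    )
}

/// Hypothetical sanitizer that inspects every `char` in isolation and
/// swaps "invisible" ones for U+FFFD REPLACEMENT CHARACTER.
fn sanitize_per_char(input: &str) -> String {
    input
        .chars()
        .map(|c| if is_zero_width(c) { char::REPLACEMENT_CHARACTER } else { c })
        .collect()
}

fn main() {
    // U+2764 HEAVY BLACK HEART followed by U+FE0F is shown as one heart emoji.
    let heart = "\u{2764}\u{FE0F}";
    let out = sanitize_per_char(heart);

    // The variation selector was replaced, so the emoji sequence is broken.
    assert_eq!(out, "\u{2764}\u{FFFD}");
    println!("{}", out);
}
```

Under these assumptions, the base code point survives while its modifier becomes U+FFFD, which would explain why only some emojis break: emojis encoded as a single scalar value pass through untouched. Working on whole grapheme clusters instead of individual `char`s would keep such pairs together.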

Relevant code: https://github.com/biomejs/biome/blob/f58df063cab89c72589cca6efc5b63e6cd4cc806/crates/biome_console/src/write/termcolor.rs#L150-L153