cli / go-gh

A Go module for interacting with gh and the GitHub API from the command line.
https://pkg.go.dev/github.com/cli/go-gh/v2
MIT License

`asciisanitizer.Sanitizer` mishandled the `�` unicode character #127

Closed yin1999 closed 10 months ago

yin1999 commented 10 months ago

We encountered an error when using the GitHub CLI to fetch commits in an MDN repository. I found that the error is caused by the sanitizer used by the GitHub CLI.

So I created a demo to reproduce the problem:

The plain text to transform:

�, plain text

When we read the plain text and transform it with the sanitizer, we get an error:

[image: screenshot of the error]

But this text is valid, so no error should be reported. I found that the error is returned here.

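So I read the signature of utf8.DecodeRune. It can also return utf8.RuneError when the bytes are decoded correctly: a well-formed U+FFFD is three bytes long, so it yields (RuneError, 3). Only a real decoding failure returns (RuneError, 0), for empty input, or (RuneError, 1), for an invalid encoding.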
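To make the three cases concrete, here is a small standalone sketch that calls utf8.DecodeRune on a well-formed U+FFFD, on an invalid byte, and on empty input. The helper name `describe` is only for illustration and is not part of go-gh.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the rune and size that utf8.DecodeRune returns
// for the first encoded rune in b. (Illustrative helper, not from go-gh.)
func describe(b []byte) string {
	r, size := utf8.DecodeRune(b)
	return fmt.Sprintf("rune=%U size=%d", r, size)
}

func main() {
	// A correctly encoded U+FFFD (bytes EF BF BD) followed by plain text.
	fmt.Println(describe([]byte("\uFFFD, plain text"))) // rune=U+FFFD size=3
	// A single byte that is not valid UTF-8.
	fmt.Println(describe([]byte{0xff})) // rune=U+FFFD size=1
	// Empty input.
	fmt.Println(describe(nil)) // rune=U+FFFD size=0
}
```

In all three cases the rune is utf8.RuneError; only the size tells them apart.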

So we can't tell whether a decoding error occurred from the first return value alone. The text above contains a correctly encoded U+FFFD character, and the sanitizer mishandles it by treating it as a decoding error.

samcoe commented 10 months ago

@yin1999 Thanks for raising this issue. I was able to reproduce it: the sanitizer is in fact mishandling properly encoded \uFFFD Unicode characters that come from the API.