castwide / vscode-solargraph

A Visual Studio Code extension for Solargraph.

"invalid byte sequence in US-ASCII" on latest version #37

Closed garyking closed 5 years ago

garyking commented 6 years ago

After I open a Ruby project and use "Go to symbol in workspace", I get this in the Output panel:

[Error - 13:43:44] Server initialization failed.
  Message: invalid byte sequence in US-ASCII
  Code: -32603 

And this screenshot:

[screenshot]

Needless to say, due to these errors, the extension doesn't work.

castwide commented 6 years ago

What OS are you using? Is there a particular file that gives you this error? The most likely cause is a literal string with escaped characters, e.g., "\xcf".

garyking commented 6 years ago

I'm using a Mac.

There's definitely a file in my workspace that triggers this error because of a non-ASCII character, but the error still occurs even if I open a blank file in the problematic workspace and then use "Go to symbol in workspace". So the extension is probably trying to index all Ruby files in my workspace and hitting a problematic character along the way. But it doesn't tell me which file to look at.

Otherwise, I just tried "Symbol in workspace" in another project, and it's honestly amazing. It goes to symbols in files that aren't even opened! If I recall, this doesn't even happen in either JS or Python for VSC, and both are officially supported languages in VSC.

garyking commented 6 years ago

I scanned all my files for non-ASCII characters, and found them. They are mostly em dashes (—). If I add #encoding: utf-8 to the top of each file that contains non-ASCII characters, then the Solargraph error does not occur, but "Symbols in workspace" does not show any results. I had to delete the problematic files for it to work, but at that point, it does indeed work.
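
For anyone who wants to run the same kind of scan, here is a minimal sketch in Ruby. It is not what Solargraph does internally; the method name non_ascii_lines and the choice to look only at *.rb files are assumptions made for illustration. It reads each file as raw bytes, so the scan itself cannot trip over the encoding problem it is looking for.

```ruby
# Hypothetical helper: list every line in the *.rb files under `root`
# that contains at least one byte above 0x7F (i.e. a non-ASCII byte).
# Returns an array of [path, line_number] pairs.
def non_ascii_lines(root)
  hits = []
  Dir.glob(File.join(root, "**", "*.rb")).sort.each do |path|
    # binread returns the raw bytes, so no decoding happens here
    File.binread(path).each_line.with_index(1) do |line, number|
      hits << [path, number] unless line.ascii_only?
    end
  end
  hits
end

# Run as a script: scan the directory given as the first argument,
# or the current directory when none is given.
if $PROGRAM_NAME == __FILE__
  non_ascii_lines(ARGV.fetch(0, Dir.pwd)).each do |path, number|
    puts "#{path}:#{number}"
  end
end
```

Each line of output has the form path:line, which is enough to jump to the offending character in an editor.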

castwide commented 6 years ago

Thanks!

The parser is supposed to handle non-ASCII characters gracefully, but there must be an exception somewhere. It's also been reported here: https://github.com/castwide/solargraph/issues/33

I haven't been able to reproduce the error on Windows, so I'm going to try again on MacOS later. At the very least, I'll modify the error so it reports which file triggered it.

Edit: Thanks for the update. That gives me a good starting point.

TheTharin commented 6 years ago

I'm having this issue as well. Can't just add #encoding: utf-8 to every file. That's not my own project and it won't go through code review :) I'm running OS X 10.13.3

stereoscott commented 6 years ago

I ran into this error as well; it would be great to see the file that caused the error. (I'm on a Mac.)

[Error - 14:37:17] Server initialization failed.
  Message: invalid byte sequence in US-ASCII
  Code: -32603 
solargraph -v
0.18.2

VS Code Version 1.22.1 (1.22.1), Ruby Solargraph v 0.14.1

castwide commented 6 years ago

This is fixed in the gem's master branch. I'll publish version 0.18.3 in the next day or so.

More information: https://github.com/castwide/solargraph/issues/33

castwide commented 6 years ago

Version 0.18.3 is published.

dbechrd commented 6 years ago

@castwide I'm having this issue in Solargraph 0.22.0. It would be nice if the error could at least tell me which file contained the invalid byte sequence.

alphabt commented 6 years ago

+1 still repro with 0.22.0 on macOS

jerrywdlee commented 6 years ago

Not the best way, but it worked for me. In my environment, LC_CTYPE is unset and LANG is ja_JP.UTF-8. So I just added export LC_CTYPE=$LANG to ~/.bash_profile and restarted VS Code. Then it works.
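
A likely reason the locale matters: Ruby derives Encoding.default_external from the process locale, and when the locale does not name a UTF-8 charset it can fall back to US-ASCII. The sketch below only inspects the current process; the helper valid_under? is a hypothetical name, and whether the language server inherits the same environment as your shell is an assumption you would need to check.

```ruby
# Hypothetical helper: true when the raw bytes form valid text
# under the named encoding.
def valid_under?(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).valid_encoding?
end

# A comment line containing an em dash (U+2014), as raw bytes.
line = "# before \u2014 after\n".b

# What this Ruby process sees from the environment.
puts "LANG=#{ENV['LANG'].inspect} LC_CTYPE=#{ENV['LC_CTYPE'].inspect}"
puts "default_external=#{Encoding.default_external}"

# The same bytes are invalid when labelled US-ASCII and valid as UTF-8.
puts "valid as US-ASCII? #{valid_under?(line, 'US-ASCII')}"
puts "valid as UTF-8?    #{valid_under?(line, 'UTF-8')}"
```

If default_external prints US-ASCII, text read with the default encoding will be labelled US-ASCII, and any em dash in it becomes an invalid byte sequence under that label.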

castwide commented 6 years ago

Gem version 0.23.5 includes the names of files in error messages when character encoding exceptions and other parser-related problems occur.

cmazakas commented 6 years ago

I'm getting these errors as well. Which is odd because I've been using Solargraph forever and this is the first time it's had this error in a file it's opened successfully many times.

~ exbigboss$ solargraph -v
0.25.1

Edit:

This is 100% a regression in version 0.25. I downgraded back to 0.24 and everything is working normally again.

castwide commented 6 years ago

Reproduced on MacOS. I'm looking into it. Any help or additional information that anyone can provide is appreciated.

castwide commented 6 years ago

I may have a solution to this one. The next version of the gem will read files as binary and enforce UTF-8 encoding in memory. I've tested it on files with a few different character sets, including Unicode graphemes.
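
The general idea of "read as binary, enforce UTF-8 in memory" can be sketched in Ruby as follows. This is an illustration of the technique, not the gem's actual code: the method name read_as_utf8 is hypothetical, and replacing invalid bytes with "?" via String#scrub is an assumption about how one might handle files that are not valid UTF-8.

```ruby
# Hypothetical helper: read a file without consulting the locale,
# then treat its bytes as UTF-8.
def read_as_utf8(path)
  source = File.binread(path)             # raw bytes, tagged ASCII-8BIT
  source.force_encoding(Encoding::UTF_8)  # relabel only; no transcoding
  # Assumption: replace any bytes that are not valid UTF-8 with "?"
  # so later string operations do not raise on them.
  source.valid_encoding? ? source : source.scrub("?")
end

# Run as a script: print the encoding and validity of the given file.
if $PROGRAM_NAME == __FILE__ && ARGV[0]
  text = read_as_utf8(ARGV[0])
  puts "#{text.encoding} valid=#{text.valid_encoding?}"
end
```

Because File.binread ignores Encoding.default_external, the result no longer depends on LANG or LC_CTYPE, which is what makes this approach independent of the user's environment.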

There are still a few quirks in certain cases, like source code with emoji variables and similar crazy stuff. It'll parse, but you might get some strange behavior while editing it. Emojis in comments don't appear to have any side effects.

I expect to release a new version of the gem by the end of the week. In the meantime, the changes are in the castwide/solargraph master branch if anyone wants to test them.

castwide commented 6 years ago

Gem version 0.26.0 is published with the latest encoding fix.

liqites commented 6 years ago

@castwide Tested on VS Code Insiders 1.28.0 with solargraph gem version 0.27.1. Problem solved.

jaredcwhite commented 3 years ago

In case anyone finds this now, I had a separate weird problem (setting up VS Code on an M1 Mac) where Solargraph+Bundler was choking on a gemspec file and throwing the "invalid byte sequence" error. It turned out I needed to add export LANG=en_US.UTF-8 to my ~/.zshrc file and restart VS Code; then the Solargraph extension started working. Go figure!