LaurentTreguier / vscode-uncrustify

Code format using uncrustify
66 stars 10 forks source link

Random symbol is replaced with ?? when formatting document sometimes #51

Closed deitry closed 5 years ago

deitry commented 5 years ago

This is a floating bug but a recurrent one.

If I format document with this extension, in rare occasions there is a replacement for random symbol in the comments, which are written using cyrillic symbols. If I restore this symbol and run format document again then same symbol will be replaced again. If I make any change to the document, like add or remove some symbol in file (adding or removing a line, for example) and try to format document, there may be other symbol replacement or no replacement at all. If I run format document again on file with already replaced symbol, then more ?? would be added. Running uncrustify directely via terminal works fine, so I guess the problem is within extension.

There is no visible signs of which symbol would be changed, but it is always cyrillic symbol inside comment line. My guess is that this is somehow related to some buffer. Unfortunately, I can't share any code example since it is always unpredicted and appears in a private source files.

This is how it looks before and after formatting: Screenshot_20190805_130622 Screenshot_20190805_130633

LaurentTreguier commented 5 years ago

I wouldn't have thought buffering could be a problem, since I simply send all the text in the current document to Uncrustify at once. But maybe NodeJS splits it in multiple parts internally if the data is too large. Does this happen on small files ? I could try to send the document text line by line to ensure that it doesn't split characters in 2.

deitry commented 5 years ago

Files are relatively small. A bit of statistics on the saved file where bag consistentely appears. It is happening on symbol 7050 (each latin and cyrillic symbol counts as 1) if counted from file beginning, line 209 of 256 total, column 74.

I made some experments with formatting only selection in the same file. If I select range from beginning of file up to where the bug appears, replacement occurs in the same place. If I add single ASCII char (like ';') in that range or move beginning on one symbol, bug disappear. If I include another latin char in selection, bug appears again, shifted on one cyrillic symbol. And moreover, replacement persist in the same place if I move end of selection. So I'm almost sure that it comes from some buffering that breaks input in the middle of 2-byte char.

As far as I can see, high likely it origins from here: https://github.com/LaurentTreguier/vscode-uncrustify/blob/d4064d0983aabcc9e90d94769b0e2abd06a4cf82/src/formatter.ts#L93

Maybe I try to prove my guess, though I'm bad at Typescript debugging.

deitry commented 5 years ago

If reopened as single-byte encoding, it is clear happen on bytes 4096, 8192 and so on. I created a simple example on which issue appears: https://gist.github.com/deitry/5711f0a0c562a1a0534b99342f6a32ab

LaurentTreguier commented 5 years ago

While I try to investigate further when I have some more time, does it work better if you set uncrustify.useReplaceOption to true in your config ? This should force the extension to launch uncrustify on the file directly instead of sending it the contents of the document. If uncrustify works fine from the command line, then with this option activated it should work better.

deitry commented 5 years ago

Confirm, setting uncrustify.useReplaceOption to true resolves issue.

LaurentTreguier commented 5 years ago

You were very much correct, that one line you pointed out was exactly where things were broken. I would convert everything sent by Uncrustify to strings, even if they were buffers conataining half a character, which is why some characters were split.

deitry commented 5 years ago

Thank you! But shouldn't we consider this as VS Code bug?

LaurentTreguier commented 5 years ago

I think it's up to VSCode to choose how it splits the document contents. You simply have to be careful and properly handle this kind of situation. This might even be handled somewhere in Node.JS and not in VSCode.

deitry commented 5 years ago

Ok, thanks for explanation.