Closed deitry closed 5 years ago
I wouldn't have thought buffering could be a problem, since I simply send all the text in the current document to Uncrustify at once. But maybe NodeJS splits it in multiple parts internally if the data is too large. Does this happen on small files ? I could try to send the document text line by line to ensure that it doesn't split characters in 2.
Files are relatively small. A bit of statistics on the saved file where bag consistentely appears. It is happening on symbol 7050 (each latin and cyrillic symbol counts as 1) if counted from file beginning, line 209 of 256 total, column 74.
I made some experments with formatting only selection in the same file. If I select range from beginning of file up to where the bug appears, replacement occurs in the same place. If I add single ASCII char (like ';') in that range or move beginning on one symbol, bug disappear. If I include another latin char in selection, bug appears again, shifted on one cyrillic symbol. And moreover, replacement persist in the same place if I move end of selection. So I'm almost sure that it comes from some buffering that breaks input in the middle of 2-byte char.
As far as I can see, high likely it origins from here: https://github.com/LaurentTreguier/vscode-uncrustify/blob/d4064d0983aabcc9e90d94769b0e2abd06a4cf82/src/formatter.ts#L93
Maybe I try to prove my guess, though I'm bad at Typescript debugging.
If reopened as single-byte encoding, it is clear happen on bytes 4096, 8192 and so on. I created a simple example on which issue appears: https://gist.github.com/deitry/5711f0a0c562a1a0534b99342f6a32ab
While I try to investigate further when I have some more time, does it work better if you set uncrustify.useReplaceOption
to true
in your config ? This should force the extension to launch uncrustify on the file directly instead of sending it the contents of the document. If uncrustify works fine from the command line, then with this option activated it should work better.
Confirm, setting uncrustify.useReplaceOption
to true
resolves issue.
You were very much correct, that one line you pointed out was exactly where things were broken. I would convert everything sent by Uncrustify to strings, even if they were buffers conataining half a character, which is why some characters were split.
Thank you! But shouldn't we consider this as VS Code bug?
I think it's up to VSCode to choose how it splits the document contents. You simply have to be careful and properly handle this kind of situation. This might even be handled somewhere in Node.JS and not in VSCode.
Ok, thanks for explanation.
This is a floating bug but a recurrent one.
If I format document with this extension, in rare occasions there is a replacement for random symbol in the comments, which are written using cyrillic symbols. If I restore this symbol and run format document again then same symbol will be replaced again. If I make any change to the document, like add or remove some symbol in file (adding or removing a line, for example) and try to format document, there may be other symbol replacement or no replacement at all. If I run format document again on file with already replaced symbol, then more ?? would be added. Running uncrustify directely via terminal works fine, so I guess the problem is within extension.
There is no visible signs of which symbol would be changed, but it is always cyrillic symbol inside comment line. My guess is that this is somehow related to some buffer. Unfortunately, I can't share any code example since it is always unpredicted and appears in a private source files.
This is how it looks before and after formatting: