aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License
865 stars 162 forks source link

Decoding error on Windows (CRLF?) #137

Closed sergeevabc closed 7 months ago

sergeevabc commented 7 months ago

Windows 7 x64, base64 0.5.2

in.bin.zip

$ base64 in.bin > in.bin.b64
$ base64 -d in.bin.b64 > out.bin
in.bin.b64: decoding error

and

$ coreutils base64 in.bin > in.bin.b64
$ base64 -d in.bin.b64 > out.bin
$ xxhash *.bin
e368ab052d14607e  in.bin
8f8f125c8e4305a3  out.bin
$ coreutils base64 -d in.bin.b64 > out.bin
$ xxhash *.bin
e368ab052d14607e  in.bin
e368ab052d14607e  out.bin
$ busybox base64 in.bin > in.bin.b64 
$ base64 -d in.bin.b64 > out.bin
$ xxhash *.bin
e368ab052d14607e  in.bin
8f8f125c8e4305a3  out.bin
$ busybox base64 -d in.bin.b64 > out.bin
$ xxhash *.bin
e368ab052d14607e  in.bin
e368ab052d14607e  out.bin

Originally posted by @sergeevabc in https://github.com/aklomp/base64/issues/126#issuecomment-1913442453

aklomp commented 7 months ago

I was unable to reproduce this in Linux, even when writev emulation is turned on so that the same codepath is used as on Windows.

However, I was able to reproduce this issue in CI with this branch. Will investigate.

aklomp commented 7 months ago

I hexdumped the base64-encoded file with xxd, and lo and behold, something has expanded newlines (0x0a) to 0x0d 0x0a. Gross.

Now, this could either be some environment setting, or it could be the compiler interpreting \n as \r\n. Which would be extra gross. Will test by changing \n to \x0a.

aklomp commented 7 months ago

Bizarre. So the following snippet, run on an affected platform, proves that pipe redirection is not adding the \r:

echo -ne "hello\nworld\n" | xxd
00000000: 6865 6c6c 6f0a 776f 726c 640a            hello.world.

But then we also have this, where the output from base64 suddenly has \r\n in it:

out/bin/base64 in.bin | xxd
00000000: 4b4c 5576 2f61 5149 2b71 6f41 6641 4541  KLUv/aQI+qoAfAEA
00000010: 6145 3338 5367 4141 4144 4468 4d4f 4869  aE38SgAAADDhMOHi
00000020: 6f49 514f 6f42 4378 422f 4431 2f7a 4e57  oIQOoBCxB/D1/zNW
00000030: 772f 6e6c 6d52 2b44 3636 6564 3772 3439  w/nlmR+D66ed7r49
00000040: 4472 7952 6b6e 4d74 4377 4172 0d0a 4e59  DryRknMtCwAr..NY
00000050: 3163 4141 4141 4151 4439 2f77 7341 717a  1cAAAAAQD9/wsAqz
...

If the pipe redirection does not mangle stdout, then we have to conclude that base64 is actually outputting \r\n.

But the code only uses plain "\n". And a write size of one character, which wouldn't even let the compiler get away with substituting "\r\n".

And this bug just happens on some Windows platforms, not on all Windows platforms. So what the hell is going on? My hypothesis at this point is that it's not the shell, it's not the C code, but it could be the stdout device itself. Maybe it's "clever" in that it decides to change \n to \r\n, but only when \n is the last character in the call to write(). Maybe it's being opened in "line mode" or cooked mode or something like that. Maybe the ONLCR flag is set on the TTY. Something like that.

aklomp commented 7 months ago

If I change '\n' to '\x00 to get zero-delimited lines, the output is not mangled.

If I change '\n' to "\n ", to get lines delimited by a newline and a space, the output is not mangled. This proves that the compiler is not changing \n to \r\n. Rather, it's the shitty execution environment. Geez.

So this could perhaps be fixable in Windows by using something like SetConsoleMode(... DISABLE_NEWLINE_AUTO_RETURN). :vomiting_face:

aklomp commented 7 months ago

Good news: the SetConsoleMode workaround works, and fixes the tests for msys2-cmake-msys. Bad news: still broken on other MSYS2 environments because apparently those do not set enough permissions on the console handle to change the settings. I don't think there's much I can do about that, TBH. On those platforms we'll probably just fall back to outputting \r\n and using -i when decoding.

aklomp commented 7 months ago

@sergeevabc To summarize yesterday's bughunt, the root issue is that your OS has configured stdout to remap \n to \r\n when that character occurs at the end of a line. Rather predictably, if the OS is mangling the program's output, things won't work as they should.

You can solve the issue on your end in any of these ways:

I can try to solve the issue on my end by trying to set DISABLE_NEWLINE_AUTO_RETURN on stdout in Windows, and then restoring the original terminal settings when the program exits. This seems to fix the problem when stdout is a writable handle, but it does not work when stdout is not modifiable, as it seems to be in some runtime environments. In that case, refer to the bullet list above.

I'll merge the console-setting fix and close this issue.

sergeevabc commented 7 months ago

@aklomp, I mentioned two cross-platform utilities, Busybox and Coreutils, which base64 module works fine under my OS. Also I have just checked base64 port from MSYS2 kit as well, it works fine too. So I have three utilities written in different languages that work and one, yours, that doesn't. What conclusion does this suggest?

aklomp commented 7 months ago

This suggests that either they set the DISABLE_NEWLINE_AUTO_RETURN flag on stdout, like my utility will do soon; or they don't write one line at a time; or they use some other trick like opening/reopening stdout in some other mode.

None of this changes the fact that the OS setting is the root cause of the behavior. It's provably mangling the program's correct output.

aklomp commented 7 months ago

Found a fix that works on all Windows platforms in CI and also fixes the decoding (when the decoding happens to contain newlines). It looks like this:

#ifdef WIN
       _setmode(1, _O_BINARY);
#endif

Alternate fixes along the same line using freopen() and SetConsoleMode() do not always work. They occasionally run into permissions errors when the stdout device does not retain enough permissions to change its settings. For some reason, _setmode() does not have that problem.

sergeevabc commented 7 months ago

I confirm that this error is no longer reproducible according to my initial report, i.e. encoding and decoding of binary files on Windows 7 matches the results of other well-known base64 apps.

But to be honest, there is something disturbing about this story. It seems to me that you did not fully realize that we were dealing here with a top-priority bug. And you blamed the popular OS and offered to replace it, not the lack of your own knowledge about that OS peculiarities. Hence the question: what else could you have missed?

(Looking ahead, I will say that I found another fundamental issue.)

You see, base64 is both a simple and a complex thing. Its simplicity lies in the fact that only two operations are performed, encoding and decoding. And the complexity is that these operations must be performed the same everywhere, because base64 is used in e-mail, armored crypto, data URI scheme, etc.

I received several complaints that people could not decode files and we searched for the culprit in various places, but it turned out to be your app. I welcome efforts to optimize things so that they work faster, consume less memory and take up less space, but in the pursuit of optimization, it is important to cherish the foundation.

aklomp commented 7 months ago

As I have told you many times, bin/base64 is a demo program to show how to integrate the library into a program. It is not a fully supported piece of software. It is not intended to be production ready. It is not intended to be cross-platform even! "Top priority bugs", what reality are you even in? Why on earth are you so irresponsible as to use this in production, with third party users even?

I am not going to take attitude from you on this. First of all, I am not your personal helpdesk. I am a volunteer open source programmer who writes this in his own time. You don't pay me. Second of all, you keep ignoring the fact that bin/base64 is a library integration demo. I do not, and have never claimed to, support it on all platforms. Write your own platform-compatible base64 if it's so important to you, then file all the bugs you want against the actual library. Third, I am not responsible for your users, how you do testing and integration with a demo binary is on you. Fourth, it really is your shitty OS, which has not received updates in four years and is at this point crazily unsafe to use in production, that is demonstrably at fault. Don't come crying to me when your OS corrupts a program's output, and especially, don't try to pin the blame on me. Or when you try to, at least contribute something to a solution like CI integration tests instead of sarcasm and beratement.

Great work, well done. This makes me feel so much joy about maintaining this project.

sergeevabc commented 7 months ago

My contribution to the project is sugar-free feedback, revealing flaws that obviously cast a shadow over everything you brag about on the frontpage. The errors I report are reproducible not only on Windows 7, but also on Windows 10 and 11. If you are not ready for feedback, then you should have stayed under the rock and not come out to the public at all. The fact that you do this in your free time and for free does not relieve you of basic human responsibility, just as a musician playing on the street for free cannot take off his pants and poop without at least receiving a reprimand, if not a fine. Cheer up, roll up your sleeves and start improving the solution, rather than whining that you are being offended. However if you didn't come here to create and improve, but to fool around and pollute Github with sloppy code, then it's really my mistake that I mistook you for a developer.

aklomp commented 7 months ago

I wrote a measured response to this, even thanking you, but then you had to put that last sentence in with a sneak edit. Tss.

BurningEnlightenment commented 7 months ago

Maybe you should re-read the license which you agreed to by using this software. Especially the part written in all-caps:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

EDIT: So in short any support or bug fixes are provided as pure courtesy which you are in no way entitled to. If these fixes are important to you, you are free to fork this repository and do the work yourself :wink:

sergeevabc commented 7 months ago

@BurningEnlightenment

Licence is there in case of lawsuits, it's an argument in a legal dispute. But we're on the street. Start pooping there in between free music performance, I won’t listen to your mumbling about the charter of human rights, but will give you a kick in the ass so that you come to your senses. That's the cause and effect relationship.

Look, I reported the bug, then confirmed that it was fixed, and finally shared concerns about what other dangers the app might hold. I did it all in a professional manner without a single word marked as pejorative in dictionaries. So it's not my reputation that's at stake here. But instead of empathy and humility, I see carelessness and hubris. These are the red flags that one is more likely to enter the hall of shame than fame.

BurningEnlightenment commented 7 months ago

But we're on the street. Start pooping there in between free music performance, I won’t listen to your mumbling about the charter of human rights, but will give you a kick in the ass so that you come to your senses.

I really don't get you. There is a fat sign aka License that clearly states "only enter this street if you can stand people pooping all over the street" and somehow you're still getting upset when it happens 😆TBH my usual way of dealing with people who display such a smug, self-entitled and pretentious attitude is to permanently ban them, but apparently @aklomp is way more merciful than I am. So good luck and have fun 😉

sergeevabc commented 7 months ago

@BurningEnlightenment

What makes you think that everyone is allowed to install signs on the street? This is a public space. Once you have uploaded something to the Internet without a lock, it becomes publicly available, and this can have consequences. As long as you sieg heiling under your blanket, Herr Gaßmann, it doesn't matter much, but as soon as you do it on the balcony, the police will soon knock on your door. You can limit your legal liability by attaching a licence, but nothing will save you from meeting face-to-face with someone like Kaloev. In the meantime, continue to giggle and indulge your inner tyrant by banning speakers who explore the uncomfortable theme of responsibility.

aklomp commented 7 months ago

@BurningEnlightenment Sorry about that. I was hesitant at first to block this person because they did contribute real bug reports of some sort, but they crossed a clear line in their toxic remarks to you and forced my hand. Real class act. Anyway, a ban it is.