aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License
865 stars 162 forks source link

v0.5.1 breaks `base64 -d` on Alpine Linux (musl libc) #126

Closed jirutka closed 7 months ago

jirutka commented 9 months ago

base64 v0.5.0 works fine, v0.5.1 is broken on Alpine Linux Edge (both with gcc and cmake):

cmake -B build -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=/usr \
        -DBUILD_SHARED_LIBS=ON \
        -DBASE64_BUILD_CLI=ON \
        -DBASE64_BUILD_TESTS=ON \
        -DBASE64_WITH_OpenMP=OFF \
        -DBASE64_WITH_AVX512=OFF \
        -DBASE64_WITH_AVX2=OFF \
        -DBASE64_WITH_AVX=OFF
cmake --build build

 ctest --test-dir build --output-on-failure
Test project base64-0.5.1/build
    Start 1: test_base64
    Start 2: benchmark
1/2 Test #1: test_base64 ......................   Passed    0.00 sec
2/2 Test #2: benchmark ........................   Passed    1.04 sec

100% tests passed, 0 tests failed out of 2
echo foo | ./build/bin/base64 | ./build/bin/base64 -d
./build/bin/base64: stdin: decoding error
execve("./build/bin/base64", ["./build/bin/base64", "-d"], 0x7ffc6def0e48 /* 19 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fc7571dcb08) = 0
set_tid_address(0x7fc7571dcf70)         = 32459
brk(NULL)                               = 0x55b6d3d97000
brk(0x55b6d3d99000)                     = 0x55b6d3d99000
mmap(0x55b6d3d97000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x55b6d3d97000
open("build/bin/libbase64.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=67656, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 69632, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc75712a000
mmap(0x7fc75712b000, 40960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0x7fc75712b000
mmap(0x7fc757135000, 16384, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0xb000) = 0x7fc757135000
mmap(0x7fc757139000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xe000) = 0x7fc757139000
close(3)                                = 0
mprotect(0x7fc757139000, 4096, PROT_READ) = 0
mprotect(0x7fc7571d9000, 4096, PROT_READ) = 0
mprotect(0x55b6d2d08000, 4096, PROT_READ) = 0
mmap(NULL, 1048596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc757029000
mmap(NULL, 1398137, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc756ed3000
readv(0, [{iov_base="Zm9vCg==\n", iov_len=1048575}, {iov_base="", iov_len=1024}], 2) = 9
readv(0, [{iov_base="", iov_len=1048566}, {iov_base="", iov_len=1024}], 2) = 0
writev(2, [{iov_base="./build/bin/base64: stdin: decod"..., iov_len=42}, {iov_base=NULL, iov_len=0}], 2./build/bin/base64: stdin: decoding error
) = 42
munmap(0x7fc757029000, 1052672)         = 0
munmap(0x7fc756ed3000, 1400832)         = 0
close(0)                                = 0
close(1)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++
aklomp commented 9 months ago

Sorry, fat-fingered the post button.

aklomp commented 9 months ago

Hi, this is caused by a change in behavior of the bin/base64 utility since 0.5.0. In order to be bug-compatible with "standard" base64, it now always outputs a newline after the output. Since \n is not a valid base64 char, an error is thrown when the output is piped back into the decoder.

The fix is to run it as echo foo | ./build/bin/base64 -w0 | ./build/bin/base64 -d. The -w0 option suppresses line-buffered output.

jirutka commented 9 months ago

I hope you don’t consider this totally unexpected and confusing behaviour a feature. Since newline is not a valid character in base64, you can easily strip the trailing newline in the input to the decoder.

aklomp commented 9 months ago

The base64 utility is intended as a demo project for the base64 library, to show how to integrate the library into a codebase. I don't consider it a core feature. Before the last release, the base64 library was very simple and behaved as a dumb pipe. stdout could be piped to stdin and all was well.

But over the years a few issues have been raised (#8 is the one I could find quickly) that were caused by the fact that the behavior was not identical to the system base64. So in issue #115 I decided to fix that complaint and go for bug compatibility. So now the program always emits a trailing newline and always line-breaks the output by default. Gross, I agree, because I like dumb pipes, but it upholds the principle of least surprise.

But, and this is where you have a point, it seems like I didn't do a perfect job, because upon checking, the system base64 does seem to eat newlines, as demonstrated by:

echo foo | base64 | base64 -d

So I guess this issue is now about stripping newlines from the input.

aklomp commented 9 months ago

Possible connection to #33.

sergeevabc commented 7 months ago

Windows 7 x64 user here. Looking forward to test a fixed version.

aklomp commented 7 months ago

I pushed an issue126 branch which fixes this issue, but in benchmarks it's about twice as slow as the system base64's decoder on the same large input file. Probably because the algorithm is really naive and checks the input bytewise for \n.

I guess I can merge it because a working but slow program is preferable over a non-working program, but it sucks to admit defeat to the system base64 :)

sergeevabc commented 7 months ago

@aklomp, you mean I have to download that branch and compile it myself? o_O

aklomp commented 7 months ago

@sergeevabc Well yes, how else do you get a binary? This project does not publish any binaries.

But anyway, I think I'll merge the branch today because it fixes the issue, and create a follow-up issue to deal with the slow decoding. The fix works, it's pretty trivial and I tested it locally. It's just quite slow because it scans the input twice.

aklomp commented 7 months ago

Found the speed regression. Don't write to stdout in the inner loop :) Anyway, now the timing is competitive again. Will merge soon.

sergeevabc commented 7 months ago

Testing base64 0.5.2 built with w64devkit on Windows 7 x64. Oh my, the executable turned out to be almost 10 times larger than the source (116658 vs 12913 bytes).

$ busybox echo -n Z2l0aHVi | base64.exe -d
github

$ busybox echo -n Z2l0aHViIAo= | base64.exe -d
github

$ busybox echo -n Z2l0aHViCg== | base64.exe -d
github

$ busybox echo -n Z2l0aHViIA0K | base64.exe -d
github

I guess the devil is in the details like spaces, carriage returns and line feeds?

aklomp commented 7 months ago

@sergeevabc I don't quite understand your comment. Are you reporting a bug?

Those strings seem to decode fine for me, including the whitespace, as shown with xxd:

$ echo -n Z2l0aHVi | bin/base64 -d | xxd
00000000: 6769 7468 7562                           github
$ echo -n Z2l0aHViIAo= | bin/base64 -d | xxd
00000000: 6769 7468 7562 200a                      github .
$ echo -n Z2l0aHViCg== | bin/base64 -d | xxd
00000000: 6769 7468 7562 0a                        github.
$ echo -n Z2l0aHViIA0K | bin/base64 -d | xxd
00000000: 6769 7468 7562 200d 0a                   github ..

About the binary size, not sure why it's so much larger, but I think static linking might have something to do with it.

sergeevabc commented 7 months ago

There's an issue indeed, @aklomp.

aklomp commented 7 months ago

@sergeevabc This issue has been closed. Please open a new issue if you want to file a bug report. By the way, I can't reproduce this in Linux; are you sure you are using the latest version?

sergeevabc commented 7 months ago

@aklomp, by the way, you should consider displaying the version so that we don't have to guess, redownload and recompile the source code. Yes, I have the latest version (0.5.2). And I suspect that issue is related to handling CRLF.