ThomasDickey / original-mawk

bug-reports for mawk (originally on GoogleCode)
http://invisible-island.net/mawk/mawk.html

buffer overread in scan.c with `mawk -f <large file>` #77

Closed: jcowgill closed this 2 weeks ago

jcowgill commented 3 months ago

If you compile mawk with clang using CFLAGS=-fsanitize=memory and LDFLAGS=-fsanitize=memory, mawk aborts with a MemorySanitizer report when you do this:

# Create an 8K file filled with newlines
$ yes '' | head -c 8192 > test-file.awk
$ ./mawk -f ./test-file.awk
==1809148==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5627a6c1ab64 in eat_nl /tmp/mawk-snapshots/scan.c:283:28
    #1 0x5627a6c1a3d7 in scan_init /tmp/mawk-snapshots/scan.c:141:5
    #2 0x5627a6c58230 in process_cmdline /tmp/mawk-snapshots/init.c:678:2
    #3 0x5627a6c55978 in initialize /tmp/mawk-snapshots/init.c:757:5
    #4 0x5627a6c2967d in main /tmp/mawk-snapshots/main.c:47:5
    #5 0x7fd2dfc016c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #6 0x7fd2dfc01784 in __libc_start_main csu/../csu/libc-start.c:360:3
    #7 0x5627a6b79460 in _start (/tmp/mawk-snapshots/mawk+0x1f460) (BuildId: eae046c11f8fd6dfba6206353fdca87929bcc692)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /tmp/mawk-snapshots/scan.c:283:28 in eat_nl
Exiting

I think this started happening after the fix for https://github.com/ThomasDickey/original-mawk/issues/71 went in. In the code path for the -f option that goes through scan_init and scan_fillbuff, nothing writes a zero byte after the data anymore when the input file is larger than 4096 bytes. The null terminator is only added for the final chunk, at end of file, so after a full 4k read eat_nl runs past the data into uninitialized memory.

This change fixes this bug for me:

--- a/scan.c
+++ b/scan.c
@@ -213,6 +213,8 @@ scan_fillbuff(void)
        /* make sure eof is terminated */
        buffer[r] = '\n';
        buffer[r + 1] = 0;
+    } else {
+       buffer[r] = 0;
     }
 }
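
To illustrate what the patch above changes, here is a minimal, self-contained sketch of the same idea. It is not mawk's code: `CHUNK`, `struct source`, `read_chunk`, `fill_terminated` and `skip_newlines` are hypothetical names made up for this example, and the chunk size is shrunk to 8 bytes so the behaviour is easy to follow. The assumption, taken from the description above, is that the scanner relies on a zero byte directly after the data to know where to stop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size for the sketch; the issue talks about 4096. */
#define CHUNK 8

/* Hypothetical in-memory input, standing in for the program file. */
struct source {
    const char *data;   /* next unread byte */
    size_t left;        /* bytes still unread */
};

/* Copy up to `want` bytes into `dst` and return how many were copied. */
static size_t read_chunk(struct source *src, char *dst, size_t want)
{
    size_t n = src->left < want ? src->left : want;
    memcpy(dst, src->data, n);
    src->data += n;
    src->left -= n;
    return n;
}

/* Fill `buf`, which must hold at least CHUNK + 2 bytes, and always
 * leave a zero byte after the data. Returns 1 at end of input. */
static int fill_terminated(struct source *src, char *buf)
{
    size_t r;

    assert(src != NULL && buf != NULL);
    r = read_chunk(src, buf, CHUNK);
    if (r < CHUNK) {
        /* short read: end of input, terminate with newline + zero */
        buf[r] = '\n';
        buf[r + 1] = '\0';
        return 1;
    }
    /* full read: this is the branch the patch adds */
    buf[r] = '\0';
    return 0;
}

/* Skip leading newlines and return the index of the first other byte.
 * Without a terminator after a full chunk this would walk off the data. */
static size_t skip_newlines(const char *buf)
{
    size_t i = 0;
    while (buf[i] == '\n')
        i++;
    return i;
}
```

With a buffer pre-filled with newline bytes (a stand-in for perturbed malloc memory) and an input of 12 newlines, the first `fill_terminated` call does a full read and `skip_newlines` stops at index `CHUNK` because of the added zero byte. Without the `buf[r] = '\0'` line it would keep going into whatever follows the data.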

I originally ran into this while trying to compile ncurses and getting some very strange mawk errors. It turns out I had GLIBC_TUNABLES=glibc.malloc.perturb=42 in my environment, which makes glibc malloc fill the memory it returns with a non-zero byte pattern instead of leaving it untouched, and that triggers this bug. You might also be able to reproduce it with that set.

I am using 20240123 on Debian unstable.

ThomasDickey commented 3 months ago

Thanks. I can reproduce the issue with valgrind, and with that change I don't see any problem with that testcase or with my usual regression tests.

ThomasDickey commented 2 weeks ago

https://invisible-island.net/mawk/CHANGES.html#t20240622