This bug was found on Ubuntu 20.04 (64-bit). cmix was checked out from the master branch of its GitHub repository at commit 6deea578f41a6206bee9cb112fc843bac5f7980f (Sun Mar 17 08:58:25 2024 -0700).
cmix was built with ASAN using clang-14. The compile command was:
All the POCs in the zip archive trigger the same bug. Some of them do not trigger it consistently and may require multiple attempts.
Bug Analysis
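The memcpy statement at paq8.cpp:2862 is as follows:
memcpy(&W->Letters[i+2], &W->Letters[i+1], MAX_WORD_SIZE-i-2);
The destination region starts one byte after the source region, so the two regions overlap whenever the copy length MAX_WORD_SIZE-i-2 is greater than 1, that is, whenever &W->Letters[i+1] <= &W->Letters[i+2] < &W->Letters[i+1] + MAX_WORD_SIZE-i-2. Calling memcpy on overlapping regions is undefined behavior.
This bug could probably be fixed simply by replacing memcpy with memmove, which is defined for overlapping regions.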
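The proposed fix can be illustrated with a minimal sketch. It is not the cmix code: WordSketch, kMaxWordSize and ShiftRightByOne are hypothetical stand-ins, and the buffer size of 64 is assumed from the value used in the GDB session. The sketch performs the same one-byte right shift as the statement at paq8.cpp:2862, but with memmove.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the real definitions in paq8.cpp (assumed, not copied).
constexpr std::size_t kMaxWordSize = 64;  // assumed value of MAX_WORD_SIZE
struct WordSketch {
  std::uint8_t Letters[kMaxWordSize];
};

// Shift Letters[i+1 ..] one position to the right, leaving Letters[i+1] free
// to be overwritten. Source and destination overlap by design, so memmove
// (which permits overlapping regions) is used instead of memcpy.
inline void ShiftRightByOne(WordSketch* w, std::size_t i) {
  assert(i + 2 <= kMaxWordSize);  // keeps the length from wrapping around
  std::memmove(&w->Letters[i + 2], &w->Letters[i + 1], kMaxWordSize - i - 2);
}
```

For example, with Letters holding "this" and i equal to 0, calling ShiftRightByOne leaves "thhis" in the buffer: every byte from index 1 onward moves up by one and the last byte of the buffer is dropped.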
GDB says
Breakpoint 1, paq8::GermanStemmer::ReplaceSharpS (this=0x5632d50, W=0x91aca80) at src/models/paq8.cpp:2862
warning: Source file is more recent than executable.
2862 memcpy(&W->Letters[i+2], &W->Letters[i+1], MAX_WORD_SIZE-i-2);
(gdb) p &W->Letters[i+2]
$1 = (paq8::U8 *) 0x91aca82 "is"
(gdb) p &W->Letters[i+2] + MAX_WORD_SIZE-i-2
No symbol "MAX_WORD_SIZE" in current context.
(gdb) p &W->Letters[i+2] + 64-i-2
$2 = (paq8::U8 *) 0x91acac0 ""
(gdb) p &W->Letters[i+1]
$3 = (paq8::U8 *) 0x91aca81 "his"
(gdb) p &W->Letters[i+1] + 64-i-2
$4 = (paq8::U8 *) 0x91acabf ""
(gdb) p &W->Letters[i+2] - &W->Letters[i+1]
$5 = 1
The destination (0x91aca82) starts one byte after the source (0x91aca81), and both ranges are 62 bytes long, so they overlap in this memcpy.
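The overlap can also be checked mechanically from the addresses GDB printed. The sketch below is only an illustration: RangesOverlap is a hypothetical helper, and the length of 62 assumes MAX_WORD_SIZE is 64 and i is 0, which is what the printed end addresses imply.

```cpp
#include <cstddef>
#include <cstdint>

// True when the half-open ranges [a, a+n) and [b, b+n) share at least one byte.
constexpr bool RangesOverlap(std::uint64_t a, std::uint64_t b, std::size_t n) {
  return a < b + n && b < a + n;
}

// Addresses from the GDB session: destination 0x91aca82, source 0x91aca81.
static_assert(RangesOverlap(0x91aca82, 0x91aca81, 62),
              "the 62-byte ranges from the GDB session overlap");
// A one-byte copy between the same two addresses would not overlap.
static_assert(!RangesOverlap(0x91aca82, 0x91aca81, 1),
              "a 1-byte copy at these addresses does not overlap");
```

Because the two pointers are always exactly one byte apart, any copy length greater than 1 at this statement produces an overlap.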
ASAN says
==1867601==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x619000056c52,0x619000056c90) and [0x619000056c51, 0x619000056c8f) overlap
#0 0x4c4179 in __asan_memcpy /llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3
#1 0x6209b9 in paq8::GermanStemmer::ReplaceSharpS(paq8::Word*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:2862:11
#2 0x620025 in paq8::GermanStemmer::Stem(paq8::Word*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:2977:5
#3 0x598c07 in paq8::TextModel::Update(paq8::Buf&, paq8::ModelStats*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:3258:28
#4 0x60690e in paq8::TextModel::Predict(paq8::Mixer&, paq8::Buf&, paq8::ModelStats*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:3160:7
#5 0x5e8b70 in paq8::contextModel2(paq8::ModelStats*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8183:13
#6 0x5e9f26 in paq8::Predictor::update() /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8277:11
#7 0x5ed309 in PAQ8::Perceive(int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8383:15
#8 0x65ea86 in Predictor::Perceive(int) /data/symccgo/bug/cmix/cmix/src/predictor.cpp:416:12
#9 0x4ffd8e in Encoder::Encode(int) /data/symccgo/bug/cmix/cmix/src/coder/encoder.cpp:23:7
#10 0x6a3776 in Compress(unsigned long long, std::basic_ifstream<char, std::char_traits<char> >*, std::basic_ofstream<char, std::char_traits<char> >*, unsigned long long*, Predictor*) /data/symccgo/bug/cmix/cmix/src/runner.cpp:106:9
#11 0x6a4c30 in RunCompression(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _IO_FILE*, unsigned long long*, unsigned long long*) /data/symccgo/bug/cmix/cmix/src/runner.cpp:203:3
#12 0x6a69d4 in main /data/symccgo/bug/cmix/cmix/src/runner.cpp:298:10
#13 0x7f8fc5028082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
#14 0x41fb4d in _start (/data/symccgo/bug/cmix/obj-asan-dbg/cmix+0x41fb4d)
0x619000056c52 is located 722 bytes inside of 960-byte region [0x619000056980,0x619000056d40)
allocated by thread T0 here:
#0 0x4c5877 in calloc /llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x62bb1e in paq8::Array<paq8::Word, 0>::create(unsigned int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:118:16
#2 0x5f1547 in paq8::Array<paq8::Word, 0>::Array(unsigned int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:76:28
#3 0x6177aa in paq8::Cache<paq8::Word, 8u>::Cache() /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:3013:22
#4 0x6000c5 in paq8::TextModel::TextModel(unsigned int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:3138:3
#5 0x5e7b70 in paq8::contextModel2(paq8::ModelStats*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8104:20
#6 0x5e9f26 in paq8::Predictor::update() /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8277:11
#7 0x5ed309 in PAQ8::Perceive(int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8383:15
#8 0x65ea86 in Predictor::Perceive(int) /data/symccgo/bug/cmix/cmix/src/predictor.cpp:416:12
#9 0x4ffd8e in Encoder::Encode(int) /data/symccgo/bug/cmix/cmix/src/coder/encoder.cpp:23:7
#10 0x6a3776 in Compress(unsigned long long, std::basic_ifstream<char, std::char_traits<char> >*, std::basic_ofstream<char, std::char_traits<char> >*, unsigned long long*, Predictor*) /data/symccgo/bug/cmix/cmix/src/runner.cpp:106:9
#11 0x6a4c30 in RunCompression(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _IO_FILE*, unsigned long long*, unsigned long long*) /data/symccgo/bug/cmix/cmix/src/runner.cpp:203:3
#12 0x6a69d4 in main /data/symccgo/bug/cmix/cmix/src/runner.cpp:298:10
#13 0x7f8fc5028082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
0x619000056c51 is located 721 bytes inside of 960-byte region [0x619000056980,0x619000056d40)
allocated by thread T0 here:
#0 0x4c5877 in calloc /llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
#1 0x62bb1e in paq8::Array<paq8::Word, 0>::create(unsigned int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:118:16
#2 0x5f1547 in paq8::Array<paq8::Word, 0>::Array(unsigned int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:76:28
#3 0x6177aa in paq8::Cache<paq8::Word, 8u>::Cache() /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:3013:22
#4 0x6000c5 in paq8::TextModel::TextModel(unsigned int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:3138:3
#5 0x5e7b70 in paq8::contextModel2(paq8::ModelStats*) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8104:20
#6 0x5e9f26 in paq8::Predictor::update() /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8277:11
#7 0x5ed309 in PAQ8::Perceive(int) /data/symccgo/bug/cmix/cmix/src/models/paq8.cpp:8383:15
#8 0x65ea86 in Predictor::Perceive(int) /data/symccgo/bug/cmix/cmix/src/predictor.cpp:416:12
#9 0x4ffd8e in Encoder::Encode(int) /data/symccgo/bug/cmix/cmix/src/coder/encoder.cpp:23:7
#10 0x6a3776 in Compress(unsigned long long, std::basic_ifstream<char, std::char_traits<char> >*, std::basic_ofstream<char, std::char_traits<char> >*, unsigned long long*, Predictor*) /data/symccgo/bug/cmix/cmix/src/runner.cpp:106:9
#11 0x6a4c30 in RunCompression(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _IO_FILE*, unsigned long long*, unsigned long long*) /data/symccgo/bug/cmix/cmix/src/runner.cpp:203:3
#12 0x6a69d4 in main /data/symccgo/bug/cmix/cmix/src/runner.cpp:298:10
#13 0x7f8fc5028082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
SUMMARY: AddressSanitizer: memcpy-param-overlap /llvm-project/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3 in __asan_memcpy
==1867601==ABORTING
To reproduce: download and unzip the attached zip archive to obtain the POCs.
POC
attached zip archive