cyrusimap / cyrus-imapd

Cyrus IMAP is an email, contacts and calendar server
http://cyrusimap.org

cyr_virusscan segmentation fault with clamav-0.104.2 and cyrus-imap-3.4.2 #3873

Open akschu opened 2 years ago

akschu commented 2 years ago
(gdb) file /usr/cyrus/sbin/cyr_virusscan
Reading symbols from /usr/cyrus/sbin/cyr_virusscan...done.
(gdb) run
Starting program: /usr/cyrus/sbin/cyr_virusscan
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using ClamAV virus scanner

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff51004e7 in mpool_malloc () from /usr/lib64/libcyrus_min.so.0
(gdb) bt
#0  0x00007ffff51004e7 in mpool_malloc () from /usr/lib64/libcyrus_min.so.0
#1  0x00007ffff425ae9b in mpool_calloc (mp=<optimized out>, nmemb=nmemb@entry=15, size=size@entry=8) at /tmp/clamav-0.104.2/libclamav/mpool.c:725
#2  0x00007ffff425d06b in cl_engine_new () at /tmp/clamav-0.104.2/libclamav/others.c:486
#3  0x000000000040350a in clamav_init ()
#4  0x0000000000403282 in main ()

Let me know if there are more details that are needed.

elliefm commented 2 years ago

[I've edited your original post to format the section pasted from gdb as code, because otherwise the <optimised out> bits were being swallowed.]

It looks like you've got debugging symbols for libclamav, but not cyrus. I don't suppose you're able to get debugging symbols for cyrus too?

If I'm reading this correctly, frames 1-2 are for code in libclamav, which we don't control, and frames 3-4 are cyr_virusscan itself. It's interesting to note that mpool_calloc() (with a 'c') seems to be a libclamav thing. I would very much like to see what that mp=<optimised out> value was. nmemb=15 and size=8 seem perfectly reasonable.

In frame 0, where the segfault happens inside our mpool_malloc() (with an 'm'), we can't see the arguments for lack of debugging symbols. It takes two arguments: struct mpool *pool and size_t size.

If pool is NULL, mpool_malloc will exit with fatal; it won't segfault. So that's not what happened. If size is 0, it will treat it as 1, so that's not what happened either. size_t is unsigned, so it can't be negative.
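A minimal sketch of the two guards described above, assuming a simplified pool type. The names guard_pool and guard_effective_size are hypothetical, and this is not the code in lib/mpool.c; it only illustrates why neither a NULL pool nor a zero size should end in a segfault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pool type; not the real Cyrus struct. */
struct guard_pool {
    void *blob;
};

/* Returns the size an allocator with these guards would actually use.
 * A missing pool (or missing blob) ends the process cleanly instead of
 * letting a NULL dereference happen later. */
size_t guard_effective_size(const struct guard_pool *pool, size_t size)
{
    if (pool == NULL || pool->blob == NULL) {
        fprintf(stderr, "fatal: allocation requested without a valid pool\n");
        exit(EXIT_FAILURE);
    }

    /* A zero-byte request is legal, so it is bumped to one byte. */
    return size == 0 ? 1 : size;
}
```

With a valid pool, a request for 0 bytes comes back as 1 and a request for 120 bytes is left alone. Note that these guards only catch NULL; a non-NULL but invalid pointer passes straight through them.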

This is where cyrus debugging symbols would be very helpful, since I can't see what line of mpool_malloc() it crashed on.

It's worth observing that the cyrus lib/mpool.c file hasn't changed substantially in years, so I guess whatever changed, changed in libclamav? Whether that means they have introduced a bug, or their change has exposed a bug in cyrus, I do not know.

I have libclamav 0.103.3. Here's what I see in gdb for a similar invocation:

(gdb) file /dev/shm/cyrus/main/sbin/cyr_virusscan 
Reading symbols from /dev/shm/cyrus/main/sbin/cyr_virusscan...done.
(gdb) set args -C /dev/shm/cass/04175303113/conf/imapd.conf 
(gdb) run
Starting program: /dev/shm/cyrus/main/sbin/cyr_virusscan -C /dev/shm/cass/04175303113/conf/imapd.conf 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Using ClamAV virus scanner
Loaded 8604587 virus signatures (10.373 seconds).
fatal error: can't read mailboxes file
[Inferior 1 (process 4891) exited with code 0113]

I don't have a real cyrus deployment on this system, so it falls apart in a different way, but note that "Loaded 8604587 virus signatures...." line, which I think means clamav_init() succeeded, instead of crashing out like yours did.

akschu commented 2 years ago

Hello Ellie,

My binary seems to say that it does have debug_info:

/usr/cyrus/sbin/cyr_virusscan: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, with debug_info, not stripped

I always build from source, so I'm happy to build it again; I'm just not sure what options I need. Right now I'm building with:

./configure --prefix=/usr --sysconfdir=/etc/mail --libdir=/usr/lib${LIBDIRSUFFIX} --with-syslogfacility=MAIL --enable-idled --enable-autocreate --disable-gssapi --without-krb --with-ldap --bindir=/usr/cyrus/bin --sbindir=/usr/cyrus/sbin --libexecdir=/usr/cyrus/libexec --with-clamav --enable-xapian

Also, I'm sure this is some issue between cyrus and clamav 0.104 as it worked for me with 0.103 as well, but since they have moved onto the new stuff for virus definitions, it would be great to get it going again.

elliefm commented 2 years ago

It's interesting that your cyr_virusscan has debugging info. I guess it's libcyrus_min.so that is missing it?

Do you set CFLAGS in the environment anywhere before running configure? I usually set it by providing it as an argument to configure, that way it gets logged in config.log. You can also just set it in the environment in any of the usual ways, but it won't be logged in config.log in that case.

I assume your CFLAGS already includes "-g", and that's why cyr_virusscan has debugging symbols... but in that case it's weird that libcyrus_min.so apparently doesn't have them. Maybe "debugging symbols for executables, but not libraries" is some system default that you haven't overridden?

If your CFLAGS doesn't include "-g", try adding it when you rebuild. I might do something like ./configure CFLAGS="$CFLAGS -g" [other configure arguments as usual]

elliefm commented 2 years ago

I don't see a Debian package for 0.104 yet, so I may have to build it from source if I want to try to reproduce this locally.

akschu commented 2 years ago

My build script looks like this:

if [ "$ARCH" = "x86_64" ]; then
  SLKCFLAGS="-O2 -fPIC"
  LIBDIRSUFFIX="64"
fi

CFLAGS="$SLKCFLAGS" \
CXXFLAGS="$SLKCFLAGS" \
LDFLAGS="-L/usr/lib${LIBDIRSUFFIX}" \
./configure --prefix=/usr --sysconfdir=/etc/mail --libdir=/usr/lib${LIBDIRSUFFIX} --with-syslogfacility=MAIL --enable-idled --enable-autocreate --disable-gssapi --without-krb --with-ldap \
  --bindir=/usr/cyrus/bin --sbindir=/usr/cyrus/sbin --libexecdir=/usr/cyrus/libexec --with-clamav --enable-xapian
make -j 4

So, I added -g to my CFLAGS and now I get this:

Reading symbols from /usr/cyrus/sbin/cyr_virusscan...done.
(gdb) run
Starting program: /usr/cyrus/sbin/cyr_virusscan
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using ClamAV virus scanner

Program received signal SIGSEGV, Segmentation fault.
mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 lib/mpool.c: No such file or directory.
(gdb) bt
#0  mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
#1  0x00007ffff44afe9b in mpool_calloc (mp=<optimized out>, nmemb=nmemb@entry=15, size=size@entry=8) at /tmp/clamav-0.104.2/libclamav/mpool.c:725
#2  0x00007ffff44b206b in cl_engine_new () at /tmp/clamav-0.104.2/libclamav/others.c:486
#3  0x00000000004034ea in clamav_init () at imap/cyr_virusscan.c:153
#4  0x0000000000403262 in main (argc=<optimized out>, argv=<optimized out>) at imap/cyr_virusscan.c:356

Not sure if that helps much as I'm terrible at C. Looks like there is a little more information.

Compiling clamav from source is pretty simple, so that might be a good way to go as this is one of those things that gets updated a lot.

Thanks for looking at this.

Matt

elliefm commented 2 years ago

That's helpful, thanks. Here's lib/mpool.c:146, where gdb thinks the segfault happens:

https://github.com/cyrusimap/cyrus-imapd/blob/81395653d1145a72190193473706866268f8bc8d/lib/mpool.c#L146

So, nothing particularly dramatic. p is dereferenced a few times, and there's some maths. p should be good -- it's the same as pool->blob because of the assignment a few lines up, and a few lines further up than that we would have fataled out if that were NULL. So it's a weird place to crash -- if this is really the source of the crash, then it smells like corruption of some kind. But I see you're building with optimisations, and that can lead to inaccurate line numbers in backtraces...
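To make the "p is dereferenced a few times, and there's some maths" remark concrete, here is a rough sketch of that kind of statement. The struct layout and the names sketch_blob, sketch_pool and sketch_remaining are assumptions for illustration, not copied from lib/mpool.c.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative layout only; the field names are assumptions. */
struct sketch_blob {
    size_t size;          /* capacity of the blob in bytes */
    unsigned char *base;  /* start of the blob's storage */
    unsigned char *ptr;   /* next free byte inside the storage */
};

struct sketch_pool {
    struct sketch_blob *blob;
};

/* Bytes still free in the pool's current blob. Every read goes through p,
 * so if p holds a bogus address the very first dereference faults. */
size_t sketch_remaining(const struct sketch_pool *pool)
{
    const struct sketch_blob *p = pool->blob;

    return p->size - (size_t)(p->ptr - p->base);
}
```

For a 256-byte blob with 120 bytes already handed out, this returns 136. A NULL check on pool->blob does nothing for a value that is non-NULL but still not a valid address.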

Two things:

1) If you run the same gdb session again, and then after it crashes, do:

frame 0
p p
p *p

what's reported?

2) I see you're building with optimisations (-O2 in SLKCFLAGS). If you rebuild Cyrus without optimisations, by replacing -O2 with -O0 (dash oh zero), does anything change?

3) Okay, three things -- I wonder if anything changes if you build libclamav without optimisations too. Let's see just-cyrus-without-optimisations first though; the distinction might be important.

akschu commented 2 years ago

Thank you for your help.

Here is the same session with the additional commands:

Program received signal SIGSEGV, Segmentation fault.
mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 lib/mpool.c: No such file or directory.
(gdb) bt
#0  mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
#1  0x00007ffff44afe9b in mpool_calloc (mp=<optimized out>, nmemb=nmemb@entry=15, size=size@entry=8) at /tmp/clamav-0.104.2/libclamav/mpool.c:725
#2  0x00007ffff44b206b in cl_engine_new () at /tmp/clamav-0.104.2/libclamav/others.c:486
#3  0x00000000004034ea in clamav_init () at imap/cyr_virusscan.c:153
#4  0x0000000000403262 in main (argc=<optimized out>, argv=<optimized out>) at imap/cyr_virusscan.c:356
(gdb) frame 0
#0  mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 in lib/mpool.c
(gdb) p p
$1 = (struct mpool_blob *) 0x1000
(gdb) p *p
Cannot access memory at address 0x1000

Compiling cyrus with -O0 produces the same result:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 lib/mpool.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
#1  0x00007ffff445ae9b in mpool_calloc (mp=<optimized out>, nmemb=nmemb@entry=15, size=size@entry=8) at /tmp/clamav-0.104.2/libclamav/mpool.c:725
#2  0x00007ffff445d06b in cl_engine_new () at /tmp/clamav-0.104.2/libclamav/others.c:486
#3  0x0000000000402acf in clamav_init () at imap/cyr_virusscan.c:153
#4  0x0000000000403088 in main (argc=1, argv=0x7fffffffe3c8) at imap/cyr_virusscan.c:356
(gdb) frame 0
#0  0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 in lib/mpool.c
(gdb) p p
$1 = (struct mpool_blob *) 0x1000
(gdb) p *p
Cannot access memory at address 0x1000

Compiling clamav with -O0 also produces the same results:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 lib/mpool.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
#1  0x00007ffff445ae9b in mpool_calloc (mp=<optimized out>, nmemb=nmemb@entry=15, size=size@entry=8) at /tmp/clamav-0.104.2/libclamav/mpool.c:725
#2  0x00007ffff445d06b in cl_engine_new () at /tmp/clamav-0.104.2/libclamav/others.c:486
#3  0x0000000000402acf in clamav_init () at imap/cyr_virusscan.c:153
#4  0x0000000000403088 in main (argc=1, argv=0x7fffffffe3c8) at imap/cyr_virusscan.c:356
(gdb) frame 0
#0  0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f7b000, size=120) at lib/mpool.c:146
146 in lib/mpool.c
(gdb) p p
$1 = (struct mpool_blob *) 0x1000
(gdb) p *p
Cannot access memory at address 0x1000

Thanks again for working on this!

akschu commented 2 years ago

I spent a little more time on this and built clamav and cyrus-imap on RC3 of slackware 15 (I was on slackware 14.2), which basically means switching compilers to gcc-11.2.0/gdb-11.2, and it didn't seem to make any difference:

Starting program: /usr/cyrus/sbin/cyr_virusscan
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Using ClamAV virus scanner

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5c5b350 in mpool_malloc (pool=0x7ffff4858000, size=120) at lib/mpool.c:146
146 lib/mpool.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5c5b350 in mpool_malloc (pool=0x7ffff4858000, size=120) at lib/mpool.c:146
#1  0x00007ffff560b67e in mpool_calloc (mp=0x7ffff4858000, nmemb=15, size=8) at /tmp/clamav-0.104.2/libclamav/mpool.c:725
#2  0x00007ffff560c6be in cl_engine_new () at /tmp/clamav-0.104.2/libclamav/others.c:486
#3  0x00000000004037e1 in clamav_init () at imap/cyr_virusscan.c:153
#4  0x0000000000403dda in main (argc=1, argv=0x7fffffffe5e8) at imap/cyr_virusscan.c:356
(gdb) frame 0
#0  0x00007ffff5c5b350 in mpool_malloc (pool=0x7ffff4858000, size=120) at lib/mpool.c:146
146 in lib/mpool.c
(gdb) p p
$1 = (struct mpool_blob *) 0x1000
(gdb) p *p
Cannot access memory at address 0x1000
(gdb) quit
A debugging session is active.

elliefm commented 2 years ago

Thanks for that. 0x1000 is a really strange value for a pointer to have. Given that, "Cannot access memory at address 0x1000" is about what I'd expect, and explains the segfault. Why is it 0x1000 though?
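Nothing in this thread establishes the cause. One reading that fits the value, offered purely as an assumption, is that the memory at pool was laid out by code with a different idea of what a pool looks like, so that an ordinary integer such as a 4096-byte page size sits where mpool_malloc() expects a blob pointer. The sketch below shows only that mechanism; both struct layouts and all names are hypothetical and are not taken from Cyrus or ClamAV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The sketch only makes sense where integers and pointers share a width. */
_Static_assert(sizeof(size_t) == sizeof(void *),
               "sketch assumes size_t and pointers have the same width");

/* Hypothetical layout A: a pool whose first field is a page size. */
struct pagesize_pool {
    size_t psize;
};

/* Hypothetical layout B: a pool whose first field is a blob pointer. */
struct blob_pool {
    void *blob;
};

/* Build a pool using layout A, then read it back through layout B and
 * report what the blob field appears to contain. */
uintptr_t blob_seen_through_wrong_layout(size_t page_size)
{
    struct pagesize_pool a = { page_size };
    struct blob_pool b;

    /* memcpy sidesteps the aliasing problems of casting between structs. */
    memcpy(&b, &a, sizeof b);

    return (uintptr_t)b.blob;
}
```

On a typical 64-bit Linux system, passing 4096 yields 0x1000, which is non-NULL and so would get past a NULL check before faulting on the first dereference. Printing pool and *pool in gdb is one way to see whether the memory looks like the struct that mpool_malloc() expects.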

Do you mind doing the same, but instead of p p and p *p, do p pool and p *pool?

I had a look at how to build my own libclamav a few days ago to see if I could try to reproduce this locally. It looks like I can't do "just" libclamav; I'll need to build and install the whole of ClamAV, which also means I'll have to back up and remove what I already have so they don't get in each other's way. I don't know enough about ClamAV to confidently just do that -- I have and use the Debian package so that I don't have to know about it. I'll keep looking into it, because it would help a lot if I could repro this locally, but frustratingly it's not quite as simple as "just build the new version of the library from source and link Cyrus against it".

akschu commented 2 years ago

Here is the debug output:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f75000, size=120) at lib/mpool.c:146
146 lib/mpool.c: No such file or directory.
(gdb) bt
#0  0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f75000, size=120) at lib/mpool.c:146
#1  0x00007ffff4364d87 in mpool_calloc () from /usr/lib64/libclamav.so.9
#2  0x00007ffff4365d67 in cl_engine_new () from /usr/lib64/libclamav.so.9
#3  0x0000000000402acf in clamav_init () at imap/cyr_virusscan.c:153
#4  0x0000000000403088 in main (argc=1, argv=0x7fffffffe218) at imap/cyr_virusscan.c:356
(gdb) frame 0
#0  0x00007ffff5301c8d in mpool_malloc (pool=0x7ffff7f75000, size=120) at lib/mpool.c:146
146 in lib/mpool.c
(gdb) p *pool
$1 = {blob = 0x1000}
(gdb) p pool
$2 = (struct mpool *) 0x7ffff7f75000

Here is my build script for clamav. I use slackware, so everything is built with a build script called a SlackBuild and a slackware package is nothing more than a tarball extracted from root.

#!/bin/sh

PRGNAM=clamav
VERSION=0.104.2
TARBALL=$PRGNAM-$VERSION.tar.gz
ARCH=${ARCH:-x86_64}
BUILD=${BUILD:-2}
CWD=$(pwd)
TMP=${TMP:-/tmp}
PKG=$TMP/package-$PRGNAM
SRC=$TMP/$PRGNAM-$VERSION

# make sure everything is fresh
rm -rf $PKG $SRC

# extract
cd $TMP
tar xzvf $CWD/$TARBALL
cd $SRC

LIBDIRSUFFIX=64

# compile and install
mkdir build; cd build
LDFLAGS="-L/usr/lib${LIBDIRSUFFIX}" \
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib${LIBDIRSUFFIX} -DAPP_CONFIG_DIRECTORY=/etc/mail -DENABLE_JSON_SHARED=OFF ..
make all install DESTDIR=$PKG

# docs
mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION $PKG/etc/rc.d/
cp -a $SRC/[A-Z][A-Z][A-Z]* $PKG/usr/doc/$PRGNAM-$VERSION

# make the package
cd $PKG
mkdir install
cp $CWD/slack-desc install

# copy configs
mv etc/mail/clamd.conf etc/mail/clamd.conf.default
mv etc/mail/freshclam.conf etc/mail/freshclam.conf.default
cp $CWD/rc.clamd $PKG/etc/rc.d/

# delete defs
rm -rf usr/share/clamav/*

if [ "$(id -u)" != 0 ]; then
  echo "Skipping makepkg since you are not root"
else
  /sbin/makepkg -l y -c n -p $CWD/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.tgz
fi

# clean up
rm -rf $PKG $SRC

If it helps, I could deploy an instance in the cloud somewhere and provide you access. It would already have all of the compilers and build scripts to make this pretty simple for you to work on.

elliefm commented 2 years ago

> If it helps, I could deploy an instance in the cloud somewhere and provide you access. It would already have all of the compilers and build scripts to make this pretty simple for you to work on.

That would be really helpful, thanks! I won't quite have time to properly focus on this until I've gotten the first 3.6 beta out, so don't set it up just yet, but let's talk once the beta is out.

akschu commented 2 years ago

Sure, just email me an ssh public key to schu@<same as user part>.net and I'll get it set up.