Open gbulfon opened 2 years ago
Could you please provide a backtrace of the core dump?
Here it is:
ffffdf7fffdfea00 libc.so.1`_lwp_kill+0xa()
ffffdf7fffdfea30 libc.so.1`raise+0x20(6)
ffffdf7fffdfea80 libc.so.1`abort+0x98()
ffffdf7fffdfead0 0xffffdf7fc4a5f7a3()
ffffdf7fffdfeb00 libstdc++.so.6.0.13`_ZN10__cxxabiv111__terminateEPFvvE+0x15()
ffffdf7fffdfeb10 libstdc++.so.6.0.13`_ZN10__cxxabiv112__unexpectedEPFvvE()
ffffdf7fffdfeb40 0xffffdf7fc426e9a1()
ffffdf7fffdfed70 libxapian-1.5.so.0.0.0`_ZN12GlassVersion4readEv+0x25c()
ffffdf7fffdfedb0 libxapian-1.5.so.0.0.0`_ZN13GlassDatabase11open_tablesEi+0x56()
ffffdf7fffdfee30 libxapian-1.5.so.0.0.0`_ZN13GlassDatabaseC1ERKSsij+0x450()
ffffdf7fffdfeea0 libxapian-1.5.so.0.0.0`_ZN21GlassWritableDatabaseC1ERKSsii+0x2f()
ffffdf7fffdff0a0 libxapian-1.5.so.0.0.0`_ZN6Xapian16WritableDatabaseC1ERKSsii+0x69c()
ffffdf7fffdff1f0 libcyrus_imap.so.0.0.0`xapian_dbw_open+0xd3()
ffffdf7fffdff230 libcyrus_imap.so.0.0.0`end_message_update+0x7c()
ffffdf7fffdff350 libcyrus_imap.so.0.0.0`index_getsearchtext+0x801()
ffffdf7fffdff3b0 libcyrus_imap.so.0.0.0`flush_batch+0xf8()
ffffdf7fffdff430 libcyrus_imap.so.0.0.0`search_update_mailbox+0x1d3()
ffffdf7fffdff8a0 index_one+0x463()
ffffdf7fffdff8d0 do_indexer+0x7e()
ffffdf7fffdff990 main+0x776()
ffffdf7fffdff9a0 _start+0x6c()
Using truss on the process (the Solaris equivalent of strace), I can see it tries to open the iamglass file but fails because the file does not exist.
Is this enough or do you need some more info? Maybe I can build a version of Xapian libs with debug info? Actually I think the only issue here is why that iamglass file is not created.
Sorry, I looked at the core dump but then got distracted by other stuff. I don't think it's necessary to get debug info. Your xapian directory is also missing the position.glass, postlist.glass and termlist.glass files, which are where the Xapian index lives. You are running the right version of Xapian (the glass backend became the default in Xapian 1.4). Did you start your index completely from scratch, i.e. did the search partition exist before? Or did the directory already exist?
Thanks! I started from scratch, moving from "squatter" indexing, created the directory from scratch as stated in configuration:
t1searchpartition-default: /sonicle/var/cyrus/search
Do you mean that even this root directory should not be created the first time and will be created by Xapian?
I am a bit at a loss here. The "terminate called after throwing an instance of 'Xapian::DatabaseNotFoundError'" error message comes from the C++ standard library's handler for uncaught exceptions. But the call trace shows that xapian_dbw_open in the Cyrus source is executed, and this does catch any Xapian::DatabaseOpeningError, from which Xapian::DatabaseNotFoundError inherits.
What setup are you running this on, e.g. what is your operating system, which compiler and standard libraries?
It's XStreamOS/illumos , our own distro. The xapian component is built on gcc 4.9. Here's an ldd of dependencies:
sonicle@www:/$ ldd /sonicle/lib/libxapian-1.5.so
libz.so.1 => /usr/lib/libz.so.1
libuuid.so.1 => /lib/libuuid.so.1
libnsl.so.1 => /lib/libnsl.so.1
libsocket.so.1 => /lib/libsocket.so.1
libicuuc.so.58 => /usr/lib/libicuuc.so.58
libstdc++.so.6 => /usr/lib/libstdc++.so.6
libm.so.2 => /lib/libm.so.2
librt.so.1 => /lib/librt.so.1
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
libc.so.1 => /lib/libc.so.1
libdlpi.so.1 => /lib/libdlpi.so.1
libmp.so.2 => /lib/libmp.so.2
libmd.so.1 => /lib/libmd.so.1
libicudata.so.58 => /usr/lib/libicudata.so.58
libpthread.so.1 => /lib/libpthread.so.1
libinetutil.so.1 => /lib/libinetutil.so.1
libdladm.so.1 => /lib/libdladm.so.1
libdevinfo.so.1 => /lib/libdevinfo.so.1
libscf.so.1 => /lib/libscf.so.1
librcm.so.1 => /lib/librcm.so.1
libnvpair.so.1 => /lib/libnvpair.so.1
libexacct.so.1 => /usr/lib/libexacct.so.1
libkstat.so.1 => /lib/libkstat.so.1
libcurses.so.1 => /lib/libcurses.so.1
libpool.so.1 => /usr/lib/libpool.so.1
libsec.so.1 => /lib/libsec.so.1
libgen.so.1 => /lib/libgen.so.1
libuutil.so.1 => /lib/libuutil.so.1
libsmbios.so.1 => /usr/lib/libsmbios.so.1
libxml2.so.2 => /usr/lib/libxml2.so.2
libavl.so.1 => /lib/libavl.so.1
libidmap.so.1 => /usr/lib/libidmap.so.1
liblzma.so.5 => /usr/lib/liblzma.so.5
Would you have a chance to test this on a setup with a more recent gcc? Our reference compiler is the same as Debian stable, which currently is 8.3.0
This is hard, as the machine cannot be upgraded to the latest XStreamOS (which features gcc up to 9). I need a version that can be delivered to non-updated systems, with gcc up to 4.9. I may build and install the gcc 8 package for these systems, but I would end up with cyrus built on 4.9 and xapian on 8, and who knows... it would be hard work to rebuild all of cyrus and xapian on 8.
Is there anything I can do to confirm that this is the cause of the problem, before deciding to go with gcc 8 on older machines?
If you built Xapian from source, then you could make sure that Xapian does what it is supposed to by running the Xapian unit tests: make check in the xapian-core source directory will do that.
Also, you could take Cyrus out of the loop by writing a small C++ executable that replicates essentially what Cyrus attempts to do in that codepath, something like:
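#include <xapian.h>
#include <cassert>
#include <iostream>

int main()
{
    const char *thispath = "your-xapian-path-here";
    try {
        int flags = Xapian::DB_BACKEND_GLASS|Xapian::DB_RETRY_LOCK;
        Xapian::WritableDatabase* db = 0;
        try {
            // First try to open an existing database.
            db = new Xapian::WritableDatabase{thispath, flags|Xapian::DB_OPEN};
        } catch (const Xapian::DatabaseOpeningError &e) {
            // Opening failed, so create the database instead.
            db = new Xapian::WritableDatabase{thispath, flags|Xapian::DB_CREATE};
        }
        assert(db);
        delete db;
    }
    catch (const Xapian::DatabaseLockError &err) {
        std::cerr << "lock error: " << err.get_description() << std::endl;
    }
    catch (const Xapian::Error &err) {
        std::cerr << "error: " << err.get_description() << std::endl;
    }
    return 0;
}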
Thanks, I'll take my time to do these tests!
sonicle@www:~/xapian$ g++ -I /sonicle/include/xapian-1.5 -std=c++11 -L/sonicle/lib -lxapian-1.5 -o test test.cc
sonicle@www:~/xapian$ ./test
sonicle@www:~/xapian$ ls /sonicle/var/cyrus/test/
flintlock iamglass postlist.glass termlist.glass
From the stack I sent, I cannot tell whether the error is thrown during DB_OPEN (so the catch is not taking effect) or during DB_CREATE, which is the only one that may throw exceptions AFAIK, but why?
I wrote a test that attempts to replicate the situation you are describing: it first sets up the search index path, then removes the Xapian glass files from there, then attempts to index. I ran this on my build (Debian Buster, gcc 8.3.0) and I can confirm that the test passes and that the implementation correctly catches the Xapian::DatabaseOpeningError and then creates new Xapian glass index files.
sub test_ghissue_3849
:min_version_3_4
{
my ($self) = @_;
my $jmap = $self->{jmap};
my $xapdirs = ($self->{instance}->run_mbpath(-u => 'cassandane'))->{xapian};
$self->make_message('test1');
$self->assert(not -e $xapdirs->{t1} . '/xapian/iamglass');
$self->{instance}->run_command({cyrus => 1}, 'squatter');
$self->assert(-e $xapdirs->{t1} . '/xapian/iamglass');
unlink($xapdirs->{t1} . '/xapian/iamglass');
unlink($xapdirs->{t1} . '/xapian/position.glass');
unlink($xapdirs->{t1} . '/xapian/postlist.glass');
unlink($xapdirs->{t1} . '/xapian/termlist.glass');
$self->make_message('test2');
$self->assert(not -e $xapdirs->{t1} . '/xapian/iamglass');
$self->{instance}->run_command({cyrus => 1}, 'squatter', '-i');
$self->assert(-e $xapdirs->{t1} . '/xapian/iamglass');
}
The question remains why the exception handler does not seem to catch this situation on your build and instead passes the thrown exception to the default handler.
Or maybe the problem arises during the DB_CREATE call, and for some reason the database cannot be created. Can you help me correctly modify the xapian_dbw_open code to catch exceptions from the second WritableDatabase (DB_CREATE) and output any information about the exception?
I've seen that the glass_database.cc code uses LOGCALL_CTOR to log debug info in the GlassWritableDatabase constructor. How do I enable this Xapian debug logging, and where will it write its output?
Found the Xapian docs on enabling and setting up logging. I'm building with --enable-log to see what happens.
After enabling the debug log, the build fails... :(
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I./common -I./include -I/sonicle/include -Wall -W -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat-security -fno-gnu-keywords -Wundef -Woverloaded-virtual -Wstrict-null-sentinel -Wshadow -Wstrict-overflow=1 -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Werror -fvisibility=hidden -fvisibility-inlines-hidden -mfpmath=sse -msse2 -mtune=generic -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/sonicle/include -std=c++11 -MT cluster/cluster.lo -MD -MP -MF cluster/.deps/cluster.Tpo -c cluster/cluster.cc -fPIC -DPIC -o cluster/.libs/cluster.o
In file included from ./common/debuglog.h:36:0,
from cluster/cluster.cc:30:
./common/pretty.h: In instantiation of 'PrettyOStream<S>& operator<<(PrettyOStream<S>&, const T&) [with S = std::basic_ostringstream<char>; T = Xapian::FreqSource]':
cluster/cluster.cc:295:5: required from here
./common/pretty.h:58:11: error: cannot bind 'std::basic_ostream<char>' lvalue to 'std::basic_ostream<char>&&'
ps.os << t;
^
In file included from /usr/gcc/4.9/include/c++/4.9.4/iterator:64:0,
from ./include/xapian/mset.h:29,
from ./include/xapian/cluster.h:32,
from cluster/cluster.cc:25:
/usr/gcc/4.9/include/c++/4.9.4/ostream:602:5: note: initializing argument 1 of 'std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&&, const _Tp&) [with _CharT = char; _Traits = std::char_traits<char>; _Tp = Xapian::FreqSource]'
operator<<(basic_ostream<_CharT, _Traits>&& __os, const _Tp& __x)
^
Maybe some build flags?
This is what could help in debugging xapian_dbw_open:
diff --git a/imap/xapian_wrap.cpp b/imap/xapian_wrap.cpp
index aa918897a3..45bb92b36d 100644
--- a/imap/xapian_wrap.cpp
+++ b/imap/xapian_wrap.cpp
@@ -758,6 +758,7 @@ EXPORTED int xapian_dbw_open(const char **paths, xapian_dbw_t **dbwp,
} catch (Xapian::DatabaseOpeningError &e) {
/* It's OK not to atomically create or open, since we can assume
* the xapianactive file items to be locked. */
+ syslog(LOG_ERR, "%s:%d: exception=<%s> path=<%s>", __func__, __LINE__, e.get_description().c_str(), thispath);
dbw->database = new Xapian::WritableDatabase{thispath, flags|Xapian::DB_CREATE};
}
if (db_versions.find(XAPIAN_DB_CURRENT_VERSION) == db_versions.end()) {
@@ -770,6 +771,7 @@ EXPORTED int xapian_dbw_open(const char **paths, xapian_dbw_t **dbwp,
}
catch (const Xapian::DatabaseLockError &err) {
+ syslog(LOG_ERR, "%s:%d: exception=<%s> path=<%s>", __func__, __LINE__, err.get_description().c_str(), thispath);
/* somebody else is already indexing this user. They may be doing a different
* mailbox, so we need to re-insert this mailbox into the queue! */
r = IMAP_MAILBOX_LOCKED;
For the xapian build error it's better to ask on the Xapian mailing list (which could be informative on what minimum gcc version the Xapian build expects)
Tried: I get no log anywhere, just squatter log in imapd.log about indexing, stopping at admin (the first one with content):
Jan 5 14:27:34 www squatter[17783]: [ID 621814 local6.notice] indexing mailboxes
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Drafts...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Out...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Sent...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Spam...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Trash...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Archive@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Drafts@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Sent@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Spam@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox Trash@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox user/TEST1@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 244414 local6.info] Building directory /sonicle/var/cyrus/search/domain/s/sonicle.com/t/user/TEST1/xapian
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox user/TEST2@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 244414 local6.info] Building directory /sonicle/var/cyrus/search/domain/s/sonicle.com/t/user/TEST2/xapian
Jan 5 14:27:34 www squatter[17783]: [ID 793630 local6.info] indexing mailbox user/admin@sonicle.com...
Jan 5 14:27:34 www squatter[17783]: [ID 244414 local6.info] Building directory /sonicle/var/cyrus/search/domain/s/sonicle.com/a/user/admin/xapian
Jan 5 14:27:34 www squatter[17783]: [ID 275131 local6.notice] skiplist: recovered /sonicle/var/imap/domain/s/sonicle.com/user/a/admin.conversations (6 records, 520 bytes) in 0 seconds
Jan 5 14:27:34 www squatter[17783]: [ID 229495 local6.info] skiplist: checkpointed /sonicle/var/imap/domain/s/sonicle.com/user/a/admin.conversations (6 records, 520 bytes) in 0.016 sec
Jan 5 14:29:16 www squatter[17942]: [ID 621814 local6.notice] indexing mailboxes
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Drafts...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Out...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Sent...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Spam...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Trash...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Archive@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Drafts@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Sent@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Spam@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox Trash@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox user/TEST1@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox user/TEST2@sonicle.com...
Jan 5 14:29:16 www squatter[17942]: [ID 793630 local6.info] indexing mailbox user/admin@sonicle.com...
BTW, Xapian docs say gcc 4.7 is minimum requirement:
https://fossies.org/linux/xapian-core/INSTALL
I'm stuck...
Another possibility is that the first exception (DB_OPEN) is not of the type that is caught. Is there any way in C++ to catch a generic exception (as you do in Java) and get some sort of output about the type thrown?
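One way to do this with GCC is a catch (...) block that asks the C++ ABI for the type of the exception in flight. The following is a minimal sketch, assuming GCC or Clang with the Itanium C++ ABI header cxxabi.h; the helper names current_exception_type_name and thrown_type_name are made up for illustration and are not part of Cyrus or Xapian.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Must be called from inside a catch block.
// Relies on the GCC/Clang ABI extension __cxa_current_exception_type.
std::string current_exception_type_name() {
    std::type_info *type = abi::__cxa_current_exception_type();
    if (!type) return "(no active exception)";
    int status = 0;
    char *demangled = abi::__cxa_demangle(type->name(), nullptr, nullptr, &status);
    // Fall back to the mangled name if demangling fails.
    std::string name = (status == 0 && demangled) ? demangled : type->name();
    std::free(demangled);
    return name;
}

// Runs fn and returns the type name of whatever it throws ("" if nothing).
template <typename Fn>
std::string thrown_type_name(Fn fn) {
    try {
        fn();
    } catch (...) {
        return current_exception_type_name();
    }
    return "";
}
```

In xapian_dbw_open, the equivalent would be a trailing catch (...) that logs the returned name via syslog before rethrowing or handling the error.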
Meanwhile I'm building gcc 8 for those systems to see whether building both cyrus and xapian with it solves the problem.
The squatter run in the very first comment of this issue shows that it is a Xapian::DatabaseNotFoundError.
Oh yes... sure... let's see what happens with gcc 8.
I could only update gcc to 5.5.0 (I have issues building 6 or later). Still the same problem... I have no more ideas now...
I tried rebuilding cyrus with "-g" to see whether the exception was actually thrown during DB_OPEN and not inside the catch block during DB_CREATE. It is thrown during DB_OPEN, so for some reason the exception is not caught.
So I tried changing the catch into a generic "catch(...)". Don't know if this may shed some light, but this is the new result:
terminate called after throwing an instance of 'std::out_of_range'
Interesting! Can you pinpoint the exact line in the Xapian backtrace where that happens?
I have only this, no "-g" :
Loading modules: [ libc.so.1 ld.so.1 ]
> $C
ffffdf7fffdfe910 libc.so.1`_lwp_kill+0xa()
ffffdf7fffdfe940 libc.so.1`raise+0x20(6)
ffffdf7fffdfe990 libc.so.1`abort+0x98()
ffffdf7fffdfe9e0 0xffffdf7fc4a5f7a3()
ffffdf7fffdfea10 libstdc++.so.6.0.13`_ZN10__cxxabiv111__terminateEPFvvE+0x15()
ffffdf7fffdfea20 libstdc++.so.6.0.13`_ZN10__cxxabiv112__unexpectedEPFvvE()
ffffdf7fffdfea60 libstdc++.so.6.0.13`__cxa_rethrow()
ffffdf7fffdfeab0 libstdc++.so.6.0.20`_ZSt20__throw_out_of_rangePKc+0x62()
ffffdf7fffdfeb10 libcyrus_imap.so.0.0.0`_ZNSt3mapIKSsSt10unique_ptrIN6Xapian7StopperESt14default_deleteIS3_EESt4lessIS0_ESaISt4pairIS0_S6_EEE2atERS0_+0xb9()
ffffdf7fffdff030 libcyrus_imap.so.0.0.0`_ZL11get_stopperRKSs+0x2d()
ffffdf7fffdff070 libcyrus_imap.so.0.0.0`_ZL15xapian_dbw_initP10xapian_dbw+0x76()
ffffdf7fffdff1c0 libcyrus_imap.so.0.0.0`xapian_dbw_open+0x1e1()
ffffdf7fffdff240 libcyrus_imap.so.0.0.0`end_message_update+0x31f()
ffffdf7fffdff390 libcyrus_imap.so.0.0.0`index_getsearchtext+0x56b()
ffffdf7fffdff410 libcyrus_imap.so.0.0.0`search_update_mailbox+0x408()
ffffdf7fffdff890 index_one.constprop.1+0x32e()
ffffdf7fffdff980 main+0x1101()
ffffdf7fffdff990 _start+0x6c()
This just seems to have shifted the problem. I guess that the out_of_range exception is thrown here: https://github.com/cyrusimap/cyrus-imapd/blob/master/imap/xapian_wrap.cpp#L274
Which means there is again an exception that for whatever reason does not seem to get caught on your build. I have never seen this situation before, but my C++ is rather rudimentary.
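For context, the backtrace shows the exception coming from std::map::at, which throws std::out_of_range when the key is missing. A minimal sketch of a lookup that does not depend on exception handling follows; Stopper, StopperMap and find_stopper are hypothetical stand-ins and this is not how the Cyrus code is written.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for Xapian::Stopper.
struct Stopper { std::string lang; };

using StopperMap = std::map<std::string, std::unique_ptr<Stopper>>;

// Returns the stopper registered for iso_lang, or nullptr when there is none.
// Unlike map::at(), find() never throws std::out_of_range.
const Stopper *find_stopper(const StopperMap &stoppers, const std::string &iso_lang) {
    auto it = stoppers.find(iso_lang);
    return it == stoppers.end() ? nullptr : it->second.get();
}
```

A lookup like this would only sidestep this one spot. If exceptions are not being caught in general, the underlying runtime problem still needs to be found.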
Reading here it looks like this may happen when you mix gcc/g++ of different versions:
https://stackoverflow.com/questions/45895622/g-cant-catch-an-exception/45921484
I'm pretty sure both cyrus and xapian are built with the same gcc/g++ , though...
I think the only way to go is to be able to build gcc 6/7/8 for those systems and build both cyrus and xapian with 8.
I found a second issue that I believe is still related to the old gcc version: if I build cyrus with any optimization option (O, O2 or O3), squatter (in squatter mode, no xapian) will create bad index files, crashing imap daemons on any search! If I stick to no optimization or -g it will work fine...
Interesting, I wonder if that's related to the new warnings gcc 8.3 reports when optimisations are enabled -- #3854
This is the stack of the core file produced by imapd when crashing because of optimizations:
ffffdf7fffdfa700 libcyrus.so.0.0.0`charset_convert+0xdb()
ffffdf7fffdfa750 libcyrus_imap.so.0.0.0`match+0xce()
ffffdf7fffdfa7d0 libcyrus_imap.so.0.0.0`subquery_run_indexed+0x219()
ffffdf7fffdfa810 libcyrus_min.so.0.0.0`hash_enumerate+0x51()
ffffdf7fffdfa860 libcyrus_imap.so.0.0.0`search_query_run+0x16b()
ffffdf7fffdfa8c0 libcyrus_imap.so.0.0.0`index_search+0x69()
ffffdf7fffdfceb0 cmdloop+0x3451()
ffffdf7fffdfcf00 service_main+0x206()
ffffdf7fffdff7a0 main+0x7b8()
ffffdf7fffdff7b0 _start+0x6c()
The imapd daemon dies whenever a search is issued and a cyrus.squat file is present in that folder.
I could build gcc65 and gcc83, so I tried building both xapian and cyrus-imapd with both, but still the same problem: the exception is not caught... :( BTW, with both gcc65 and gcc83, and with or without any optimization, I get a crash in imapd once squatter (normal version, not xapian) reindexes a folder. Same stack as above...
I had to temporarily go back to the version built with gcc 5 without optimization to have the server running with normal squatter.
I think I found the reason for the exception problem. I had to move from gcc 4.7 (most of that system's libs are built with it) to 4.9 for cyrus and xapian, because xapian's minimum requirement is 4.8. Now squatter is pulling in zlib among its dependencies, which is built with 4.7, and this is pulling in the 4.7 runtime. GDB showed me that the exception is thrown in the gcc 4.9 library, while further up the backtrace everything is handled by 4.7.
I'm working to see how to handle this.
Great that you could solve this!
The real solution was simpler than expected: I just used LD_LIBRARY_PATH to point to both the 32-bit and 64-bit directories of the required gcc libs when running these daemons, forcing the same runtime version for all dependencies.
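The second backtrace above already hinted at this, with frames from both libstdc++.so.6.0.13 and libstdc++.so.6.0.20 in one process. As a rough way to check for such a mix from inside a process, the sketch below lists every loaded object whose path contains libstdc++. It assumes dl_iterate_phdr from link.h is available on the platform (it is on Linux, and is assumed here for illumos); the function names are made up for illustration, and the check would have to run inside the affected process to say anything about it.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Callback: collect the path of every loaded object whose name contains "libstdc++".
static int collect_libstdcxx(struct dl_phdr_info *info, std::size_t, void *data) {
    auto *out = static_cast<std::vector<std::string> *>(data);
    if (info->dlpi_name && std::strstr(info->dlpi_name, "libstdc++"))
        out->push_back(info->dlpi_name);
    return 0;  // returning non-zero would stop the iteration
}

// Returns the paths of all libstdc++ copies loaded into the current process.
std::vector<std::string> loaded_libstdcxx() {
    std::vector<std::string> libs;
    dl_iterate_phdr(collect_libstdcxx, &libs);
    return libs;
}
```

More than one entry in the result would point to the kind of mixed runtime described above. After a fix such as the LD_LIBRARY_PATH setting, at most one entry should remain.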
Thanks for all your efforts! Gabriele
Hi, I recently upgraded one of our Cyrus installations to 3.4.1. We built Cyrus and Xapian ourselves on our XStreamOS/illumos distro, using your specific Xapian tarball. It works fine using base squatter, but we want to move to Xapian indexing, so we configured it according to the documentation:
Running "squatter -v" it segfaults at first mailbox:
What I found is that it's looking at a Glass database file that is not present:
/sonicle/var/cyrus/search/domain/s/sonicle.com/a/user/admin/xapian/iamglass
I can see other files created in that directory: cyrus.indexed.db and flintlock
What may be the reason for this?