cyrusimap / cyrus-imapd

Cyrus IMAP is an email, contacts and calendar server
http://cyrusimap.org
Other
550 stars 151 forks source link

Squatter 2.5.17 core dump - gdb backtrace #3918

Closed gbulfon closed 1 week ago

gbulfon commented 2 years ago

Hi, on installations where I still have 2.5.11 or 2.5.17 , many big folders cause squatter to core dump. I installed a "-g" version of squatter and got the backtrace:

(gdb) backtrace
#0  0x000000000042ee37 in add_word_to_trie (index=0x8c9310,
    word_ptr=0xffffdf7fa62d7518 <error: Cannot access memory at address 0xffffdf7fa62d7518>, doc_ID=122300) at imap/squat_build.c:790
#1  0x000000000043034e in dump_index_trie_words (index=0x8c9310, first_char=191, result_offset=0xffffdf7fffdfde8c)
    at imap/squat_build.c:1449
#2  0x00000000004307e4 in index_close_internal (index=0x8c9310, OK=1) at imap/squat_build.c:1589
#3  0x0000000000430c35 in squat_index_finish (index=0x8c9310) at imap/squat_build.c:1695
#4  0x000000000042d404 in squat_single (state=0x6080b0, incremental=0) at imap/squatter.c:484
#5  0x000000000042d9c1 in index_me (name=0xffffdf7fffdfeab0 "eurovetrocap.com!user.sharon^taverna", matchlen=0, maycreate=0,
    rock=0xffffdf7fffdffab4) at imap/squatter.c:636
#6  0x000000000042dd7e in main (argc=5, argv=0xffffdf7fffdffb28) at imap/squatter.c:746

Looks like that pointer is not valid. Any known solution on the 2.5.11-17 source tree? This doesn't happen on the same folders using squatter 3.4 (but it actually produces much smaller squat files).

Please help, I cannot upgrade some of these installations to 3.4 yet.

Gabriele

elliefm commented 2 years ago

You can ignore those "error: Cannot access memory at address 0xffff..." in the backtrace, those are just pointers into an mmap'd file, and the mmap is no longer available when you examine the core dump after the fact. I thought they might have been the result of a signed overflow or some kind of 32bit/64bit problem (which is why I'm here), but nope, just mmaps.

Here's the code that's crashing in frame 0 of that backtrace:

https://github.com/cyrusimap/cyrus-imapd/blob/15e3180e427937c9e51bcaac06862b831660ddea/imap/squat_build.c#L785-L794

Smells to me like index->word_doc_allocator++; on 788 has run over the end of whatever buffer index->word_doc_allocator points to, so a junk pointer gets assigned to word_entry, and then at 790 it crashes trying to dereference it.

I know nothing about these data structures. I can't tell whether index->word_doc_allocator has been accidentally underallocated, or whether the data being built is larger than the format can support, or whether it's supposed to grow the buffer when it fills but is failing to do so, etc.

Here's where index->word_doc_allocator is initialised. This is in dump_index_trie_words, which is frame 1 in the backtrace:

https://github.com/cyrusimap/cyrus-imapd/blob/15e3180e427937c9e51bcaac06862b831660ddea/imap/squat_build.c#L1414-L1418

Looks like it thinks it knows how many it needs, and tries to allocate exactly that number. Maybe it miscalculates? A disgusting hack that might work is adding 1 to this computation, like ... (num_words + existing + 1).

Here's where WordDocEntry is defined:

https://github.com/cyrusimap/cyrus-imapd/blob/15e3180e427937c9e51bcaac06862b831660ddea/imap/squat_build.c#L171-L176

Note that it very clearly describes itself as a circular linked list. So it's weird to me that the "allocator" expects to just increment and find a fresh one, without checking that it's not already in use or hasn't walked off the end. I guess the add_to_table must handle the linking, idk.

If you still have that core file, can you please run the following commands in gdb and paste the output?

frame 1
print num_words
print existing
gbulfon commented 2 years ago

@elliefm thanks, here it is!

(gdb) frame 1
#1  0x000000000043034e in dump_index_trie_words (index=0x8f4a90,
    first_char=191, result_offset=0xffffdf7fffdfde7c)
    at imap/squat_build.c:1449
1449                if (add_word_to_trie(index, word_ptr, doc_ID)
(gdb) print num_words
$1 = 268402217
(gdb) print existing
$2 = 0
(gdb)
elliefm commented 2 years ago

Thanks. num_words being 268 million is interesting -- that's much too small to suggest an underflow problem, and much too large to suggest an overflow problem. So I guess it must just be a normal number of words? If maybe on the large size due to "many big folders".

Please hang onto this core file if you can. I will need to ask you for more gdb output from it, but I don't know what to ask for yet.

If you still have the command line that was used to invoke squatter, can you post it please? I don't know it will be useful or not, but maybe some option you're using or not using will turn out to be significant, if not to the cause of the bug, then at least for helping me track the program flow through the code. I do not know squatter or the squat database at all, so every little detail may help. Please censor any user or mailbox names -- if they're included in the command line I might to see that they were included, but I won't need to see what they were.

elliefm commented 2 years ago

What does gdb say for this:

frame 0
print word_entry
print *word_entry
frame 1
print doc_words
print index->total_num_words[191]
gbulfon commented 2 years ago

Here it is:

(gdb) frame 0
#0  0x000000000042ee37 in add_word_to_trie (index=0x8f4a90, word_ptr=0xffffdf7fa0609f77 <error: Cannot access memory at address 0xffffdf7fa0609f77>, doc_ID=160357) at imap/squat_build.c:790
790     in imap/squat_build.c
(gdb) print word_entry
$1 = (WordDocEntry *) 0x122c09000
(gdb) print *word_entry
Cannot access memory at address 0x122c09000
(gdb) frame 1
#1  0x000000000043034e in dump_index_trie_words (index=0x8f4a90, first_char=191, result_offset=0xffffdf7fffdfde7c) at imap/squat_build.c:1449
1449    in imap/squat_build.c
(gdb) print doc_words
$2 = 32347
(gdb) print index->total_num_words[191]
$3 = 449467521
(gdb)
gbulfon commented 2 years ago

The command was:

squatter -v -i -s user/name/folder@domain.tld

elliefm commented 2 years ago

Thanks, that at least confirms that the problem is definitely the dereference of word_entry, and not some other weird thing I hadn't considered.

num_words is a decrementing iterator and it starts at index->total_num_words[first_char]. So in this case, num_words started at 449 million, and it didn't crash until around 268 million. So it handled about 200 million word successfully before crashing. That number doesn't mean anything special to me. There were about 32 thousand words in the particular document being processed at the time of the crash, and sometimes that's a significant number, but I don't think it is here.

What's this say?

frame 1
print doc_table
print sizeof WordDocEntry

I see you're already running squatter on an individual folder (and not an entire user). Does this folder have subfolders, and does it succeed if you run it on the subfolders individually first, before running it on the folder itself? I assume not, since I assume you either already tried this, or there are no subfolders to try it on. But I'd like to confirm and not just assume.

gbulfon commented 2 years ago

Looks like "sizeof WordDocEntry" is not correct, I changed it into "sizeof(WordDocEntry)", I hope this is what you meant:

(gdb) frame 1
#1  0x000000000043034e in dump_index_trie_words (index=0x8f4a90, first_char=191, result_offset=0xffffdf7fffdfde7c) at imap/squat_build.c:1449
1449                if (add_word_to_trie(index, word_ptr, doc_ID)
(gdb) print doc_table
$1 = (WordDocEntry *) 0x761b1030
(gdb) print sizeof WordDocEntry
A syntax error in expression, near `'.
(gdb) print sizeof(WordDocEntry)
$2 = 16

About the amount of words, I see it totally absurd that it's really trying to index 200 million words on a folder! A single language probably doesn't have all that words. Unless the code is working also on attachment parts' base64 text, that would be completely useless. How can I see what's inside the words table, at least by sample?

About folders and subfolders: I had to change the squatter bash script to run each individual folder of every account one at a time, so that in case of crash it would at least go on with the other folders and subfolders. The crash usually happen on inboxes or sent folders, no problem on any subfolder when they're not very big.

elliefm commented 2 years ago

Thanks! So,

  doc_table =
    (WordDocEntry *) xmalloc(sizeof(WordDocEntry) *
                 (num_words + existing));

should have allocated (16 * (449467521 + 0)) = 7191480336 bytes (7GB!). The region it allocated starts at 0x761b1030, and if I'm calculating this correctly, ends at 0x222C05840

We crashed trying to write to 0x122c09000, which should be well within this region...

That's an obscene allocation though. How much memory does your system have, and how much swap space? We know the allocation succeeded -- the xmalloc would have fatal'd out if it had not. Maybe the kernal was able to optimistically allocate the region, but then didn't have enough swap space available to deliver when we tried to use it. Though this is getting into kernel territory, which I know even less about than squatdb...

How can I see what's inside the words table, at least by sample?

That's a good question; I'm not sure. I might be able to send you a patch that will log the words as it indexes them, and you could re-run it on this one folder and see what it logs? But if there's 7GB of them, the log would be absurd too...

If I'm reading the code correctly, this 7GB is all just for "words starting with byte value 191" -- we would be doing similar work for words-starting-with-every-other-character too. 191 is outside the 0-127 us-ascii range, so I have no idea what that character would even be without knowing the encoding of the message(s) it was found in.

Unless the code is working also on attachment parts' base64 text, that would be completely useless.

If it was trying to index the base64 text, it wouldn't involve character 191. Unless it had decoded the base64, and was trying to index the decoded content. I have no idea if it does this. Indexing attachment content could be useful, but only for attachments that are known to contain text. It would be useless to try and index the contents of a jpg or mp4 attachment, for example, but a pdf or word doc might be worth trying. But I don't even know what current squatter does wrt attachments, much less 2.5 squatter.

elliefm commented 2 years ago

Hang on a minute... if the program had 7GB allocated, then the core file you're getting backtraces from must be at least 7GB on disk to be complete. How big is this core file? I think gdb will warn when it's been truncated.

But if it has been truncated, that means that this:

(gdb) print word_entry
$1 = (WordDocEntry *) 0x122c09000
(gdb) print *word_entry
Cannot access memory at address 0x122c09000

might just be telling us that the truncated core file doesn't contain that information, not that the address was unreadable at run time.

Everything still looks like it crashed there, but with the core file probably truncated I'm suddenly uncertain...

Hmm. I'm not sure what platform you're on, but here's a snippet from my man malloc:

By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. In case it turns out that the system is out of memory, one or more processes will be killed by the OOM killer. For more information, see the description of /proc/sys/vm/overcommit_memory and /proc/sys/vm/oom_adj in proc(5), and the Linux kernel source file Documentation/vm/overcommit-accounting.

That seems to confirm my hunch that it "successfully" allocated that 7GB region, but then couldn't deliver it when we tried to use it.

And continuing,

Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3). Prior to Linux 4.7 allocations performed using mmap(2) were unaffected by the RLIMIT_DATA resource limit; since Linux 4.7, this limit is also enforced for allocations performed using mmap(2).

Well, that sounds like it's probably using an mmap for this allocation since it's much larger than 128kB. And we already talked about how if a pointer points into an mmap'd region at runtime, we can't examine it from the core file later because it no longer exists. D'oh!

It's sounding more and more to me like there isn't a crash bug here, so much as just that it runs out of memory. Which I'm not saying that's not a bug; but it's not a crash bug, it's a "naively assumes infinite memory" bug. I guess it might also be a "tries to index stuff it shouldn't" bug, and maybe if it wasn't indexing stuff it shouldn't, it wouldn't need so much memory. Hard to say.

I might be able to send you a patch that will log the words as it indexes them, and you could re-run it on this one folder and see what it logs? But if there's 7GB of them, the log would be absurd too...

Please let me know if this seems useful to you, and if so I'll see what I can come up with

gbulfon commented 2 years ago

@elliefm actually the core file is around 4GB and gdb complains with "warning: Unexpected size of section `.reg2/1' in core file." at startup. The machine has 128GB of RAM, runs our distro XStreamOS/illumos (derived from Open Solaris 11) and has plenty of CPUs running various zones (containers), one being the cyrus one. If the problem is about memory allocation failing, then I believe that squatter is doing more work than needed and allocating too much ram for an unneeded work.

I looked at both the source codes of 2.5 and 3.4 and it's clear that 3.4 is doing a lot less work, and producing much less big files, because the new version does a lot of work on content-type, probably skipping a lot of non-searchable parts. Old 2.5 version does nothing on content-type, to me it's trying to index any part, even binary ones.

Look here, 2.5.17:

https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-2.5.17/imap/index.c#L4057

and here, 3.4.3:

https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-3.4.3/imap/index.c#L5855

If only I could check for content type and skip if not "text/*", that may work great.

gbulfon commented 2 years ago

We have modified source of 2.5.11 to allow squatter to skip indexing useless binary data: that was the reason for the huge amount ot memory and index files, and time needed to index.

I hace a patch to index.c , where I found how to detect the indexed part's mime-type and skip if not "text/*". This can reduce squat disk space from 1/3 to 1/10! And full text search speed is now immediate! And time to index is reduced from 1/3 to 1/10! And no more core dumps!!! Looks like GitHub doesn't let me attach this file, I will paste the diff code here.

--- ../../../cyrus-imapd-2.5.11/cyrus-imapd-2.5.11/imap/index.c Wed May 12 04:46:34 2021
+++ index.c Fri Mar 18 08:32:11 2022
@@ -4059,12 +4059,12 @@
     struct mapfile msgfile = MAPFILE_INITIALIZER;
     int partsleft = 1;
     int subparts;
-    unsigned long start;
-    int len, charset, encoding;
+    unsigned long start,hstart;
+    int len, hlen, charset, encoding, i;
     int partcount = 0;
     char *p, *q;
     struct mailbox *mailbox = state->mailbox;
-  
+
     if (mailbox_map_message(mailbox, uid, &msgfile.base, &msgfile.size))
    return;

@@ -4101,15 +4101,29 @@
        cachestr += 5*4;

        while (--subparts) {
-           start = CACHE_ITEM_BIT32(cachestr+2*4);
-           len = CACHE_ITEM_BIT32(cachestr+3*4);
-           charset = CACHE_ITEM_BIT32(cachestr+4*4) >> 16;
-           encoding = CACHE_ITEM_BIT32(cachestr+4*4) & 0xff;
+           hstart = CACHE_ITEM_BIT32(cachestr+0*4);
+           hlen = CACHE_ITEM_BIT32(cachestr+1*4);
+           char hup[hlen];
+           //header upper case
+           for(i=0;i<hlen;++i) hup[i]=toupper(*((char *)(msgfile.base+hstart+i)));
+           //look for content type
+           char *ctype=strnstr(hup,"CONTENT-TYPE:",hlen);
+           if (ctype) {
+            //look for a valid text mime, skipping spaces
+            char *ctype1=ctype+13;
+            while((ctype1-hup<hlen) && *ctype1==' ') ++ctype1;
+            if ((ctype1-hup)<=(hlen-5) && strnstr(ctype1,"TEXT/",5)) {
+             start = CACHE_ITEM_BIT32(cachestr+2*4);
+             len = CACHE_ITEM_BIT32(cachestr+3*4);
+             charset = CACHE_ITEM_BIT32(cachestr+4*4) >> 16;
+             encoding = CACHE_ITEM_BIT32(cachestr+4*4) & 0xff;

-           if (start < msgfile.size && len > 0) {
-             charset_extractfile(receiver, rock, uid,
+             if (start < msgfile.size && len > 0) {
+               charset_extractfile(receiver, rock, uid,
                      msgfile.base + start,
                      len, charset, encoding, charset_flags);
+             }
+            }
            }
            cachestr += 5*4;
        }
MASHtm commented 2 years ago

The use if strnstr() is problematic on platforms other then BSD (libbsd needed). cyrus seems to use strstr() all over the source. The "TEXT/" strnstr() looks like it should be a strncmp().

And are you sure you do not miss the text of oldschool non-MIME messages?

MASHtm commented 2 years ago

Thanks for your idea and patch in the first place!

After failing to build with it applied on CentOS I modified your patch to ) use strstr() and strncmp() only (builds on Linux now as well) ) work on mail with DKIM-Signature: ....Content-Type:.... headers in front of Content-Type: header (most DKIM signed mail) *) treat non-MIME mail as all text

I will test this now on my local mailhost.

--- cyrus-imapd-2.5.17/imap/index.c.orig    2021-03-09 03:28:07.000000000 +0100
+++ cyrus-imapd-2.5.17/imap/index.c 2022-03-26 16:55:09.287530977 +0100
@@ -4063,12 +4063,12 @@
     struct mapfile msgfile = MAPFILE_INITIALIZER;
     int partsleft = 1;
     int subparts;
-    unsigned long start;
-    int len, charset, encoding;
+    unsigned long start,hstart;
+    int len, hlen, charset, encoding, i;
     int partcount = 0;
     char *p, *q;
     struct mailbox *mailbox = state->mailbox;
-  
+
     if (mailbox_map_message(mailbox, uid, &msgfile.base, &msgfile.size))
    return;

@@ -4105,15 +4105,39 @@
        cachestr += 5*4;

        while (--subparts) {
-           start = CACHE_ITEM_BIT32(cachestr+2*4);
-           len = CACHE_ITEM_BIT32(cachestr+3*4);
-           charset = CACHE_ITEM_BIT32(cachestr+4*4) >> 16;
-           encoding = CACHE_ITEM_BIT32(cachestr+4*4) & 0xff;
+           hstart = CACHE_ITEM_BIT32(cachestr+0*4);
+           hlen = CACHE_ITEM_BIT32(cachestr+1*4);  // we add a CR
+           char hup[hlen + 2];
+           //header upper case
+           // start buffer with a LF for easier DKIM-proof header matching
+           hup[0] = '\n';
+           for(i = 0; i < hlen; ++i) hup[i + 1] = toupper(*((char *)(msgfile.base + hstart + i)));
+           hup[i+2] = '\0';
+           // search for a content-type: at the beginning of the line
+           // otherwise it is most likely a DKIM-Sig
+           char * ctype = strstr(hup, "\nCONTENT-TYPE:");
+           // default to text in case it's a non MIME message
+           int textpart = 1;
+           if (ctype) {
+           //look for a valid text mime, skipping spaces
+           char * ctype1 = ctype + 14;
+           while((ctype1 - hup < hlen) && *ctype1 == ' ') ++ctype1;
+           // did it not match TEXT/ exactly? ignore it
+           if (! ((ctype1 - hup) <= (hlen - 5) && !strncmp(ctype1, "TEXT/",5)) ) {
+               textpart = 0;
+           }
+           }
+           if (textpart) {
+           start = CACHE_ITEM_BIT32(cachestr+2*4);
+           len = CACHE_ITEM_BIT32(cachestr+3*4);
+           charset = CACHE_ITEM_BIT32(cachestr+4*4) >> 16;
+           encoding = CACHE_ITEM_BIT32(cachestr+4*4) & 0xff;

-           if (start < msgfile.size && len > 0) {
-             charset_extractfile(receiver, rock, uid,
+           if (start < msgfile.size && len > 0) {
+               charset_extractfile(receiver, rock, uid,
                      msgfile.base + start,
                      len, charset, encoding, charset_flags);
+           }
            }
            cachestr += 5*4;
        }
gbulfon commented 2 years ago

@MASHtm great! thanks!

ankarakusu commented 2 years ago

Hi together, I found this case by chance and see, it's still open. I would like to announce our solution. We also had similar dumps and the patch from @elliefm helped us.

(xmalloc: use size_t for consistency with std functions ) https://github.com/cyrusimap/cyrus-imapd/commit/6b3ba8deaf2297c56d9c4360019beb205604a74a

Unfortunately we have to wait before going to version 3 and help us with problems by looking at new features and patches for version 3.

gbulfon commented 2 years ago

@ankarakusu yes you may announce the solution with this code, I already did it but I don't know how much interest is stiill there on 2.x.

About 3.x, we did the move only on one of our internal servers, and we found it's not yet prepared for official deployment on our cloud servers. We don't want to use xapian because we discovered it to be much slower than squatter. Squatter on 3.x has other issues that I posted here and on the mailing list, it seems that the problem is not very interesting for the main developers as they're more focused on xapian solution. We tried to look for a soloution ourselves, but the code on 3.x is very different from 2.x and it's not easy to fix those issues. Look here: https://github.com/cyrusimap/cyrus-imapd/issues/3901

rjbs commented 1 week ago

:robot: I'm closing this issue by automation.

This issue refers only to a defunct version of Cyrus. We're trying to get a good handle on our open issues and pull requests, so that we can reliably use them as a backlog of things we hope to work on. If you are confident that this issue should be reconsidered, reopen it and explain why.

Remember, though, that there are only so many developers working on Cyrus, and we'd like to focus on the most important and/or low-hanging work.

Before you reopen this issue, please check that this issue is still present in the latest release of Cyrus, or the main development branch. An issue is considerably more likely to get resolved if you can provide a pull request which reproduces the issue in a Cassandane test.