Manouchehri / serf

High-performance asynchronous HTTP client library
https://code.google.com/p/serf
Apache License 2.0
0 stars 0 forks source link

With APR pool debugging enabled, Serf SEGFAULTs during pool cleanup. #76

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When compiling Serf and APR with pool debugging enabled, errors raised by the 
response handler seem to cause Serf to SEGFAULT during pool cleanup.  Debugging 
shows the location of the fault to be here:

{{{
(gdb) run http://localhost/abc
Starting program: /home/cmpilato/tmp/subversion/serf/test/serf_get 
http://localhost/abc
[Thread debugging using libthread_db enabled]
Error running context: (1) Operation not permitted

Program received signal SIGSEGV, Segmentation fault.
0x08066874 in allocator_free (allocator=0x41414141, node=0x8088e90)
    at memory/unix/apr_pools.c:364
364     if (allocator->mutex)
(gdb) where
#0  0x08066874 in allocator_free (allocator=0x41414141, node=0x8088e90)
    at memory/unix/apr_pools.c:364
#1  0x080669f3 in apr_allocator_free (allocator=0x41414141, node=0x8088e90)
    at memory/unix/apr_pools.c:439
#2  0x0804fb32 in serf_bucket_mem_free (allocator=0x8086b18, block=0x8088eb0)
    at buckets/allocator.c:265
#3  0x0804d675 in serf_default_destroy_and_data (bucket=0x8087590)
    at buckets/buckets.c:120
#4  0x0804e8c2 in serf_response_destroy_and_data (bucket=0x8087590)
    at buckets/response_buckets.c:94
#5  0x080538dc in destroy_request (request=0x80806c0) at outgoing.c:306
#6  0x080539fa in cancel_request (request=0x80806c0, list=0x807e568, 
    notify_request=0) at outgoing.c:350
#7  0x08054ebf in serf_request_cancel (request=0x80806c0) at outgoing.c:1268
#8  0x08054b3a in serf_connection_close (conn=0x807e508) at outgoing.c:1113
#9  0x080533fd in clean_conn (data=0x807e508) at outgoing.c:56
#10 0x08067be3 in run_cleanups (cref=0x807d4e0)
    at memory/unix/apr_pools.c:2346
#11 0x08066d76 in pool_clear_debug (pool=0x807d4d0, 
    file_line=0x8077630 "test/serf_get.c:436")
    at memory/unix/apr_pools.c:1547
#12 0x08066f40 in pool_destroy_debug (pool=0x807d4d0, 
    file_line=0x8077630 "test/serf_get.c:436")
    at memory/unix/apr_pools.c:1632
#13 0x08066ff8 in apr_pool_destroy_debug (pool=0x807d4d0, 
    file_line=0x8077630 "test/serf_get.c:436")
    at memory/unix/apr_pools.c:1674
#14 0x0804cb0f in main (argc=2, argv=0xbffff854) at test/serf_get.c:436
}}}

Note that in lowest frames, "allocator" is set to APR's pool debugging junk 
variable (0x41 repeated for each byte).

This was originally noticed in the Subversion project, tracked in 
http://subversion.tigris.org/issues/show_bug.cgi?id=3875.

In attempting to generalize the problem to just a Serf scenario, I found what I 
hope is a helpful reproduction recipe.  Of course, begin by compiling APR with 
--enable-pool-debug.  Then:

  1. Change test/serf_get.c:handle_response() to "return 1;" immediate inside the while (1) loop.
  2. 'make' (to recompile the serf_get utility).
  3. Run 'test/serf_get http://google.com'

{{{
$ test/serf_get http://google.com
Error running context: (1) Operation not permitted
Segmentation fault
$
}}}

Original issue reported on code.google.com by cmpilato on 8 Jun 2011 at 6:17

GoogleCodeExporter commented 9 years ago
FYI, This might be fixed by one of the patches we have for serf in 
mod_pagespeed (I plan to cleanup and submit them at some point...):

--- src/third_party/serf/src/buckets/allocator.c    2011-05-06 16:22:56.220366814 
-0400
+++ src/third_party/serf/src/buckets/allocator.c    2011-05-06 11:08:31.684199000 
-0400
@@ -83,6 +83,7 @@
 struct serf_bucket_alloc_t {
     apr_pool_t *pool;
     apr_allocator_t *allocator;
+    int own_allocator;

     serf_unfreed_func_t unfreed;
     void *unfreed_baton;
@@ -106,6 +107,9 @@
     if (allocator->blocks) {
         apr_allocator_free(allocator->allocator, allocator->blocks);
     }
+    if (allocator->own_allocator == 1) {
+        apr_allocator_destroy(allocator->allocator);
+    }

     return APR_SUCCESS;
 }
@@ -119,10 +123,12 @@

     allocator->pool = pool;
     allocator->allocator = apr_pool_allocator_get(pool);
+    allocator->own_allocator = 0;
     if (allocator->allocator == NULL) {
         /* This most likely means pools are running in debug mode, create our
          * own allocator to deal with memory ourselves */
         apr_allocator_create(&allocator->allocator);
+        allocator->own_allocator = 1;
     }
     allocator->unfreed = unfreed;
     allocator->unfreed_baton = unfreed_baton;

Original comment by morlov...@google.com on 8 Jun 2011 at 6:24

GoogleCodeExporter commented 9 years ago
My debugging was leading me down the same road -- to the custom handling of the 
allocator when pool debugging was enabled.  I'll give your patch a try.

Original comment by cmpilato on 9 Jun 2011 at 7:10

GoogleCodeExporter commented 9 years ago
No effect at all.  And actually, now that I read the patch, I'm not surprised.  
I *think* the SEGFAULT is due to a doubled attempt to cleanup the allocator, 
not a missing one.  This patch seems to add yet another destruction attempt on 
the allocator.

Original comment by cmpilato on 9 Jun 2011 at 7:18

GoogleCodeExporter commented 9 years ago
You are right; I misremembered the patch rationale: it was meant to be a leak 
fix. 
Please pardon my screw up. 

I think the issue may be that the allocator gets cleaned up before the 
bucket/requests...

Original comment by morlov...@google.com on 9 Jun 2011 at 7:37

GoogleCodeExporter commented 9 years ago
r1488 fixes a segfault on pool cleanup, but with a slightly different 
stacktrace. The actual problem is that the response bucket is first destroyed 
when the respool allocator is cleaned up, and then again on cancel_request when 
the connection pool is cleaned up (respool is a child of conn->pool so gets 
cleaned up first).

This root cause can very well cause the stacktrace in this issue, but on 
Windows, in my simulations I always got the one from 
http://subversion.tigris.org/issues/show_bug.cgi?id=3917

Closing as "Fixed", let us know of the problems persist.

Thanks for the detailed report!

Original comment by lieven.govaerts@gmail.com on 16 Jun 2011 at 5:51

GoogleCodeExporter commented 9 years ago
I can confirm that this seems to fix the problem.

Original comment by cmpilato on 16 Jun 2011 at 5:56

GoogleCodeExporter commented 9 years ago
By the way, I've filed issue 79 to track the patch previously posted to this 
issue (or, my modified version thereof) so it doesn't get lost here.

Original comment by cmpilato on 21 Jun 2011 at 4:12