google-code-export / lusca-cache

Automatically exported from code.google.com/p/lusca-cache
0 stars 0 forks source link

method_t request_t crash svn #99

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
(gdb) backtrace
#0  0x0000000800ba87ec in thr_kill () from /lib/libc.so.7
#1  0x000000080081d30c in pthread_kill () from /lib/libthr.so.3
#2  0x000000080081ae00 in raise () from /lib/libthr.so.3
#3  0x0000000800c453ce in abort () from /lib/libc.so.7
#4  0x000000000045cf69 in death (sig=100077) at tools.c:321
#5  <signal handler called>
#6  0x0000000800c3ae32 in open () from /lib/libc.so.7
#7  0x0000000800bab4e4 in vsnprintf () from /lib/libc.so.7
#8  0x000000000043b037 in logfilePrintf (lf=0x800e89a18,
    fmt=0x489b08 "%s %s %s [%s] \"%s %s HTTP/%d.%d\" %d %ld %s:%s") at 
logfile.c:150
#9  0x00000000004048f8 in accessLogCommon (al=0x8050c55f8, 
logfile=0x800e89a18)
    at access_log.c:1134
#10 0x0000000000404be1 in accessLogLog (al=0x8050c55f8, 
checklist=0x806860318)
    at access_log.c:1192
#11 0x000000000041c01a in httpRequestFree (data=Variable "data" is not 
available.
) at client_side.c:313
#12 0x00000000004214ac in connStateFree (fd=Variable "fd" is not 
available.
) at client_side_conn.c:24
#13 0x0000000000479b59 in commCallCloseHandlers (fd=78) at comm.c:640
#14 0x000000000047a7a4 in comm_close (fd=78) at comm.c:818
#15 0x00000000004796ea in CommWriteStateCallbackAndFree (fd=78, code=0) 
at comm.c:130
#16 0x000000000047de04 in comm_select (msec=Variable "msec" is not 
available.
) at comm_generic.c:300
#17 0x000000000043dd70 in main (argc=Variable "argc" is not available.
) at main.c:928

(gdb) frame 5
#5  <signal handler called>

(gdb) frame 6
#6  0x0000000800c3ae32 in open () from /lib/libc.so.7

(gdb) frame 7
#7  0x0000000800bab4e4 in vsnprintf () from /lib/libc.so.7

(gdb) frame 8
#8  0x000000000043b037 in logfilePrintf (lf=0x800e89a18,
    fmt=0x489b08 "%s %s %s [%s] \"%s %s HTTP/%d.%d\" %d %ld %s:%s") at 
logfile.c:150
150     logfile.c: Nenhum arquivo ou diretório encontrado.
        in logfile.c

(gdb) frame 9
#9  0x00000000004048f8 in accessLogCommon (al=0x8050c55f8, 
logfile=0x800e89a18)
    at access_log.c:1134
1134    access_log.c: Nenhum arquivo ou diretório encontrado.
        in access_log.c

(gdb) print *al
$1 = {url = 0x61bca0 "NONE://", http = {method = 0x806b55980, code = 501,
    content_type = 0x8067ce9d0 "text/html", version = {major = 0, minor = 
0}}, icp = {
    opcode = ICP_INVALID}, cache = {caddr = {s_addr = 1664129480}, out_ip 
= {s_addr = 0},
    size = 1082, rq_size = 51, code = LOG_TCP_DENIED, msec = 4, rfc931 = 
0x0,
    authuser = 0x0}, headers = {request = 0x0, reply = 0x0}, private = {
    method_str = 0x11709 <Address 0x11709 out of bounds>}, hier = {code = 
HIER_NONE,
    host = "-", '\0' <repeats 254 times>, ping = {start = {tv_sec = 0, 
tv_usec = 0},
      stop = {tv_sec = 0, tv_usec = 0}, n_sent = 0, n_recv = 0, 
n_replies_expected = 0,
      timeout = 0, timedout = 0, w_rtt = 0, p_rtt = 0},
    cd_host = '\0' <repeats 255 times>, cd_lookup = LOOKUP_NONE, 
n_choices = 0,
    n_ichoices = 0, peer_select_start = {tv_sec = 0, tv_usec = 0}, 
store_complete_stop = {
      tv_sec = 0, tv_usec = 0}}, reply = 0x8050f8fc0, request = 
0x8050d8600,
  ext_refresh = 0x0}

(gdb) print * al->http.method
$3 = {code = 4061773008, string = 0x11709 <Address 0x11709 out of 
bounds>, flags = {
    cachable = 0, purges_all = 0}}

Original issue reported on code.google.com by hm@hm.net.br on 30 Mar 2010 at 11:14

GoogleCodeExporter commented 9 years ago
This problem has been narrowed down to the behaviour of requestCreate() with
non-well-known methods.

requestCreate() takes a simple pointer to method_t. It will just assign that to 
the
new request_t without any reference counting.

In the case of well-known method's, they just point to a static set of
pre-initialised method_t's which are never modified or free'd.

In the case of non-well-known-methods (ie, METHOD_OTHER with the method string 
set)
then it's allocated and freed as needed.

The problem is the pointer assignment. The original creator of the method may 
then
free the method_t. This'll then be overwritten with other data and be free'ed 
later
on. This is just going to cause a huge mess.

The initial fix is to patch requestCreate(). I don't know if this is 100% 
correct - I
need to investigate all of the paths leading to requestCreate() and ensure that 
the
passed in method_t isn't "given" to the new request_t. Ie, that the caller of
requestCreate() (and their callers) free their method_t reference themselves 
later.

Original comment by adrian.c...@gmail.com on 30 Mar 2010 at 11:49

Attachments:

GoogleCodeExporter commented 9 years ago
Ok. Try this patch.

* change requestCreate() to duplicate method_t's, rather than just copy pointers
* change the few places where method_t's are actually parsed (rather than just 
using
METHOD_GET) to explicitly deallocate their copy afterwards

This should work correctly for non-standard methods without leaking memory or
overwriting/double-free'ing entries.

Original comment by adrian.c...@gmail.com on 30 Mar 2010 at 2:59

Attachments:

GoogleCodeExporter commented 9 years ago
hm has noted (via lusca-users@) that this patch leaks memory in the real world. 
Ok,
time to crack out valgrind and figure out why.

Original comment by adrian.c...@gmail.com on 1 Apr 2010 at 1:03

GoogleCodeExporter commented 9 years ago
There's another method assignment in src/store_vary.c that needs to be turned 
into a
urlMethodDup().

Question is whether method's are being leaked because they're replacing previous
methods via assignment, rather than free'ing any previous method. Hm! Good 
question!

Original comment by adrian.c...@gmail.com on 1 Apr 2010 at 2:08

GoogleCodeExporter commented 9 years ago
the memory leak is not as big as I suspected, it came along a FreeBSD source 
upgrade (STable) which has some serious problems

anyway, lusca use much more memory as before, I could compensate by reducing 
cache_mem 750 (before patch) to 400 (after patch) which both lead to about 
3.5Gig 
mem use at runtime, otherwise it goes swapping a gig extra

Original comment by hm@hm.net.br on 1 Apr 2010 at 7:59

GoogleCodeExporter commented 9 years ago
Ok. Please try this patch instead. This patch does -not- fix the crash; it 
instead
logs where I may see the occasional leak in the assignment of method_t pointers.

It will log a message when an existing non-known method pointer (that isn't 
NULL) is
copied over with some other method. Copying over known method pointers is wrong 
but
won't leak - but copying over unknown method pointers will end up with a 
dangling
string buffer containing the unknown method string (ie, the RTSP methods.)

Please apply this patch and tell me if it starts logging stuff. I'd like to 
commit it
to HEAD as part of the method_t tidyup that I'm now forced to do. :)

Original comment by adrian.c...@gmail.com on 1 Apr 2010 at 1:37

Attachments:

GoogleCodeExporter commented 9 years ago
We'll look at fixing the memory usage later. It's quite possible I've screwed up
somewhere else. I'll investigate that later.

Original comment by adrian.c...@gmail.com on 1 Apr 2010 at 1:38

GoogleCodeExporter commented 9 years ago
H, if you run the earlier version of Lusca (with the rebuild helper fixes), 
does it
go back to using less disk IO?

Original comment by adrian.c...@gmail.com on 1 Apr 2010 at 1:42

GoogleCodeExporter commented 9 years ago
Here's an updated patch to try.

Original comment by adrian.c...@gmail.com on 2 Apr 2010 at 12:27

Attachments:

GoogleCodeExporter commented 9 years ago
Here's a patch to try against LUSCA_HEAD rev r14529.

It implements the specific change in method_t handling when creating new 
request_t's
from existing method_t pointers.

Please test this and get back to me!

Original comment by adrian.c...@gmail.com on 3 Apr 2010 at 5:59

Attachments:

GoogleCodeExporter commented 9 years ago
It's been committed to LUSCA_HEAD SVN. I'm now waiting for feedback from hm 
from his
live live caches to see whether it's fixed the issue.

Original comment by adrian.c...@gmail.com on 5 Apr 2010 at 10:32

GoogleCodeExporter commented 9 years ago
I guess we can consider this fixed. I have some reasonable servers running  
since ..2.diff without any further problem. Maybe premature since this didin't 
happened every day but for the moment I think case cleared.
thank's

Original comment by hm@hm.net.br on 8 Apr 2010 at 6:46

GoogleCodeExporter commented 9 years ago
Sweet! Thanks very much.

Original comment by adrian.c...@gmail.com on 8 Apr 2010 at 7:42