Closed: buddy-ekb closed this issue 10 years ago
@buddy-ekb Thank you for the report! We'll look into this. BTW, what versions of nginx and ngx_postgres are you using?
@buddy-ekb: are you running it under Valgrind with master_process off? If so, that's a known issue (and the reason 1.0 isn't released yet). I'd be interested in seeing a core dump from the segfaults, though.
@agentzh Nginx version is 1.5.7; the binary is for Linux x86_64, compiled with ngx_postgres 1.0rc3 and linked against the PostgreSQL 9.3.1 client library from the PGDG RPM repository.
@PiotrSikora I put the necessary files here: https://www.dropbox.com/sh/m19ozujak1jucup/_9U4V5JXgY
Of course, I was running nginx under Valgrind with master_process off, but indeed it dumped the provided core without Valgrind, and it had been dumping cores without master_process off as well, so it is very likely that the problem is not only in Valgrind.
@buddy-ekb Thank you for the info! I'll try to look into this issue in the coming days :)
@buddy-ekb: the core dump you provided is from nginx running with master_process off; could you please provide one from nginx that runs normally? I tend to think that the core dumps in production are different from this one... Thanks!
@PiotrSikora here you are https://www.dropbox.com/sh/krmghdszj28bkkv/vc1uCpMsIu
A few remarks on them: master_process on; daemon on in the latter case.
@buddy-ekb Sorry for the delay on my side! I've just committed a fix for your problem to git master. Will you try it out on your side? Thank you!
With the patch the stress testing has been passing without a hitch. Thank you very much, @agentzh !
@buddy-ekb BTW, I've noticed that you've enabled debug logging via the --with-debug option (and this segfault only happened with debug logging enabled). If you are serious about performance, better disable this in your nginx (or openresty) build.
@agentzh, thank you for the remark!
Though I consider this performance penalty negligible compared to the cost of transaction handling inside PostgreSQL, even in the case of a simple "SELECT 1". As a matter of fact, I did some benchmarks on localhost ("debug vs. nodebug", "ipv4 vs. unix sockets", "ipv4 vs. ipv6"), and it seems to me there is no significant difference between these options.
I suspect that there is some bug in ngx_postgres' connection cache code. While stressing nginx with "ab -c 20 ..." it occasionally segfaulted, so I ran it under Valgrind. Here is the result:
==25581== Invalid read of size 8
==25581==    at 0x406DAB: ngx_destroy_pool (ngx_palloc.c:76)
==25581==    by 0x4B6457: ngx_postgres_upstream_free_connection (ngx_postgres_upstream.c:584)
==25581==    by 0x4B00FE: ngx_postgres_keepalive_free_peer (ngx_postgres_keepalive.c:218)
==25581==    by 0x4B60DC: ngx_postgres_upstream_free_peer (ngx_postgres_upstream.c:509)
==25581==    by 0x4B6674: ngx_postgres_upstream_finalize_request (ngx_postgres_util.c:79)
==25581==    by 0x4B4DCD: ngx_postgres_upstream_done (ngx_postgres_processor.c:507)
==25581==    by 0x4B4D46: ngx_postgres_upstream_get_ack (ngx_postgres_processor.c:488)
==25581==    by 0x4B4951: ngx_postgres_upstream_get_result (ngx_postgres_processor.c:366)
==25581==    by 0x4B40F0: ngx_postgres_process_events (ngx_postgres_processor.c:76)
==25581==    by 0x4AF7D2: ngx_postgres_rev_handler (ngx_postgres_handler.c:314)
==25581==    by 0x46415C: ngx_http_upstream_handler (ngx_http_upstream.c:976)
==25581==    by 0x437C1D: ngx_epoll_process_events (ngx_epoll_module.c:691)
==25581==    by 0x42853F: ngx_process_events_and_timers (ngx_event.c:248)
==25581==    by 0x4347E4: ngx_single_process_cycle (ngx_process_cycle.c:315)
==25581==    by 0x403DF4: main (nginx.c:404)
==25581==  Address 0x5641930 is 96 bytes inside a block of size 256 free'd
==25581==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==25581==    by 0x406E39: ngx_destroy_pool (ngx_palloc.c:87)
==25581==    by 0x44E8D2: ngx_http_close_connection (ngx_http_request.c:3489)
==25581==    by 0x44E56D: ngx_http_close_request (ngx_http_request.c:3350)
==25581==    by 0x44E0C1: ngx_http_lingering_close_handler (ngx_http_request.c:3209)
==25581==    by 0x44DF6B: ngx_http_set_lingering_close (ngx_http_request.c:3171)
==25581==    by 0x44C858: ngx_http_finalize_connection (ngx_http_request.c:2493)
==25581==    by 0x44C443: ngx_http_finalize_request (ngx_http_request.c:2384)
==25581==    by 0x4B6800: ngx_postgres_upstream_finalize_request (ngx_postgres_util.c:140)
==25581==    by 0x4B4DCD: ngx_postgres_upstream_done (ngx_postgres_processor.c:507)
==25581==    by 0x4B4D46: ngx_postgres_upstream_get_ack (ngx_postgres_processor.c:488)
==25581==    by 0x4B4951: ngx_postgres_upstream_get_result (ngx_postgres_processor.c:366)
==25581==    by 0x4B40F0: ngx_postgres_process_events (ngx_postgres_processor.c:76)
==25581==    by 0x4AF7D2: ngx_postgres_rev_handler (ngx_postgres_handler.c:314)
==25581==    by 0x46415C: ngx_http_upstream_handler (ngx_http_upstream.c:976)
==25581==    by 0x437C1D: ngx_epoll_process_events (ngx_epoll_module.c:691)
==25581==    by 0x42853F: ngx_process_events_and_timers (ngx_event.c:248)
==25581==    by 0x4347E4: ngx_single_process_cycle (ngx_process_cycle.c:315)
==25581==    by 0x403DF4: main (nginx.c:404)
I believe that "ngx_postgres_keepalive_free_peer" (ngx_postgres_keepalive.c:164) somehow fails to correctly isolate some nginx memory pool from freeing which occurs later at (ngx_postgres_util.c:140) and eventually leads to an attempt of reading invalid data from freed and reused memory area.
I'd be glad to provide any further information to anyone concerned.