jdelgad closed this issue 8 years ago.
nginx -V
nginx version: nginx/1.9.12
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments:
--prefix=/usr/share/nginx
--sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/tmp/client_body
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi
--pid-path=/run/nginx.pid
--lock-path=/run/lock/subsys/nginx
--user=nginx --group=nginx
--with-file-aio
--with-ipv6
--with-http_ssl_module
--with-http_realip_module
--with-http_addition_module
--with-http_xslt_module
--with-http_image_filter_module
--with-http_geoip_module
--with-http_sub_module
--with-http_dav_module
--with-http_flv_module
--with-http_mp4_module
--with-http_gunzip_module
--with-http_gzip_static_module
--with-http_random_index_module
--with-http_secure_link_module
--with-http_degradation_module
--with-http_stub_status_module
--with-http_perl_module
--with-mail --with-mail_ssl_module
--with-pcre
--with-pcre-jit
--with-google_perftools_module
--with-debug
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic'
--with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'
--with-http_v2_module
--add-module=/home/admin/ngx_pagespeed-release-1.10.33.5-beta
--with-debug
@jdelgad This should be fixed on trunk-tracking via these commits:
https://github.com/pagespeed/ngx_pagespeed/commit/b88e067c6d8173112d586fcdfe8d664c1226a00d https://github.com/pagespeed/ngx_pagespeed/commit/60c1f4cc4e58493be12e2c5861a5eb6aafb752c6
You could try cherry-picking them; it looks like they are not included in 1.10.33.5-beta.
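For anyone following along, a rough sketch of what that cherry-pick could look like, assuming the module checkout at /home/admin/ngx_pagespeed-release-1.10.33.5-beta (the path from the configure arguments above) is a git clone rather than an unpacked release tarball:

# Hypothetical sketch: apply the two fixes on top of the release checkout,
# then rebuild nginx with the same configure arguments as before.
cd /home/admin/ngx_pagespeed-release-1.10.33.5-beta
git fetch origin
git cherry-pick b88e067c6d8173112d586fcdfe8d664c1226a00d
git cherry-pick 60c1f4cc4e58493be12e2c5861a5eb6aafb752c6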
Thanks for the quick reply @oschaaf. I'm trying to get this built. I'll let you know within a few days if everything is working.
Are there plans to get this into 6-beta?
I'm getting these errors; they ended up filling my partitions to 100%, so I had to turn it off again.
#0 net_instaweb::NgxBaseFetch::ReadCallback (data=...) at /home/admin/ngx_pagespeed/src/ngx_base_fetch.cc:143
#1 0x0000564350c6c680 in net_instaweb::NgxEventConnection::ReadAndNotify (fd=20)
at /home/admin/ngx_pagespeed/src/ngx_event_connection.cc:122
#2 0x0000564350c6c739 in net_instaweb::NgxEventConnection::ReadEventHandler (ev=0x564352d5eec0)
at /home/admin/ngx_pagespeed/src/ngx_event_connection.cc:83
#3 0x0000564350be0951 in ngx_epoll_process_events (cycle=0x564352aab350, timer=<optimized out>,
flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:822
#4 0x0000564350bd6c9a in ngx_process_events_and_timers (cycle=cycle@entry=0x564352aab350)
at src/event/ngx_event.c:242
#5 0x0000564350bddf85 in ngx_worker_process_cycle (cycle=cycle@entry=0x564352aab350, data=data@entry=0x1)
at src/os/unix/ngx_process_cycle.c:753
#6 0x0000564350bdc8ea in ngx_spawn_process (cycle=cycle@entry=0x564352aab350,
proc=0x564350bddf30 <ngx_worker_process_cycle>, data=0x1, name=0x5643514adcff "worker process",
respawn=respawn@entry=1) at src/os/unix/ngx_process.c:198
#7 0x0000564350bdf6b7 in ngx_reap_children (cycle=0x564352aab350) at src/os/unix/ngx_process_cycle.c:621
#8 ngx_master_process_cycle (cycle=cycle@entry=0x564352aab350) at src/os/unix/ngx_process_cycle.c:174
#9 0x0000564350bb8907 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
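(For reference, a trace like the one above can usually be recovered from the crashed worker's core dump with gdb; the binary path below comes from the configure arguments in this thread, while the core file location is system-specific and only a placeholder:)

# Hypothetical: open the crashed worker's core dump and print the stack.
gdb /usr/sbin/nginx /path/to/core
(gdb) bt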
Here is the file after I merged, just to rule out a bad merge.
namespace net_instaweb {
const char kHeadersComplete = 'H';
const char kFlush = 'F';
const char kDone = 'D';
NgxEventConnection* NgxBaseFetch::event_connection = NULL;
int NgxBaseFetch::active_base_fetches = 0;
NgxBaseFetch::NgxBaseFetch(ngx_http_request_t* r,
NgxServerContext* server_context,
const RequestContextPtr& request_ctx,
PreserveCachingHeaders preserve_caching_headers,
NgxBaseFetchType base_fetch_type)
: AsyncFetch(request_ctx),
request_(r),
server_context_(server_context),
done_called_(false),
last_buf_sent_(false),
references_(2),
base_fetch_type_(base_fetch_type),
preserve_caching_headers_(preserve_caching_headers),
detached_(false),
suppress_(false) {
if (pthread_mutex_init(&mutex_, NULL)) CHECK(0);
__sync_add_and_fetch(&NgxBaseFetch::active_base_fetches, 1);
}
NgxBaseFetch::~NgxBaseFetch() {
pthread_mutex_destroy(&mutex_);
__sync_add_and_fetch(&NgxBaseFetch::active_base_fetches, -1);
}
bool NgxBaseFetch::Initialize(ngx_cycle_t* cycle) {
CHECK(event_connection == NULL) << "event connection already set";
event_connection = new NgxEventConnection(ReadCallback);
return event_connection->Init(cycle);
}
void NgxBaseFetch::Terminate() {
if (event_connection != NULL) {
GoogleMessageHandler handler;
PosixTimer timer;
int64 timeout_us = Timer::kSecondUs * 30;
int64 end_us = timer.NowUs() + timeout_us;
static unsigned int sleep_microseconds = 100;
handler.Message(
kInfo,"NgxBaseFetch::Terminate rounding up %d active base fetches.",
NgxBaseFetch::active_base_fetches);
// Try to continue processing and get the active base fetch count to 0
// until the timeout expires.
// TODO(oschaaf): This needs more work.
while (NgxBaseFetch::active_base_fetches > 0 && end_us > timer.NowUs()) {
event_connection->Drain();
usleep(sleep_microseconds);
}
if (NgxBaseFetch::active_base_fetches != 0) {
handler.Message(
kWarning,"NgxBaseFetch::Terminate timed out with %d active base fetches.",
NgxBaseFetch::active_base_fetches);
}
// Close down the named pipe.
event_connection->Shutdown();
delete event_connection;
event_connection = NULL;
}
}
const char* BaseFetchTypeToCStr(NgxBaseFetchType type) {
switch(type) {
case kPageSpeedResource:
return "ps resource";
case kHtmlTransform:
return "html transform";
case kAdminPage:
return "admin page";
case kIproLookup:
return "ipro lookup";
case kPageSpeedProxy:
return "pagespeed proxy";
}
CHECK(false);
return "can't get here";
}
void NgxBaseFetch::ReadCallback(const ps_event_data& data) {
NgxBaseFetch* base_fetch = reinterpret_cast<NgxBaseFetch*>(data.sender);
ngx_http_request_t* r = base_fetch->request();
bool detached = base_fetch->detached();
#if (NGX_DEBUG) // `type` is unused if NGX_DEBUG isn't set, needed for -Werror.
const char* type = BaseFetchTypeToCStr(base_fetch->base_fetch_type_);
#endif
int refcount = base_fetch->DecrementRefCount();
#if (NGX_DEBUG)
ngx_log_error(NGX_LOG_DEBUG, ngx_cycle->log, 0,
"pagespeed [%p] event: %c. bf:%p (%s) - refcnt:%d - det: %c", r,
data.type, base_fetch, type, refcount, detached ? 'Y': 'N');
#endif
// If we ended up destructing the base fetch, or the request context is
// detached, skip this event.
if (refcount == 0 || detached) {
return;
}
ps_request_ctx_t* ctx = ps_get_request_context(r);
CHECK(data.sender == ctx->base_fetch);
int rc;
bool run_posted = true;
// If we are unlucky enough to have our connection finalized mid-ipro-lookup,
// we must enter a different flow. Also see ps_in_place_check_header_filter().
if ((ctx->base_fetch->base_fetch_type_ != kIproLookup)
&& r->connection->error) {
ngx_log_error(NGX_LOG_DEBUG, ngx_cycle->log, 0,
"pagespeed [%p] request already finalized %d", r, r->count);
rc = NGX_ERROR;
run_posted = false;
} else {
rc = ps_base_fetch::ps_base_fetch_handler(r);
}
#if (NGX_DEBUG)
ngx_log_error(NGX_LOG_DEBUG, ngx_cycle->log, 0,
"pagespeed [%p] ps_base_fetch_handler() returned %d for %c",
r, rc, data.type);
#endif
ngx_connection_t* c = r->connection;
ngx_http_finalize_request(r, rc);
if (run_posted) {
// See http://forum.nginx.org/read.php?2,253006,253061
ngx_http_run_posted_requests(c);
}
}
void NgxBaseFetch::Lock() {
pthread_mutex_lock(&mutex_);
}
void NgxBaseFetch::Unlock() {
pthread_mutex_unlock(&mutex_);
}
bool NgxBaseFetch::HandleWrite(const StringPiece& sp,
MessageHandler* handler) {
Lock();
buffer_.append(sp.data(), sp.size());
Unlock();
return true;
}
// Should only be called in the nginx thread.
ngx_int_t NgxBaseFetch::CopyBufferToNginx(ngx_chain_t** link_ptr) {
CHECK(!(done_called_ && last_buf_sent_))
<< "CopyBufferToNginx() was called after the last buffer was sent";
// there is no buffer to send
if (!done_called_ && buffer_.empty()) {
*link_ptr = NULL;
return NGX_AGAIN;
}
int rc = string_piece_to_buffer_chain(
request_->pool, buffer_, link_ptr, done_called_ /* send_last_buf */);
if (rc != NGX_OK) {
return rc;
}
// Done with buffer contents now.
buffer_.clear();
if (done_called_) {
last_buf_sent_ = true;
return NGX_OK;
}
return NGX_AGAIN;
}
// There may also be a race condition if this is called between the last Write()
// and Done() such that we're sending an empty buffer with last_buf set, which I
// think nginx will reject.
ngx_int_t NgxBaseFetch::CollectAccumulatedWrites(ngx_chain_t** link_ptr) {
ngx_int_t rc;
Lock();
rc = CopyBufferToNginx(link_ptr);
Unlock();
return rc;
}
ngx_int_t NgxBaseFetch::CollectHeaders(ngx_http_headers_out_t* headers_out) {
const ResponseHeaders* pagespeed_headers = response_headers();
if (content_length_known()) {
headers_out->content_length = NULL;
headers_out->content_length_n = content_length();
}
return copy_response_headers_to_ngx(request_, *pagespeed_headers,
preserve_caching_headers_);
}
void NgxBaseFetch::RequestCollection(char type) {
if (suppress_) {
return;
}
// We must optimistically increment the refcount, and decrement it
// when we conclude we failed. If we only increment on a successful write,
// there's a small chance that between writing and adding to the refcount
// both pagespeed and nginx will release their refcount -- destructing
// this NgxBaseFetch instance.
IncrementRefCount();
if (!event_connection->WriteEvent(type, this)) {
DecrementRefCount();
}
}
void NgxBaseFetch::HandleHeadersComplete() {
int status_code = response_headers()->status_code();
bool status_ok = (status_code != 0) && (status_code < 400);
if ((base_fetch_type_ != kIproLookup) || status_ok) {
// If this is a 404 response we need to count it in the stats.
if (response_headers()->status_code() == HttpStatus::kNotFound) {
server_context_->rewrite_stats()->resource_404_count()->Add(1);
}
}
RequestCollection(kHeadersComplete); // Headers available.
// For the IPRO lookup, suppress notification of the nginx side here.
// If we send both the headers-complete event and the one from done, nasty
// stuff will happen if we lose the race with the nginx side destructing
// this base fetch instance.
if (base_fetch_type_ == kIproLookup && !status_ok) {
suppress_ = true;
}
}
bool NgxBaseFetch::HandleFlush(MessageHandler* handler) {
RequestCollection(kFlush); // A new part of the response body is available
return true;
}
int NgxBaseFetch::DecrementRefCount() {
return DecrefAndDeleteIfUnreferenced();
}
int NgxBaseFetch::IncrementRefCount() {
return __sync_add_and_fetch(&references_, 1);
}
int NgxBaseFetch::DecrefAndDeleteIfUnreferenced() {
// Creates a full memory barrier.
int r = __sync_add_and_fetch(&references_, -1);
if (r == 0) {
delete this;
}
return r;
}
void NgxBaseFetch::HandleDone(bool success) {
// TODO(jefftk): it's possible that instead of locking here we can just modify
// CopyBufferToNginx to only read done_called_ once.
CHECK(!done_called_) << "Done already called!";
Lock();
done_called_ = true;
Unlock();
RequestCollection(kDone);
DecrefAndDeleteIfUnreferenced();
}
} // namespace net_instaweb
@jdelgad Oops, I ran a comparison and see I missed a commit that you would have needed to cherry-pick. I'm sorry!
To make this as easy as possible, I suggest you just overwrite your version with the trunk-tracking version of ngx_base_fetch.cc, which includes all required changes:
https://raw.githubusercontent.com/pagespeed/ngx_pagespeed/trunk-tracking/src/ngx_base_fetch.cc
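A sketch of one way to do that overwrite, assuming the module source lives at /home/admin/ngx_pagespeed-release-1.10.33.5-beta (again taken from the configure arguments above):

# Hypothetical: back up the merged file, replace it with the trunk-tracking
# version, then rebuild nginx.
cd /home/admin/ngx_pagespeed-release-1.10.33.5-beta/src
cp ngx_base_fetch.cc ngx_base_fetch.cc.bak
curl -O https://raw.githubusercontent.com/pagespeed/ngx_pagespeed/trunk-tracking/src/ngx_base_fetch.cc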
I confirm the issue is back:
worker process 22015 exited on signal 11 (core dumped)
Will you open another PR to get this merged?
Wait, it seems the fix doesn't solve it this time.
@creativeprogramming Which fix did you apply?
No, sorry, I was wrong: I hadn't actually applied it before. Now I have, and it does seem to solve the issue (I applied https://raw.githubusercontent.com/pagespeed/ngx_pagespeed/trunk-tracking/src/ngx_base_fetch.cc).
OK, I've now been running it in production (250+ pageviews/minute) for 2 hours with the patch applied, with no crash or error. Will it be merged?
I seriously need this fix; without it the site goes down within a few minutes.
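(If you want to double-check a window like that for crashes, one simple approach, assuming the error log path from the builds in this thread, is to search for worker exits:)

# Hypothetical: look for crashed workers in the error log.
grep 'exited on signal' /var/log/nginx/error.log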
nginx version: nginx/1.9.4
built by gcc 4.9.2 (Ubuntu 4.9.2-10ubuntu13)
built with OpenSSL 1.0.2e 3 Dec 2015
TLS SNI support enabled
configure arguments: --with-threads --with-google_perftools_module --with-pcre --with-pcre-jit --with-ld-opt='-Wl,-z,relro -Wl,-E' --with-cc-opt='-D FD_SETSIZE=40048 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DTCP_FASTOPEN=23' --with-http_stub_status_module --with-http_spdy_module --with-http_ssl_module --prefix=/usr/local/nginx-pagespeed --sbin-path=/usr/local/nginx-pagespeed/nginx --conf-path=/etc/nginx/nginx.conf --pid-path=/var/run/nginx.pid --add-module=/root/nginx-source/ngx_pagespeed-release-1.10.33.5-beta --with-http_gzip_static_module --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --with-http_sub_module
@oschaaf Yeah, why don't you guys push this into a hotfix release? It's big enough to warrant it.
@oschaaf Which commit is missing from 1.10.33.5 that gives us these crashes?
@jeffkaufman https://github.com/pagespeed/ngx_pagespeed/commit/1964ef5219cc6585ced45662e38dce09dde518df But https://github.com/pagespeed/ngx_pagespeed/commit/b88e067c6d8173112d586fcdfe8d664c1226a00d and https://github.com/pagespeed/ngx_pagespeed/commit/60c1f4cc4e58493be12e2c5861a5eb6aafb752c6 would also be good to have.
I'm a bit confused. I thought they were included, and https://developers.google.com/speed/pagespeed/module/release_notes also mentions these commits?
@oschaaf Yes, those commits are supposed to be in 1.10.33.5, but they're not there. Still trying to figure out why.
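(For anyone who wants to verify this locally, a quick way to test whether a branch actually contains a commit, sketched with git against the release branch named below:)

# Hypothetical: check whether release-1.10.33.5-beta contains the fix.
git merge-base --is-ancestor b88e067c6d8173112d586fcdfe8d664c1226a00d \
    release-1.10.33.5-beta && echo included || echo missing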
Figured it out. I had a draft of branch release-1.10.33.5-beta on github that pointed to https://github.com/pagespeed/ngx_pagespeed/commits/ed5b23dc563493163a8bdbc27bb9e7de166b5fc2, and then I was out of the office for two weeks. I handed off the release to @crowell, who needed to cherry-pick @oschaaf's fbde0ace7e468adea95a7912a24da8fff14e6cb4 to add it to the release, and he thought I had left the draft release tagged as master. So he cherry-picked the commit onto master and then force-pushed it to release-1.10.33.5-beta (thinking that the warning was just because he was cherry-picking).
I'm preparing a 1.10.33.6 release now that properly includes these commits to fix this bug.
Sorry everyone!
1.10.33.6 is out now.
This may be related to https://github.com/pagespeed/ngx_pagespeed/issues/1081
The server is running CentOS 7 with the latest patches. We use TLS with a certificate from Let's Encrypt, running PHP 5.4 and the XenForo forum software. We redirect XXX.com to www.XXX.com (anonymizing our site, but probably very poorly) for both http and https. http2 is supported.
As you'll see, I do my best to optimize for performance to gain an SEO advantage over our competitors. I implemented the fix described in https://trac.nginx.org/nginx/ticket/235, which has reduced our signal 11 errors but not eradicated them. I believe there may be a problem with ngx_pagespeed, given the stack trace posted below.
I can send logs and core dumps to anyone if needed; I've had to disable pagespeed for now. The crashes can cause nginx to hang and our website to become completely unresponsive under load (many crashes can occur in a short time period). User agents range from iPhones to Windows 10, so I haven't ruled out a particular platform/OS.
The common thread among the signal 11 crashes I see is that they are related to 200 redirects. This may be due to a misconfiguration of nginx, although it was working until I upgraded nginx and pagespeed (and added SSL), which is why I've posted all of my configuration below.
nginx errors.log snippet
nginx.conf
default.conf file under /etc/nginx/conf.d
backtrace information