CodementorIO / rest-firebase

Ruby Firebase REST API client built on top of rest-core
Apache License 2.0
107 stars 22 forks source link

HTTPClient::ReceiveTimeoutError: execution expired #1

Closed kidlab closed 8 years ago

kidlab commented 9 years ago

We are using RestFirebase 0.9.5, experiencing many HTTPClient::ReceiveTimeoutError on our production, although we already set the timeout to 30s. It does not always happen but quite often in around 2 weeks.

Here is the backtrace:

/usr/lib64/ruby/2.0.0/openssl/buffering.rb:54 in "sysread"
/usr/lib64/ruby/2.0.0/openssl/buffering.rb:54 in "fill_rbuff"
/usr/lib64/ruby/2.0.0/openssl/buffering.rb:201 in "gets"
/gems/httpclient-2.4.0/lib/httpclient/session.rb:352 in "gets"
/gems/httpclient-2.4.0/lib/httpclient/session.rb:881 in "block in parse_header"
/gems/httpclient-2.4.0/lib/httpclient/session.rb:877 in "parse_header"
/gems/httpclient-2.4.0/lib/httpclient/session.rb:860 in "read_header"
/gems/httpclient-2.4.0/lib/httpclient/session.rb:667 in "get_header"
/gems/httpclient-2.4.0/lib/httpclient.rb:1141 in "do_get_header"
/gems/httpclient-2.4.0/lib/httpclient.rb:1090 in "do_get_block"
/gems/newrelic_rpm-3.9.6.257/lib/new_relic/agent/instrumentation/httpclient.rb:34 in "block in do_get_block_with_newrelic"
/gems/newrelic_rpm-3.9.6.257/lib/new_relic/agent/cross_app_tracing.rb:45 in "tl_trace_http_request"
/gems/newrelic_rpm-3.9.6.257/lib/new_relic/agent/instrumentation/httpclient.rb:33 in "do_get_block_with_newrelic"
/gems/httpclient-2.4.0/lib/httpclient.rb:891 in "block in do_request"
/gems/httpclient-2.4.0/lib/httpclient.rb:985 in "protect_keep_alive_disconnected"
/gems/httpclient-2.4.0/lib/httpclient.rb:890 in "do_request"
/gems/httpclient-2.4.0/lib/httpclient.rb:778 in "request"
/gems/rest-core-3.3.3/lib/rest-core/engine/http-client.rb:25 in "request_sync"
/gems/rest-core-3.3.3/lib/rest-core/engine/http-client.rb:17 in "request"
/gems/rest-core-3.3.3/lib/rest-core/engine.rb:11 in "block in call"
/gems/rest-core-3.3.3/lib/rest-core/promise.rb:71 in "call"
/gems/rest-core-3.3.3/lib/rest-core/promise.rb:71 in "block (2 levels) in defer"
/gems/rest-core-3.3.3/lib/rest-core/promise.rb:140 in "block in synchronized_yield"
/gems/rest-core-3.3.3/lib/rest-core/promise.rb:140 in "synchronize"
/gems/rest-core-3.3.3/lib/rest-core/promise.rb:140 in "synchronized_yield"

Any thought?

godfat commented 9 years ago

We're seeing that regularly recently, too. I am also wondering if timeout was really accurate or not... Any idea if it's accurate?

kidlab commented 9 years ago

I have no idea yet. I think we can try to upgrade to the latest HTTPClient 2.5. But the more difficult thing is how to reproduce the problem :wink:

godfat commented 9 years ago

We're always using the latest version of httpclient (so far 2.5.3.3), so I guess it didn't make much difference. I guess I'll probably need to figure out how to test this kind of error by a customized server (too lazy to try this before), so that we might be able to tell that it's really Firebase is timing out.

godfat commented 9 years ago

On the other hand, I just realized that I didn't use timers gem in timeout middleware in rest-core properly. So basically it has no effect right now. We're all relying on httpclient's own timeout mechanism right now. So I believe this should be working correctly, that Firebase indeed timed out.

However, the issue is, there's connect timeout and receive timeout. Currently, I set both of them to the timeout we specified. In this case, if we set it to 30 seconds, then it's 30 seconds for connect timeout, and 30 seconds for receive timeout.

The problem is, in a worst case, we might end up with 58 seconds execution time, because it could take 29 seconds to connect, while 29 seconds to receive data. This is a total 58 seconds. This is not ideal, that's why I still try to use timers gem to make an external timeout...

Asynchronous timeout is really a hard problem :( Especially when we have thread pool support...

kidlab commented 9 years ago

Ya, it's not easy to handle timeout actually. Hopefully, Google won't be evil after acquiring Firebase :D

godfat commented 9 years ago

Honestly I am still wondering if recent instability was because they are migrating some data centers or something :P I have no evidence and I did change some monitoring though. Maybe it's simply because errors are handled more correctly now.

Anyway, I just released rest-core 3.4.0. The significant change is that now it would no longer raise any HTTPClient::ReceiveTimeoutError, but only regular Timeout::Error, because I completely removed timeout from httpclient. (or more precisely, leave it to default, I am not sure what's the default) From now on, we'll handle timeout from our side. This should make sure we're not using double time for each request.

The precision by default is 1 second. Timeout less than 1 second won't be accurate and the real executing time might be 1 second more. Change this precision by setting RC::Timer.interval = 0.5 second or even less.

Please read the detail here: https://github.com/godfat/rest-core/blob/rest-core-3.4.0/CHANGES.md#rest-core-340----2014-11-26

godfat commented 9 years ago

I don't know if this would help, but 1.0.0 was just released and there are some new options from rest-core:

We could set max_retries to some integers greater than 0, and retry the requests for [IOError, SystemCallError, Timeout::Error] by setting :retry_exceptions. This way we could probably make the requests more reliable... though setting a timeout and retry on timeout is somehow defeating the purpose. But sometimes refreshing does really work better than waiting forever, right?

Be careful about making non-idempotent requests such as POST though! We could pass a 4th argument to disable it for a given request:

firebase.post(path, {:payload => 'data'}, {:query => 'string'}, :max_retries => 0)

We could also setup :error_callback to monitor the errors. Hope this helps.

p.s. Our application currently sets :max_retries to 3 and :retry_exceptions to [IOError, SystemCallError, Timeout::Error], with the default 10 seconds timeout. It works ok so far.

ghost commented 9 years ago

Any thoughts on an exponential backoff for timeout? For instance, 1s, 2s, 4s, 8s, etc? That way you could potentially have much quicker latencies in most cases but still safety of trying at a much later time?

godfat commented 9 years ago

I think that's a good idea for network errors and also maybe timeout, but since this was implemented in rest-core, I am not very sure how this would play with the other stuffs.

So I guess only if there's a strong interest for this kind of feature, I would simply leave this alone for now.

kidlab commented 9 years ago

@godfat Do you configure pool_size in your application? And is there any relation between the pool_size and timeout error? We currently set pool_size = 100 in our application, and it seems the problem still there. BTW, what API root endpoint are you using? We are using something like this https://xxx.firebaseio.com.

godfat commented 9 years ago

@kidlab For rest-firebase, we didn't set any pool_size, therefore a new thread would be spawned each time we make a request. Indeed the lower number of pool_size, the higher chances you would see timeout. Consider:

RestFirebase.pool_size = 1
f = RestFirebase.new(:timeout => 10)
f.get('/'){ |r| puts r }
f.get('/'){ |r| puts r }

Suppose Firebase never responds, then after 10 seconds, both would get a timeout error but only one request was made. The second request spent 10 seconds while sitting in the queue, and cancelled after timeout.

FWIW, we also see timeout from Firebase a few times each day. We use the default 10 seconds timeout. There's another exception troubles me more. This might be off topic, but have you seen this before? I am wondering if this is a network issue because I couldn't reproduce this by retrying.

RestFirebase::Error::BadRequest: [400] {"error"=>"Invalid data; couldn't parse JSON object. Are you sending a JSON object with valid key names?"}

kidlab commented 9 years ago

@godfat I haven't got exactly the same error like you so far, but I believe we encountered some kind of error saying "Keys must be non-empty strings and can't contain ".", "#", "$", "/", "[", or "]" before, when we used the Firebase javascript SDK.

godfat commented 9 years ago

Thanks. We did see that before, too. Namely we'd . in the name, and later we translated it to ^ (which I regretted, should use ~ instead, but too late...)

Thanks I'll check again for those error.