Closed bjzhangl closed 3 years ago
Thanks for the detailed report, we'll look into it.
@ricellis Thanks! I also made a simple fix by changing the following code in client.js
request.response
.on('response', function(response) {
request.response.pause();
utils.runHooks('onResponse', request, response, function() {
utils.processState(request, done); // process response hook results
});
});
to be
request.response
.on('response', function(response) {
request.response.pause();
utils.runHooks('onResponse', request, response, function() {
//utils.processState(request, done); // process response hook results
utils.processState(request, function(stop) {
return;
});
});
});
and add the following line after https://github.com/cloudant/nodejs-cloudant/blob/263dddaa07f1c50d8cc1fe341646cfbea574d87c/lib/clientutils.js#L167
done(new Error('No retry requested'));
I am not sure if my fix is enough and reasonable enough. This issue is urgent for us. It's very appreciate if you can give a conclusion and provide a fix ASAP.
Initially I wasn't able to reproduce this locally, but then I remembered from the previous issue that you use the retry
plugin.
I was able to reproduce it locally when using the retry
plugin, but the timeout also must occur during the data phase of the response. If the timeout occurs before that (e.g. during the connect/headers) then the problem is not exposed and the retries execute as expected.
I think this is part of the reason we weren't able to detect it before - standard use of the nock
mock server used in the tests does not appear to be able to reproduce the issue. It has three options for mocking a timeout error: replyWithError
and delayConnection
both occur before the response
event so don't expose this issue and while delayBody
does seemingly happen in the correct phase, when using that option the nock
library doesn't appear to surface timeouts from the overridden request framework.
@ricellis yes. This is due to retry plugin. Please give a fix time if you have. Thanks. Does the new cloudant client can fix this issue?
This is due to retry plugin.
Right. It only happens when all 3 of these conditions are satisfied:
error
occurs after the response
has startedIt is possible to avoid the problem by using the "listener" style or removing the retry
plugin.
Please give a fix time if you have
We're working on it, it's complicated by the mix of callback and listener styles. Your proposal works ok for stopping the hang you observe, but it appears to introduce a different one for some other flows.
Does the new cloudant client can fix this issue?
Certainly this specific problem won't exist in the new client because the retry
plugin doesn't exist there. When retry functionality is added there (via the node-sdk-core) it will be a different implementation.
@ricellis Thanks for your quick fix. When will you have a new release to include this fix? We use npm to install the package.
It will be in the next release, we hope to do that soon, we're just working on one other change for inclusion in the next release.
This is relased in https://github.com/cloudant/nodejs-cloudant/releases/tag/4.4.1
We use nodejs-cloudant 4.2.2. And this can be reproduced on 4.4.0. Cloudant met DoS attack some days ago, and we found nodejs-cloudant hangs during this time in our production env. We also reproduced this by using following https server instead of cloudant server.
In reproduce situation, our timeout is 30000ms. In production env, we do not set timeout. I debug the nodejs-cloudant in reproduce case with timeout. And found in https://github.com/cloudant/nodejs-cloudant/blob/263dddaa07f1c50d8cc1fe341646cfbea574d87c/lib/client.js#L230:
The parameter of done callback in utils.processState is "new Error('No retry requested')" which causes async.forever exits. When timeout abort emit, nodejs-cloudant run the code section from the line "https://github.com/cloudant/nodejs-cloudant/blob/263dddaa07f1c50d8cc1fe341646cfbea574d87c/lib/clientutils.js#L158" to try to retry the request. Since async.forever has exit, nodejs-cloudant hangs there.
It seems "https://github.com/cloudant/nodejs-cloudant/blob/263dddaa07f1c50d8cc1fe341646cfbea574d87c/lib/clientutils.js#L167" and "https://github.com/cloudant/nodejs-cloudant/blob/263dddaa07f1c50d8cc1fe341646cfbea574d87c/lib/clientutils.js#L163" should be the best place to exit async.forever. Exit aync.forever in reponse hooks may cause problem when response is emit by request package but network problem happend.
Because my reproduce is not based on the real cloudant send response scenario, and i am not very familiar with request package, my analysis may be not right. But please check if there is some problem in nodejs-cloudant even if we add timeout in our code explicitly. We had met such hang issues in our production env 3 times in last and this month. And this also cause our service can not restore even if cloudant server restores. We must restart our service manually to make it work.