basho / riak-nodejs-client

The Riak client for Node.js.
Apache License 2.0

Error: Connection closed while executing command when doing secondaryIndexQueryAsync [JIRA: CLIENTS-1093] #187

Open mogadanez opened 7 years ago

mogadanez commented 7 years ago
lukebakken commented 7 years ago

> I have timeout 20 sec

How did you set this timeout?

> these queries stall for this timeout and do not try to recover according to the [executionAttempts=3] setting

I don't understand this statement. Are you saying that the query did re-try in your scenario, but you would expect it not to re-try?

> Error: Connection closed while executing command

This is caused by Riak closing the connection, or by a proxy between your application and Riak closing it.

mogadanez commented 7 years ago

> How did you set this timeout?

I was wrong, that timeout is only used for the Solr requests, which we currently do over plain HTTP. Nodes are constructed and added to the cluster like this:


    function makeNode(host) {
        return new Riak.Node.Builder()
            .withRemoteAddress(host)
            .withRemotePort(riakCfg.pbPort)
            .build();
    }

    new Riak.Client(new Riak.Cluster({ nodes: currentNodesList.map(x => makeNode(x)) }));

so the default timeouts are used, I assume.

> I don't understand this statement. Are you saying that the query did re-try in your scenario, but you would expect it not to re-try?

No, it does _not_ retry, but I expect that it would.
I'm listening for the node's `retryCommand` event and logging retries:

node.on("retryCommand", (command, node)=>{ makeLogger( ).warn( {command: _.pick( command, "PbRequestName", "remainingTries", "name" ) , bucket: command.options.bucket , indexName: command.optionsindexName , bucketType: command.options.bucketType , key: String ( command.options.key ) , riakNode: formatRiakNode(node) }, "Command failed " + (command.options.indexName ? "(" + command.options.indexName + ")" : "") + ", Retry (" + command.remainingTries + " remaining)" ) }


but for this command no retry is ever logged.

> Are you using a proxy, like HAProxy?
> How often do you see this error?

Not using a proxy. It occurs more often at peak time; in total about ~300-400 per day.
lukebakken commented 7 years ago

@mogadanez thanks for all of your information. I don't know why the command is not re-trying in your situation. I will keep looking into that. If you could log command.remainingTries in your error handler that would be helpful.
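
For example, a rough sketch along these lines (reusing your `makeLogger` and option names; building the command explicitly is just one way to get at the command object, not a requirement) would expose `remainingTries` when the callback receives an error:

    var query = new Riak.Commands.KV.SecondaryIndexQuery.Builder()
        .withBucketType(bucketType)
        .withBucket(bucket)
        .withIndexName(indexName)
        .withIndexKey(key)
        .withCallback(function (err, rslt) {
            if (err) {
                // remainingTries lives on the command object itself
                makeLogger().error({ remainingTries: query.remainingTries }, "2i query failed: " + err);
                return;
            }
            // handle rslt.values / rslt.done as before
        })
        .build();
    client.execute(query);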

What is happening is that, at peak time, the additional cluster load causes secondary index queries to exceed the client-side request timeout of 5 seconds (the timeout is set in `lib/core/riakconnection.js`).

You can see this in your stack trace (I didn't notice it the first time I looked):

    at RiakConnection._connHandleTimeout (/var/proofme/node_modules/basho-riak-client/lib/core/riakconnection.js:125:14)
    at Socket.g (events.js:291:16)
    at emitNone (events.js:86:13)
    at Socket.emit (events.js:185:7)
    at Socket._onTimeout (net.js:339:8)

This means that the command took longer than the default of 5 seconds to execute. The internal timeout in Riak for a secondary index query is 60 seconds, so you could use the requestTimeout option to set the request timeout to at least 60 seconds.
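
For example, on each node (a sketch; I'm assuming `withRequestTimeout` is how your client version exposes the `requestTimeout` option on the builder):

    var node = new Riak.Node.Builder()
        .withRemoteAddress(host)
        .withRemotePort(riakCfg.pbPort)
        .withRequestTimeout(60000) // at least Riak's internal 60-second 2i timeout
        .build();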

Note that this change affects every request to Riak. Another option would be to increase requestTimeout to a larger value, like 30 seconds, on every node, and set the timeout option on the SecondaryIndexQuery command to 29750 (docs).

This means that Riak will spend a maximum of 29750 milliseconds executing the secondary index query before timing out and sending a timeout response back to your application. Having this value slightly less than the requestTimeout makes it more likely the command won't error due to a client-side timeout.
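
A sketch of that combination (again, `withRequestTimeout` on the node builder and `withTimeout` on the `SecondaryIndexQuery` builder are my assumptions about the method names; adjust to your client version):

    // client-side limit on every request through this node: 30 seconds
    var node = new Riak.Node.Builder()
        .withRemoteAddress(host)
        .withRemotePort(riakCfg.pbPort)
        .withRequestTimeout(30000)
        .build();

    // Riak-side limit for this 2i query, kept slightly under the client-side one
    var query = new Riak.Commands.KV.SecondaryIndexQuery.Builder()
        .withBucket(bucket)
        .withIndexName(indexName)
        .withIndexKey(key)
        .withTimeout(29750)
        .withCallback(callback)
        .build();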

You will have to experiment to find the best requestTimeout value to allow secondary index queries to run to completion during peak time.