basho / riak-nodejs-client

The Riak client for Node.js.
Apache License 2.0
72 stars, 29 forks

Connect using authentication - Timeout in TLS [JIRA: CLIENTS-666] #112

Open hytvi opened 8 years ago

hytvi commented 8 years ago

So I was doing my final testing before putting my application into production, and now authenticated connections don't seem to work anymore.

-- /etc/var/riak/error.log

2015-11-23 10:06:41.048 [error] <0.31692.39> gen_fsm <0.31692.39> in state wait_for_tls terminated with reason: {error,{startls_failed,closed}}
2015-11-23 10:06:41.048 [error] <0.31692.39> CRASH REPORT Process <0.31692.39> with 0 neighbours exited with reason: {error,{startls_failed,closed}} in gen_fsm:terminate/7 line 622
2015-11-23 10:06:41.049 [error] <0.330.0> Supervisor riak_api_pb_sup had child undefined started with {riak_api_pb_server,start_link,undefined} at <0.31692.39> exit with reason {error,{startls_failed,closed}} in context child_terminated

-- connection code

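var Riak = require('basho-riak-client');

// Module-level client, assumed to be declared outside the function in the original.
var riakClient;

// The enclosing function's header was cut from the original snippet;
// connectToRiak is a placeholder name. hosts is an array of "host:port"
// strings and options.auth holds the Riak credentials.
function connectToRiak(hosts, options, callback) {
    var nodes = [];
    hosts.forEach(function (host) {
        var hostPort = host.split(':');

        nodes.push(new Riak.Node({
            remoteAddress: hostPort[0],
            remotePort: parseInt(hostPort[1], 10),
            auth: {
                user: options.auth.user,
                password: options.auth.password
            },
            connectionTimeout: 1000,
            cork: true
        }));
    });

    riakClient = new Riak.Client(new Riak.Cluster({ nodes: nodes }));

    // Ping to check that at least one node can be reached; failures go to callback.
    riakClient.ping(function (err) {
        if (err) {
            callback(new Error(err));
        } else {
            callback(undefined);
        }
    });
}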

-- application output

error: [RiakConnection] Failed to connect: *ip censored* port: 8087 error: RiakConnection Timed out trying to connect
error: [RiakConnection] Failed to connect: *ip censored* port: 8087 error: RiakConnection Timed out trying to connect
error: [RiakConnection] Failed to connect:  *ip censored* port: 8087 error: RiakConnection Timed out trying to connect
error: [RiakConnection] Failed to connect:  *ip censored* port: 8087 error: RiakConnection Timed out trying to connect
✖ testAuthenticatedConnection

Assertion Message: Could not connect to Riak: Error: No RiakNodes available to execute command.
AssertionError: Could not connect to Riak: Error: No RiakNodes available to execute command.
    at Object.equal (/usr/local/lib/node_modules/nodeunit/lib/types.js:83:39)
    at *censored (my_app_dir)*/test/connection.js:39:14
    at Ping.callback (*censored (my_app_dir)*/lib/RiakHelper/RiakHelper.js:65:21)
    at Ping.CommandBase._callback (*censored (my_app_dir)*/node_modules/basho-riak-client/lib/commands/commandbase.js:74:19)
    at Ping.CommandBase.onError (*censored (my_app_dir)*/node_modules/basho-riak-client/lib/commands/commandbase.js:168:10)
    at RiakCluster.execute (*censored (my_app_dir)*/node_modules/basho-riak-client/lib/core/riakcluster.js:212:25)
    at RiakCluster._onRetryCommand (*censored (my_app_dir)*/node_modules/basho-riak-client/lib/core/riakcluster.js:305:10)
    at emitTwo (events.js:87:13)
    at RiakNode.emit (events.js:172:7)
    at *censored (my_app_dir)*/node_modules/basho-riak-client/lib/core/riaknode.js:227:30

/usr/local/lib/node_modules/nodeunit/lib/core.js:285
    if (group.setUp) {
             ^

TypeError: Cannot read property 'setUp' of undefined
    at wrapGroup (/usr/local/lib/node_modules/nodeunit/lib/core.js:285:14)
    at Object.exports.runSuite (/usr/local/lib/node_modules/nodeunit/lib/core.js:93:13)
    at /usr/local/lib/node_modules/nodeunit/lib/core.js:125:21
    at /usr/local/lib/node_modules/nodeunit/deps/async.js:513:13
    at iterate (/usr/local/lib/node_modules/nodeunit/deps/async.js:123:13)
    at /usr/local/lib/node_modules/nodeunit/deps/async.js:134:25
    at /usr/local/lib/node_modules/nodeunit/deps/async.js:515:17
    at Immediate._onImmediate (/usr/local/lib/node_modules/nodeunit/lib/types.js:146:17)
    at processImmediate [as _immediateCallback] (timers.js:383:17)

Error: No RiakNodes available to execute command

It might be related to issue #104.

lukebakken commented 8 years ago

Hello -

My first question: since it was working before, what changed? The Node.js client code didn't change, so it must be something in your environment.

Are you running in a different environment from when it was working?

hytvi commented 8 years ago

Hi Luke,

Yes, we changed one thing in our setup, but I can't see how this change could affect this error.

Because Riak was rather slow (100+ ms per request), we set check_crl = off in our Riak configuration.

Yes, I changed the environment, but neither setup seems to work now. Creating the cluster works fine; it fails on the ping.

lukebakken commented 8 years ago

Just to be sure: you changed check_crl = on to check_crl = off?

How are you creating your server certificate for TLS? Are you using your own certificate authority?

hytvi commented 8 years ago

We changed check_crl to off.

We are using our own CA, indeed.

lukebakken commented 8 years ago

OK, I don't see anything in your configuration that tells Node.js to trust your root CA certificate:

https://github.com/basho/riak-nodejs-client/blob/master/test/security/security.js#L40

This is necessary for username/password authentication, since Node.js must be able to validate the server certificate.
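Here is a minimal sketch of node options that carry the root CA. It assumes, as the linked security.js test appears to, that the auth object accepts a ca array that the client passes to Node's TLS layer as the trusted roots. Check that test for the exact option names. buildNodeOptions, the host name and the certificate path are hypothetical.

```javascript
'use strict';

// Hypothetical helper: turns "host:port" strings into option objects for
// new Riak.Node(...). ASSUMPTION: the client forwards `auth` to Node's TLS
// setup, so `ca` lists the CA certificates (PEM) to trust.
function buildNodeOptions(hosts, user, password, caPem) {
    return hosts.map(function (host) {
        var hostPort = host.split(':');
        return {
            remoteAddress: hostPort[0],            // a host name matching the server cert's CN
            remotePort: parseInt(hostPort[1], 10),
            auth: {
                user: user,
                password: password,
                ca: [caPem]                        // your own root CA certificate
            },
            connectionTimeout: 1000,
            cork: true
        };
    });
}

// Usage (the path is a placeholder):
// var caPem = require('fs').readFileSync('/path/to/cacert.pem');
// var nodes = buildNodeOptions(hosts, 'riakuser', 'secret', caPem)
//     .map(function (opts) { return new Riak.Node(opts); });

module.exports = { buildNodeOptions: buildNodeOptions };
```

Each returned object is meant to go into new Riak.Node(...) the same way as in the connection code above. The only differences are the extra ca entry and a numeric port.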

Also, please review the Riak security documentation to make sure your server certificate is created and installed correctly:

http://docs.basho.com/riak/latest/ops/running/authz/#Enabling-SSL

For instance, each server certificate's CN= value must match that server's resolvable host name, and the client must connect using host names (not IP addresses).

hytvi commented 8 years ago

Hi Luke,

I did not add the CA because I thought it wasn't necessary. It's now added to my configuration, but it still doesn't seem to work.

Our Riak nodes do not have resolvable host names, only IP addresses. Are you saying that to make Riak security work, we need to assign a host name to every one of our Riak servers?

lukebakken commented 8 years ago

> Are you saying that in order to make Riak security work, we will need to assign a hostname to every of our riak servers?

This is not unique to Riak security; it is how TLS/SSL lets a client validate the server certificate presented during the handshake.

I believe you can generate SSL certificates for your servers with their IP addresses in the CN= field, and that might work. Another option is to use host names and resolve them via the hosts file.

How was your previous environment set up? Without TLS/SSL set up correctly, Riak security should not have worked.
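To see how Node.js matches a host against a certificate, you can call the built-in tls.checkServerIdentity(hostname, cert) directly with hand-built certificate objects. The host names and addresses below are made up. In current Node.js releases, an IP address is compared only against "IP Address" entries in the certificate's subjectAltName, not against CN=. So if the client relies on Node's default check, an IP-only setup probably needs certificates with an IP SAN rather than an IP in CN=.

```javascript
'use strict';
var tls = require('tls');

// Stand-ins for what getPeerCertificate() returns; only the fields that
// checkServerIdentity() reads are filled in (example values).
var certWithHostCN = { subject: { CN: 'riak1.example.com' } };
var certWithIpCN = { subject: { CN: '10.0.0.11' } };
var certWithIpSan = {
    subject: { CN: 'riak1' },
    subjectaltname: 'IP Address:10.0.0.11'
};

// True when Node.js would accept `cert` for `host` (it returns undefined on success).
function matches(host, cert) {
    return tls.checkServerIdentity(host, cert) === undefined;
}

console.log(matches('riak1.example.com', certWithHostCN)); // true
console.log(matches('10.0.0.11', certWithIpCN));           // false: IPs are not matched against CN
console.log(matches('10.0.0.11', certWithIpSan));          // true
```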

hytvi commented 8 years ago

Hi Luke,

I am aware that the CN is used in SSL to prevent man-in-the-middle attacks. My point is that there is usually an option to accept untrusted certificates, e.g. when you don't need MITM protection but want another benefit of SSL. In my case, I am interested in encrypted transfer plus authorized access to our Riak nodes.

Weirdly enough, riakpbc works with SSL out of the box; maybe it does not implement MITM protection.

Is there an option that lets me accept this certificate, so I can test whether this is the problem?

lukebakken commented 8 years ago

https://github.com/basho/riak-nodejs-client/blob/master/lib/core/riakconnection.js#L182

Add two options to tls_socket_options:

        var tls_socket_options = {
            isServer: false, // NB: required
            secureContext: tls_secure_context,
            rejectUnauthorized: false, // do not reject certificates that fail verification
            checkServerIdentity: function (s, c) { return undefined; } // skip the host name check
        };

Riak will still need SSL configured (server certificate, private key, and root CA certificate), but you should then be able to use "incorrect" certificates. rejectUnauthorized: false tells Node.js not to reject a certificate it cannot verify, and the no-op checkServerIdentity skips the usual host name check.
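A narrower variant, sketched under the assumption that the client honors checkServerIdentity the way the snippet above implies, keeps rejectUnauthorized at its default. Certificates must then still chain to your own CA, and only the host name check is replaced with an allow-list of CN values. This fits the IP-only case in this thread: it keeps encryption and CA trust without requiring host names. makeCnCheck and the CN values are hypothetical.

```javascript
'use strict';

// Hypothetical replacement for the no-op checkServerIdentity: accept a
// certificate only if its subject CN is in `allowedCNs`, whatever host or
// IP was dialed. Chain validation against your CA still applies as long as
// rejectUnauthorized is left at its default (true).
function makeCnCheck(allowedCNs) {
    return function checkServerIdentity(hostname, cert) {
        var cn = cert && cert.subject && cert.subject.CN;
        if (allowedCNs.indexOf(cn) !== -1) {
            return undefined; // accepted: the CN is one of ours
        }
        return new Error('Unexpected certificate CN "' + cn + '" for ' + hostname);
    };
}

// Sketch of the options object in riakconnection.js:
// var tls_socket_options = {
//     isServer: false,
//     secureContext: tls_secure_context,
//     checkServerIdentity: makeCnCheck(['riak1', 'riak2', 'riak3'])
// };

module.exports = { makeCnCheck: makeCnCheck };
```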

faust64 commented 7 years ago

Recently I enabled security mode on some test setups, and since then I often hit connection timeouts, with processes failing to start in a loop.

A couple of weeks ago, I had to raise the Riak connectionTimeout to 9000 ms before I could restart my processes, on a setup that had been working: initial deployment 42 days ago, without a scratch. Today, while deploying a new revision, all my processes refused to start even with the 9-second timeout, complaining about a Riak timeout (pm2 restarts them, Airbrake sends the stack traces to my Errbit in a loop, which eventually floods one of my Slack channels). I tried raising the timeout to 12, 15, 20 and 27 seconds. Finally, with 30 seconds, all my processes were able to start.

Shocked, I ended up restarting Riak on all my nodes. It turned out I could then drop the connectionTimeout setting and restart my processes without any error. And no, I didn't think of collecting riak-debug output before restarting Riak.

FYI: CRL checking is off, and I'm using 4096-bit RSA certificates with certificate-based client authentication.

lukebakken commented 7 years ago

@faust64 could you please open a separate issue? If I understand your description, it may be due to performance issues with SSL and Erlang.