epermana / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

SSL support is random and inconsistent #727

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Install a standard master/slave configuration: 

./tools/tpm install alpha \
    --topology=master-slave \
    --master=tr-ssl1 \
    --replication-user=tungsten \
    --replication-password=password \
    --home-directory=/opt/continuent \
    --members=tr-ssl1,tr-ssl2,tr-ssl3 \
    --start

2. Check replication is working

3. Create your keystore/truststore using keytool or by importing openssl keys 
into the stores

4. Update your configuration to use SSL: 

./tools/tpm update \
    --thl-ssl=true \
    --java-keystore-path=/home/tungsten/keystore.jks \
    --java-keystore-password=password \
    --java-truststore-path=/home/tungsten/truststore.ts \
    --java-truststore-password=password

5. Check the cluster status. The slaves will show:

state                  : GOING-ONLINE:SYNCHRONIZING

Logs show: 

INFO   | jvm 1    | 2013/10/07 16:14:37 | 2013-10-07 16:14:37,665 [alpha - remote-to-thl-0] INFO  thl.RemoteTHLExtractor Waiting for master to become available: uri=thls://tr-ssl1:2112/ attempts=70 timeouts=0

6. Now stop replicator on the master and the slave(s): 

replicator stop
replicator stop

Wait for a couple of minutes

7. Now start replicator on the master

replicator start

8. Now start replicator on the slave

replicator start

9. Check status again

Replication with SSL will generally be working.
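
The stop/wait/start sequence in steps 6 to 8 can be scripted so that the order is always the same. The following is a minimal Python sketch, not part of tpm or the replicator: `restart_in_order` and the `run` callback are hypothetical names, the 120-second default is an assumed reading of "a couple of minutes", and how a command reaches each host (ssh or otherwise) is left to the caller. Only the `replicator stop` and `replicator start` commands come from the steps above.

```python
import time
from typing import Callable, Sequence


def restart_in_order(
    master: str,
    slaves: Sequence[str],
    run: Callable[[str, str], None],
    wait_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop every replicator, wait, then start the master before the slaves.

    `run(host, command)` is a caller-supplied hook (hypothetical) that
    executes `command` on `host`, for example over ssh.
    """
    # Step 6: stop the replicator on the master and on each slave.
    for host in [master, *slaves]:
        run(host, "replicator stop")

    # "Wait for a couple of minutes"; the default of 120 s is an assumption.
    sleep(wait_seconds)

    # Step 7: start the master first.
    run(master, "replicator start")

    # Step 8: then start each slave.
    for host in slaves:
        run(host, "replicator start")


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_in_order(
        "tr-ssl1",
        ["tr-ssl2", "tr-ssl3"],
        run=lambda host, cmd: print(f"{host}: {cmd}"),
        sleep=lambda seconds: print(f"(wait {seconds:.0f}s)"),
    )
```

Passing `sleep` as a parameter keeps the dry run instant and makes the ordering easy to check without touching a real cluster.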

What is the expected output?

SSL based replication should work every single time a correct certificate is 
given to the replicator.  

What do you see instead?

Failed replication.

What is the possible cause?

Somewhere a port, certificate, or some kind of active connection or data 
remains open or is not flushed during a reconfigure or (short) stop/start.  

This isn't *always* the case, but I think it points to a potential issue with 
the certificate/SSL initialization: either a timing/caching problem or a 
mis-application of the certificate. 

Interestingly, once replication has run *once* after the replicator stop/start, 
all subsequent certificate updates work fine. It's therefore also possible 
that this is a problem with initializing the SSL connection. This might 
especially be the case given that the trepsvc.log on the slave reports an error 
as if the master were not contactable.

What is the proposed solution?

SSL needs to be more robust, and provide better error messages. 

Ideally, the info in trepsvc.log should indicate whether the SSL connection 
failed because of a bad certificate, or some other error. At the mome
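
As an illustration of the distinction asked for here, the sketch below (Python, standard library only) maps connection failures to coarse causes so that a bad certificate is not reported the same way as an unreachable master. It is not the replicator's code: `classify_connect_error`, `probe`, and the category strings are hypothetical, and the replicator would need equivalent handling in its own Java SSL layer.

```python
import socket
import ssl


def classify_connect_error(exc: BaseException) -> str:
    """Map an exception from a TLS connection attempt to a coarse cause."""
    # Most specific first: SSLCertVerificationError subclasses SSLError,
    # and SSLError, like the socket errors below, subclasses OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "bad-certificate"
    if isinstance(exc, ssl.SSLError):
        return "ssl-error"
    if isinstance(exc, ConnectionRefusedError):
        return "connection-refused"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "unknown-host"
    return "other"


def probe(host: str, port: int, cafile: str, timeout: float = 5.0) -> str:
    """Try one TLS handshake and report "ok" or a failure category."""
    # cafile is a PEM file with the certificate(s) to trust (assumption:
    # the caller has exported it from the truststore).
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except OSError as exc:
        return classify_connect_error(exc)
```

A log line built from the returned category would let an operator tell "bad-certificate" apart from "connection-refused" without enabling debug output.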

Additional information

The above sequence works with both OpenSSL (self-signed) generated 
certificates, and with keytool online certificates. 

The current setup makes the replicator unstable and unpredictable with SSL enabled. 


Original issue reported on code.google.com by mc.br...@continuent.com on 7 Oct 2013 at 3:51

GoogleCodeExporter commented 9 years ago
Robert will take a look at this with Jeff and MC, but later Stephane might need 
to finish the effort.

Original comment by linas.vi...@continuent.com on 7 Oct 2013 at 4:30

GoogleCodeExporter commented 9 years ago
Further testing indicates this might be a slave connection issue rather than a 
server connection issue. 

Performing a tpm update with new keystore/truststore gives me: 

Master: running
Slave1: Unable to connect
Slave2: Connecting fine (And accepting heartbeats)

Slave1 shows the 'Waiting for master...' message:

INFO   | jvm 1    | 2013/10/07 18:03:29 | 2013-10-07 18:03:29,123 [alpha - remote-to-thl-0] INFO  thl.RemoteTHLExtractor Waiting for master to become available: uri=thls://tr-ssl1:2112/ attempts=10 timeouts=0
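
To track this message across hosts, the fields can be pulled out of trepsvc.log with a small parser. This is a Python sketch based only on the two log excerpts in this issue; `parse_waiting_line` and `WaitingForMaster` are hypothetical names, and the pattern may need adjusting if other replicator versions format the message differently.

```python
import re
from typing import NamedTuple, Optional

# Pattern taken from the RemoteTHLExtractor message quoted in this issue.
WAITING_RE = re.compile(
    r"Waiting for master to become available: "
    r"uri=(?P<uri>\S+) attempts=(?P<attempts>\d+) timeouts=(?P<timeouts>\d+)"
)


class WaitingForMaster(NamedTuple):
    uri: str
    attempts: int
    timeouts: int


def parse_waiting_line(line: str) -> Optional[WaitingForMaster]:
    """Return the uri/attempts/timeouts fields, or None if the line differs."""
    # Collapse whitespace so entries that were wrapped across lines still match.
    match = WAITING_RE.search(" ".join(line.split()))
    if match is None:
        return None
    return WaitingForMaster(
        match["uri"], int(match["attempts"]), int(match["timeouts"])
    )
```

Both excerpts in this issue show a non-zero `attempts` count with `timeouts=0`, which is the combination such a parser would make easy to flag.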

Checking the native SSL with the appropriate certificates on Slave1 by 
connecting to the master:

openssl s_client -connect 192.168.2.40:2112 -cert client-cert.pem \
    -key client-key.pem -CAfile ca-cert.pem -ssl3
CONNECTED(00000003)
depth=0 C = UK, ST = Lincs, L = Barrowby, O = Continuent, OU = IT, CN = MC, emailAddress = mc@mcslp.com
verify error:num=18:self signed certificate
verify return:1
depth=0 C = UK, ST = Lincs, L = Barrowby, O = Continuent, OU = IT, CN = MC, emailAddress = mc@mcslp.com
verify return:1
---
Certificate chain
 0 s:/C=UK/ST=Lincs/L=Barrowby/O=Continuent/OU=IT/CN=MC/emailAddress=mc@mcslp.com
   i:/C=UK/ST=Lincs/L=Barrowby/O=Continuent/OU=IT/CN=MC/emailAddress=mc@mcslp.com
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDbDCCAlQCAQEwDQYJKoZIhvcNAQEFBQAwfDELMAkGA1UEBhMCVUsxDjAMBgNV
BAgMBUxpbmNzMREwDwYDVQQHDAhCYXJyb3dieTETMBEGA1UECgwKQ29udGludWVu
dDELMAkGA1UECwwCSVQxCzAJBgNVBAMMAk1DMRswGQYJKoZIhvcNAQkBFgxtY0Bt
Y3NscC5jb20wHhcNMTMxMDA3MTMwMTMzWhcNMjMwODE2MTMwMTMzWjB8MQswCQYD
VQQGEwJVSzEOMAwGA1UECAwFTGluY3MxETAPBgNVBAcMCEJhcnJvd2J5MRMwEQYD
VQQKDApDb250aW51ZW50MQswCQYDVQQLDAJJVDELMAkGA1UEAwwCTUMxGzAZBgkq
hkiG9w0BCQEWDG1jQG1jc2xwLmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBALbpckvTvP8ohTEL6/qtAOK4H8q7/k52pJdsVTgHuy0tFY+u86i5/2V4
aeA+0qeVgOltsV0wQQ/HAaSs4IeUvvEzqfq16G4nrl/J6bDnqPSH40MlBJBHHFA3
Min19icc0+XOEYxgqz+AiJuVO8Jwelk48szofKnwqvRtHAsJi2XTLMXDDsNIhdWu
ok4vp35/YZ0jBfQTeXm53Ylb8+Hyu+kDAt5CsuIP50nwUV2Zkr0svtN3JtkW03UE
g4s3o1HN0T6fXtoB6s3w6afPbUxuf3Cg5/CV64MXPOqK8ZoFF4AWwiHarYh27ovO
QNogYYtaDL4LystphhRFRQbrptNxZwUCAwEAATANBgkqhkiG9w0BAQUFAAOCAQEA
MSjvH118e8N0IIoJrB0txuB/OKN55F0vR054JNBUTwQV0f3+dVj/EjNvS0y9mwlF
ZjDv2B+OLH5XXhn7B1aJoHLs9R9aR0IXdsQ5dWN9+gl2prm8gbMuXFBCAnA5y9dX
zKMjbbt1Q/svF8Qa0r7LMHH5mjg4rtq1aFCyPJ8cYX++pCYqJ2psJ0u+ojVk3c8/
/E8Q7BCdip90Vp+zi2KfPOKcZ0VHLpvsYkPcfkA00K5ggIJg2Vv4zivI9Wo37Og5
kDktNanV5blZeeg/4HR9hQ2p63vQFWLkVHjVZVoYjgsY0zopVeeM8TSgGXe8UqN2
3pjrlutcpymAHfYrepAXGA==
-----END CERTIFICATE-----
subject=/C=UK/ST=Lincs/L=Barrowby/O=Continuent/OU=IT/CN=MC/emailAddress=mc@mcslp.com
issuer=/C=UK/ST=Lincs/L=Barrowby/O=Continuent/OU=IT/CN=MC/emailAddress=mc@mcslp.com
---
No client certificate CA names sent
---
SSL handshake has read 1386 bytes and written 301 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : SSLv3
    Cipher    : ECDHE-RSA-AES256-SHA
    Session-ID: 5252E995A4C1BEB381FE0B5E856EAB57BB3F23143428BF732BA17E25F394B6FB
    Session-ID-ctx: 
    Master-Key: CC356D897569B3C64069834AB1E99EDFA78599676366E7CF578453C3C5C3D8E6019557F5F35FBF7BC9F3B3D14D705A03
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1381165461
    Timeout   : 7200 (sec)
    Verify return code: 18 (self signed certificate)
---
???L8com.continuent.tungsten.replicator.thl.ProtocolHandshake
    capabilitiestLjava/util/Map;xr6com.continuent.tungsten.replicator.thl.ProtocolMessage6???LpayloadtLjava/io/Serializable;xppsrjava.util.HashMap???`?F
loadFactorI thresholdxp?@
                             t  min_seqnot0t    source_idttr-ssl1troletmastert  max_seqnot27tversiont"Tungsten Replicator 2.2.0 build 69x

---

THL Listener on Slave1 is working with SSL

A stop/start of the replicator does not fix the issue. 

Keystores on Slave1: 

fa1bac3ffa2569ffd55dc265a0efc3cb  tungsten_keystore.jks
6bea38caafd8c6ff187354619e80e4ae  tungsten_truststore.ts

On Slave2:

fa1bac3ffa2569ffd55dc265a0efc3cb  tungsten_keystore.jks
6bea38caafd8c6ff187354619e80e4ae  tungsten_truststore.ts

On Master: 

fa1bac3ffa2569ffd55dc265a0efc3cb  tungsten_keystore.jks
6bea38caafd8c6ff187354619e80e4ae  tungsten_truststore.ts
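
The comparison above was done by eye. A minimal Python sketch of the same check follows; `md5_of` and `mismatched_hosts` are hypothetical helpers, and collecting the checksum from each host is assumed to happen elsewhere. MD5 is used only because the listings above are MD5 sums.

```python
import hashlib
from typing import Dict, List


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_hosts(checksums: Dict[str, str], reference_host: str) -> List[str]:
    """List hosts whose checksum differs from the reference host's."""
    reference = checksums[reference_host]
    return sorted(host for host, value in checksums.items() if value != reference)


if __name__ == "__main__":
    # Keystore checksums as reported in this comment.
    keystore_sums = {
        "Master": "fa1bac3ffa2569ffd55dc265a0efc3cb",
        "Slave1": "fa1bac3ffa2569ffd55dc265a0efc3cb",
        "Slave2": "fa1bac3ffa2569ffd55dc265a0efc3cb",
    }
    print(mismatched_hosts(keystore_sums, "Master"))
```

An empty result means every host holds a byte-identical file, which rules out a partially copied keystore as the difference between Slave1 and Slave2.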

Original comment by mc.br...@continuent.com on 7 Oct 2013 at 5:13

GoogleCodeExporter commented 9 years ago
This needs to be done right. We might not be able to fix it for 2.2.0.

Original comment by linas.vi...@continuent.com on 15 Oct 2013 at 4:49

GoogleCodeExporter commented 9 years ago
MC will reproduce the issue (it is intermittent), generate a full log, and then 
look into this together with Robert. There may be a race condition underneath.

Gilles has an SSL engine class that allows both SSL and non-SSL connections on 
the same port and tells them apart from the header. Robert did not use it, but 
might here.
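
For readers unfamiliar with the technique, telling SSL from non-SSL traffic "from the header" usually means inspecting the first bytes a client sends. The Python sketch below shows one common heuristic; it is not Gilles's class, whose implementation is not shown in this issue, and `looks_like_tls_client_hello` is a hypothetical name. It does not recognize the older SSLv2-style hello.

```python
def looks_like_tls_client_hello(header: bytes) -> bool:
    """Return True if the first bytes look like an SSLv3/TLS ClientHello.

    A TLS record starts with a 5-byte header (type, major, minor, 2-byte
    length); the handshake message type follows as the sixth byte.
    """
    if len(header) < 6:
        return False
    content_type, major, minor = header[0], header[1], header[2]
    handshake_type = header[5]
    return (
        content_type == 0x16         # record type: handshake
        and major == 0x03            # SSLv3 and all TLS versions use major 3
        and minor <= 0x04            # 3.0 (SSLv3) through 3.4
        and handshake_type == 0x01   # handshake message: ClientHello
    )
```

A listener using this approach would peek at the first six bytes of a new connection and hand the socket to the SSL path only when the function returns true, treating any other prefix as a plain connection.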

Original comment by linas.vi...@continuent.com on 17 Oct 2013 at 2:29

GoogleCodeExporter commented 9 years ago
Full logs from an install + update which reproduces the issue

Original comment by mc.br...@continuent.com on 17 Oct 2013 at 2:41

GoogleCodeExporter commented 9 years ago

Original comment by robert.h...@continuent.com on 11 Dec 2013 at 4:16

GoogleCodeExporter commented 9 years ago

Original comment by jeff.m...@continuent.com on 27 Mar 2014 at 9:05

GoogleCodeExporter commented 9 years ago
Moving to 3.1.0.  This has not been reliably reproduced to date. 

Original comment by robert.h...@continuent.com on 19 Dec 2014 at 2:17

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 19 Jan 2015 at 2:20