SimplyStaking / panic_cosmos

🚨 PANIC for Cosmos
GNU General Public License v3.0

HTTPS with self signed certificates #50

Open yn-alex opened 3 years ago

yn-alex commented 3 years ago

Hello,

I tried to connect PANIC to an RPC endpoint that sits behind an NGINX server with SSL enabled and a self-signed certificate.

Setup (3 servers): PANIC ---(HTTPS)---> NGINX:443 ---(HTTP)---> COSMOS-RPC:26657

Error:

Sep 26 23:51:44 pipenv[23856]: Trying to connect to https://xxxxxxxxxxxxxxxxx:8443/node1/status
Sep 26 23:51:44 pipenv[23856]: Failed to connect to cosmos-node at https://xxxxxxxxxxxxxxxxx:8443/node1
Sep 26 23:51:44 pipenv[23856]: PANIC MAJOR - Node cosmos-node was not accessible during PANIC startup. cosmos-node will NOT be monitored until it is accessible and PANIC restarted afterwards. Some features of PANIC might be affected.

The CA certificate is installed on the machine where PANIC runs. Running curl https://xxxxxxxxxxxxxxxxx:8443/node1/status works fine, with no errors.

Then I simply swapped https for http in the URL above: same setup, just HTTP instead of HTTPS and, of course, a different port (80), and it worked.

Therefore I guess it should be the self-signed certificate to blame.

Any suggestions?
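
One possible explanation, offered as an assumption rather than a confirmed diagnosis: curl validates against the operating system's trust store, while Python HTTP clients such as `requests` usually ship their own CA bundle, so a CA installed system-wide is not necessarily visible to PANIC. The stdlib-only sketch below tries a verified TLS handshake from Python with an explicit CA file. The names `build_context` and `check_tls` are hypothetical helpers, not part of PANIC.

```python
import socket
import ssl
from typing import Optional


def build_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context (system CAs, or only the given CA file)."""
    # create_default_context enables certificate and hostname verification.
    # With cafile=None it loads the system's default CA certificates;
    # with a cafile it trusts only the certificates in that file.
    return ssl.create_default_context(cafile=ca_file)


def check_tls(host: str, port: int, ca_file: Optional[str] = None) -> Optional[str]:
    """Attempt a verified TLS handshake and return the negotiated protocol."""
    context = build_context(ca_file)
    with socket.create_connection((host, port), timeout=10) as sock:
        # Raises ssl.SSLCertVerificationError if the chain is not trusted.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `check_tls(host, 8443)` and then `check_tls(host, 8443, "/path/to/ca.pem")` (path is a placeholder) should show whether Python trusts the certificate by default or only when given the CA file explicitly.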

migueldingli1997 commented 3 years ago

Hi @yn-alex. Acknowledged. Will investigate this and get back to you.

migueldingli1997 commented 3 years ago

@yn-alex It might be the case that PANIC does not trust self-signed SSL certificates. Could you git fetch and git checkout miguel/50-self-signed-https and try to run PANIC again please?

I've done two things in that branch (can see here):

Would be great if you try this out and let me know if it resolves your problem. Feel free to also post the errors you get; these would be very helpful for me to better understand the problem.

yn-alex commented 3 years ago

Hi @migueldingli1997 - Will check ASAP and get back to you. Thank you!

easy2stake commented 3 years ago

Hi,

It works with that branch. This is the warning I see now, which is expected because I'm using a self-generated wildcard certificate:

SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for rpc.domain.com has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)

Additionally, it now supports a cert.pem for the Cosmos node requests - I didn't have to copy this. I simply cloned the branch, copied my config and started it.
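
For reference, the SubjectAltNameWarning quoted above is raised when a certificate carries only a `commonName` and no `subjectAltName` entries. A minimal sketch of that check follows, assuming certificate dictionaries shaped like the output of `ssl.SSLSocket.getpeercert()` on a verified connection. The helper name and both sample dictionaries are hypothetical.

```python
from typing import Any, Dict, List


def dns_names_from_san(cert: Dict[str, Any]) -> List[str]:
    """Return the DNS entries of subjectAltName from a getpeercert() dict."""
    # getpeercert() reports SAN entries as (type, value) pairs,
    # e.g. ("DNS", "rpc.domain.com").
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


# Hypothetical certificate dictionaries for illustration only.
cn_only = {"subject": ((("commonName", "*.domain.com"),),)}
with_san = {
    "subject": ((("commonName", "*.domain.com"),),),
    "subjectAltName": (("DNS", "*.domain.com"), ("DNS", "domain.com")),
}

print(dns_names_from_san(cn_only))   # no SAN entries: the case the warning describes
print(dns_names_from_san(with_san))  # SAN present: no commonName fallback needed
```

Regenerating the certificate with `subjectAltName` entries for the hostnames in use should make the warning go away.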

migueldingli1997 commented 3 years ago

That's great. I will work on a more permanent fix and get back to you. Feel free to continue using that branch for the meantime.

migueldingli1997 commented 3 years ago

@yn-alex @easy2stake Oh I think I misunderstood. You did not need to copy the cert.pem for PANIC to work? That's strange because if PANIC does not find the cert.pem, it reverts to the old functionality. So it should have resulted in the same errors if you did not copy this into the PANIC directory.
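
The fallback described above can be sketched as follows. This is not the code from the branch, only an illustration of the stated behaviour: use cert.pem when it exists, otherwise keep the default verification. `resolve_verify` is a hypothetical helper whose result would be passed as the `verify` argument of `requests.get()`.

```python
import os
from typing import Union


def resolve_verify(cert_path: str = "cert.pem") -> Union[str, bool]:
    """Pick a value for the `verify` argument of requests.get()."""
    # If the certificate file is present, verify against it; otherwise
    # return True, which keeps the library's default verification.
    if os.path.isfile(cert_path):
        return cert_path
    return True
```

With this shape, a missing cert.pem silently falls back to default verification, which is why the same errors would be expected when the file is not copied into the PANIC directory.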

Could you try using the other PANIC instance (i.e. not this branch) and checking if this now works?

yn-alex commented 3 years ago

Hi,

Sorry for the late reply. Meanwhile I worked around the issue, but I'd like to help you close this. I double-checked, and the cert.pem must be there. This is the behaviour:

  1. With no cert.pem:
    SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
  2. With cert.pem:
    19/11/2020 12:28:38 AM - general - INFO - Trying to connect to https://xxxxxxxxxxxxx:8443/status
    19/11/2020 12:28:38 AM - general - INFO - Success.

    But 20 seconds later:

    19/11/2020 12:28:48 AM - general - INFO - Sent telegram alert.
    19/11/2020 12:28:59 AM - general - INFO - Sent telegram alert.

The alerts:

PANIC INFO: Experiencing delays when trying to access xxxx.

PANIC MAJOR: I cannot access xxxx. Node became inaccessible at 2020-11-19 00:28:38.746902 and has been inaccessible for (at most) 0h, 0m, 20s.


So it basically gets past the initial startup check, yet later it cannot access the node. Maybe another function calls the RPC and also needs verify="cert.pem"?
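
If that guess is right, one way to avoid missing a call site is to bind the verification setting once and reuse it everywhere. The sketch below is only an illustration of that idea, not PANIC's actual code. `make_getter` and `fake_get` are hypothetical, the URLs are placeholders, and `fake_get` stands in for `requests.get` so the example runs without network access.

```python
from functools import partial
from typing import Any, Callable, Union


def make_getter(http_get: Callable[..., Any],
                verify: Union[str, bool],
                timeout: float = 10.0) -> Callable[..., Any]:
    """Bind `verify` and `timeout` once so every call site shares them."""
    return partial(http_get, verify=verify, timeout=timeout)


# Stand-in for requests.get; it simply echoes back the arguments it received.
def fake_get(url: str, **kwargs: Any) -> dict:
    return {"url": url, **kwargs}


get = make_getter(fake_get, verify="cert.pem")
print(get("https://node.example:8443/status"))
print(get("https://node.example:8443/net_info"))
```

In real use, `requests.get` could be passed instead of `fake_get`, since it accepts both `verify` and `timeout`. A `requests.Session` with `session.verify` set would achieve the same effect.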