ESGF / esg-publisher

ESGF Publisher
http://esg-publisher.readthedocs.org/

About certificate-file valid #197

Closed lamaliang closed 1 year ago

lamaliang commented 2 years ago

Hello

I am a research assistant at RCEC/Academia Sinica. We would like to add some files to our ESGF server, but we are running into a certificate-file or SSL error. Please help me solve the problem. Thanks.

Here is our error message:

Traceback (most recent call last):
  File "/usr/local/conda/envs/esgf-pub/bin/esgpublish", line 783, in <module>
    main(sys.argv[1:])
  File "/usr/local/conda/envs/esgf-pub/bin/esgpublish", line 682, in main
    pid_connector=pid_connector, project_config_section=project_config_section)
  File "/usr/local/conda/envs/esgf-pub/lib/python2.7/site-packages/esgcet/publish/publish.py", line 350, in publishDatasetList
    dset, statusId, state, evname, status = publishDataset(datasetName, parentIdent, service, threddsRootURL, session, schema=schema, version=versionno)
  File "/usr/local/conda/envs/esgf-pub/lib/python2.7/site-packages/esgcet/publish/publish.py", line 130, in publishDataset
    statusId = service.createDataset(parentId, threddsURL, -1, "Published")
  File "/usr/local/conda/envs/esgf-pub/lib/python2.7/site-packages/esgcet/publish/rest.py", line 99, in createDataset
    status, message = self.harvest(threddsURL, 'THREDDS', schema=schema)
  File "/usr/local/conda/envs/esgf-pub/lib/python2.7/site-packages/esgcet/publish/rest.py", line 209, in harvest
    raise ESGPublishError("Socket error: %s\nIs the proxy certificate %s valid?"%(e, self.certFile))
esgcet.exceptions.ESGPublishError: Socket error: SSLError(MaxRetryError('HTTPSConnectionPool(host=\'esgf-data.dkrz.de\', port=443): Max retries exceeded with url: /esg-search/ws/harvest?metadataRepositoryType=THREDDS&uri=http%3A%2F%2Fesgf.rcec.sinica.edu.tw%2Fthredds%2Fcatalog%2Fesgcet%2F101%2FCMIP6.PAMIP.AS-RCEC.TaiESM1.pdSST-piArcSIC.r29i1p1f1.6hrPlev.pr.gn.v20211028.xml (Caused by SSLError(SSLError("bad handshake: Error([(\'SSL routines\', \'tls_process_server_certificate\', \'certificate verify failed\')],)",),))',),)
Is the proxy certificate /home/dadm/.globus/certificate-file valid?

I tried to re-create the certificate file using the myproxy-login command, but the error is the same.
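For readers following along, it can help to decode the URL in the error message to see which hosts are involved. The sketch below is only an illustration: the `failing_url` string is reassembled by hand from the `HTTPSConnectionPool(host=..., port=443)` part and the path in the traceback, and the variable names are made up for this example.

```python
from urllib.parse import parse_qs, urlsplit

# Reassembled from the error message (assumption: scheme and port are taken
# from "HTTPSConnectionPool(host='esgf-data.dkrz.de', port=443)").
failing_url = (
    "https://esgf-data.dkrz.de:443/esg-search/ws/harvest"
    "?metadataRepositoryType=THREDDS"
    "&uri=http%3A%2F%2Fesgf.rcec.sinica.edu.tw%2Fthredds%2Fcatalog"
    "%2Fesgcet%2F101%2FCMIP6.PAMIP.AS-RCEC.TaiESM1.pdSST-piArcSIC"
    ".r29i1p1f1.6hrPlev.pr.gn.v20211028.xml"
)

parts = urlsplit(failing_url)
query = parse_qs(parts.query)      # parse_qs also percent-decodes the values

index_host = parts.hostname        # host the publisher opens the TLS connection to
catalog_url = query["uri"][0]      # THREDDS catalog that should be harvested
catalog_host = urlsplit(catalog_url).hostname

print("TLS connection to:", index_host)
print("catalog to harvest:", catalog_url)
print("catalog host:", catalog_host)
```

The output shows two different hosts: the harvest request goes to esgf-data.dkrz.de, and the catalog it refers to lives on esgf.rcec.sinica.edu.tw.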

sashakames commented 2 years ago

Hi @lamaliang, sorry, the error message appears to be wrong. This issue won't be corrected for the Python v2.7 version. You may want to consider upgrading to Python 3 and installing the new publisher. This is a significant upgrade; please see https://esg-publisher.readthedocs.io/en/latest/index.html for more information.

It appears that your node has an invalid self-signed certificate, which is probably contributing to the problem with publishing: I get an error when I put esgf.rcec.sinica.edu.tw into my browser. You can remedy this either by obtaining a web certificate from your institutional vendor (recommended, as you can get one valid for a year or more) or by using a service that issues free ones (more complicated, as you need to renew frequently or set up a cron job to renew periodically).
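The browser check can also be scripted. This is a minimal sketch using only the Python 3 standard library, not part of esg-publisher; the function name `check_server_certificate` and its parameters are hypothetical. It assumes the default system trust store unless a CA bundle path is passed as `cafile`.

```python
import socket
import ssl


def check_server_certificate(host, port=443, timeout=10.0, cafile=None):
    """Describe whether the server certificate of host:port verifies.

    cafile is an optional path to a CA bundle; None uses the system defaults.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake (and certificate verification) happens here.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                subject = tls.getpeercert().get("subject", ())
                return "verified: %s" % (subject,)
    except ssl.SSLCertVerificationError as exc:
        # Raised for self-signed, expired, or otherwise untrusted certificates.
        return "certificate verify failed: %s" % getattr(exc, "verify_message", exc)
    except ssl.SSLError as exc:
        return "TLS error: %s" % exc
    except OSError as exc:
        # DNS failure, refused connection, timeout, unreachable network, ...
        return "connection failed: %s" % exc


# Example call (performs a real network connection, so it is left commented out):
# print(check_server_certificate("esgf.rcec.sinica.edu.tw"))
```

A result starting with "certificate verify failed" would be consistent with the browser warning; "connection failed" points to a network problem instead.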

lamaliang commented 2 years ago

Hi @sashakames

Thanks for your help and reply. I will try it. Thank you so much.

soay commented 2 years ago

Hi @lamaliang, you probably only need to update the truststore on your node. Please follow the instructions below:

Fetch esg-truststore.ts, esg_trusted_certificates.tar, and esgf-ca-bundle.crt from the esgf-dist repository:

wget https://github.com/ESGF/esgf-dist/raw/master/installer/certs/esg_trusted_certificates.tar
wget https://github.com/ESGF/esgf-dist/raw/master/installer/certs/esg-truststore.ts
wget https://github.com/ESGF/esgf-dist/raw/master/installer/certs/esgf-ca-bundle.crt

Perform the following

cp esgf-ca-bundle.crt /etc/certs
cp esg-truststore.ts /esg/config/tomcat/
chown tomcat:tomcat /esg/config/tomcat/esg-truststore.ts
tar -xf esg_trusted_certificates.tar
rsync -Irv --delete esg_trusted_certificates/ /etc/grid-security/certificates/
rsync -Irv --delete esg_trusted_certificates/ /esg/gridftp_root/etc/grid-security/certificates/

lamaliang commented 2 years ago

Hello

I have updated to Python 3 and modified the new esg.ini file.

Now our problem is "ERROR PID module exception encountered!"

2022-08-01 18:51:49 INFO Assigning PID...
2022-08-01 18:51:49 INFO Passed source_id registration test for TaiESM1
2022-08-01 18:51:49 INFO Assigned PID to dataset CMIP6.PAMIP.AS-RCEC.TaiESM1.pdSST-piArcSIC.r29i1p1f1.6hrPlev.pr.gn.v20211028: hdl:21.14100/b398b34e-6777-312f-a6db-b821e532c73d
Traceback (most recent call last):
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgcet/pid_cite_pub.py", line 108, in pid_flow_code
    self.check_pid_connection(send_message=True)
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgcet/pid_cite_pub.py", line 73, in check_pid_connection
    pid_queue_return_msg = self.pid_connector.check_pid_queue_availability(send_message=send_message)
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgfpid/connector.py", line 349, in check_pid_queue_availability
    return rabbit_checker.check_and_inform()
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgfpid/check.py", line 103, in check_and_inform
    success = self.iterate_over_all_hosts()
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgfpid/check.py", line 130, in __iterate_over_all_hosts
    self.connection = self.check_making_rabbit_connection()
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgfpid/check.py", line 243, in check_making_rabbit_connection
    connection = self.open_rabbit_connection()
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgfpid/check.py", line 275, in open_rabbit_connection
    connection = self.pika_blocking_connection(params)
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/esgfpid/check.py", line 279, in pika_blocking_connection
    return pika.BlockingConnection(params)
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 360, in __init__
    self._impl = self._create_connection(parameters, _impl_class)
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 451, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
pika.exceptions.AMQPConnectionError
2022-08-01 18:51:49 ERROR PID module exception encountered!

sashakames commented 2 years ago

@lamaliang You may want to check that your RabbitMQ server list is up to date. I've invited you to access the config, which is access-controlled because it is sensitive. Once you accept, this link will work for you: https://github.com/ESGF/PID-config/blob/main/pid_creds_esg-pub-v5.txt

lamaliang commented 2 years ago

Hi @sashakames, thanks for your invitation. I have checked the "pid_creds"; it is similar to my previous setting. The error messages are as follows:

2022-08-02 11:47:58 ERROR PID module exception encountered!

Traceback (most recent call last):
  File "/home/dadm/usr/anaconda3/envs/esgf-pub/lib/python3.8/site-packages/pika/adapters/utils/connection_workflow.py", line 815, in _try_next_resolved_address
    addr_record = next(self._addrinfo_iter)
StopIteration

During handling of the above exception, another exception occurred:

raise PIDServerException(error_message+'\nProblems:\n'+problem_message)

esgfpid.rabbit.exceptions.PIDServerException: Permanently failed to connect to RabbitMQ. Tried all hosts until a user close-down forced us to give up (e.g. the maximum waiting time was reached). Giving up. No PID requests will be sent.
Problems:
Server "handle-esgf-trusted.dkrz.de/esgf-pid:5671": 1x ""
Server "esgf-pid-mq.ipsl.upmc.fr/esgf-pid:5671": 1x ""

I think there may be some settings I have not finished. Please help me solve this problem. Thank you so much.

sashakames commented 2 years ago

@lamaliang this looks like a networking issue. You can test the RabbitMQ servers using "openssl s_client -connect <host>:<port>" with the host:port pairs found in the config.
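If openssl is not at hand in the publisher environment, a plain TCP probe from Python can separate a networking problem from a TLS problem. This is a rough sketch under assumptions: `parse_host_port` and `probe_tcp` are hypothetical helper names, and the pairs are expected in simple host:port form (without the "/esgf-pid" part that appears in the error message).

```python
import socket


def parse_host_port(pair):
    """Split 'host:port' into (host, int(port))."""
    host, _, port = pair.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % pair)
    return host, int(port)


def probe_tcp(pair, timeout=5.0):
    """Try a plain TCP connection (no TLS) and classify the outcome."""
    host, port = parse_host_port(pair)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"      # often a firewall silently dropping packets
    except ConnectionRefusedError:
        return "refused"      # host reachable, nothing listening on the port
    except OSError as exc:
        return "error: %s" % exc  # e.g. DNS failure or unreachable network


# Example calls (real network connections, so they are left commented out):
# for pair in ["handle-esgf-trusted.dkrz.de:5671", "esgf-pid-mq.ipsl.upmc.fr:5671"]:
#     print(pair, probe_tcp(pair))
```

Running it from the same machine and conda environment as esgpublish keeps the comparison with the publisher's own connection attempt fair.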

lamaliang commented 2 years ago

Hello @sashakames and @soay, sorry for the late reply. We tried all the methods from the previous messages, including upgrading to Python 3 and updating the certificate files. The error messages are the same.

Therefore, we think the problem may be our ESGF account certificate. Could you help me check our account? I can send you an email. Is that OK?

Could you share your email address with me? Thank you so much.

soay commented 2 years ago

Hi @lamaliang, I'm sure everything is fine with your account. I tried to harvest the catalog to our Solr but it failed with an "SSL_ERROR_UNKNOWN_CA_ALERT", so it does indeed seem to be a certificate issue. Your server still has a self-signed certificate; can you please get a proper ESGF certificate (either a "commercial cert" or a certificate signed by one of the ESGF CAs)? For the latter, please create a certificate request as follows and send the hostcert_req.csr to Prashanth or via Slack.

$ openssl req -new -nodes -config /etc/certs/openssl.cnf -keyout /etc/esgfcerts/hostkey.pem -out /etc/esgfcerts/hostcert_req.csr -subj "/O=ESGF/OU=ESGF.ORG/CN=$esgf_host"

lamaliang commented 2 years ago

Hello @soay

Sorry for the late reply. We have modified the certificate. But when we run the esgpublish command with either the Python 3 or the Python 2 version, we get the same "Max retries exceeded" error messages as in our first comment in this issue.

This is confusing for me and my colleagues. Please help us if you have free time. Thank you so much.

tkettenba commented 2 years ago

Hi,

have you already tried Sasha's suggestions? Please attempt to connect to the services via the openssl s_client command and post your findings.

I've tried this on my side and found that machines on our internal network cannot connect to the services at all, while from an externally connected server some connections work and some won't.

Here are my test results:

internal machine

All three failed (some were manually aborted via Ctrl+C after some time).

$ openssl s_client -connect handle-esgf-trusted.dkrz.de:5671
^C

$ openssl s_client -connect 140.208.31.31:5671
^C

$ openssl s_client -connect pcmdi10.llnl.gov:5671
socket: Bad file descriptor
connect:errno=9

external machine

$ openssl s_client -connect handle-esgf-trusted.dkrz.de:5671
CONNECTED(00000003)
depth=3 C = DE, O = T-Systems Enterprise Services GmbH, OU = T-Systems Trust Center, CN = T-TeleSec GlobalRoot Class 2
verify return:1
depth=2 C = DE, O = Verein zur Foerderung eines Deutschen Forschungsnetzes e. V., OU = DFN-PKI, CN = DFN-Verein Certification Authority 2
verify return:1
depth=1 C = DE, O = Verein zur Foerderung eines Deutschen Forschungsnetzes e. V., OU = DFN-PKI, CN = DFN-Verein Global Issuing CA
verify return:1
depth=0 C = DE, ST = Hamburg, L = Hamburg, O = Deutsches Klimarechenzentrum GmbH, OU = DM, CN = handle-esgf-trusted.dkrz.de
verify return:1
---
  [... omitted ...]

$ openssl s_client -connect 140.208.31.31:5671
CONNECTED(00000003)
140326324115344:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 289 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID:
    Session-ID-ctx:
    Master-Key:
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1661150082
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

$ openssl s_client -connect pcmdi10.llnl.gov:5671
socket: Bad file descriptor
connect:errno=9

lamaliang commented 2 years ago

Hello @tkettenbach-dwd

Yes, I followed Sasha's suggestions. The openssl command looks fine when I connect to handle-esgf-trusted.dkrz.de:5671 and 140.208.31.31. esgf-pid-mq.ipsl.upmc.fr and aims4.llnl.gov are not available.

Therefore, I modified pid_creds in the esg.ini file to try only the DKRZ server and the 140.208.31.31 server. The error messages are as follows:

2022-08-22 23:32:35 ERROR PID module exception encountered!
esgfpid.rabbit.exceptions.PIDServerException: Permanently failed to connect to RabbitMQ. Tried all hosts until a user close-down forced us to give up (e.g. the maximum waiting time was reached). Giving up. No PID requests will be sent.
Problems:
Server "handle-esgf-trusted.dkrz.de/esgf-pid:5671": 1x ""

lamaliang commented 2 years ago

I think I need to ask our admin about rabbitmq-server. From the error messages, maybe RabbitMQ is not open on our server.

BTW, if we didn't start RabbitMQ, why could we upload data with esgpublish v3.7 under Python 2 in the past?

lamaliang commented 2 years ago

Hello everyone,

We have tried more methods to solve this problem, but we get the same error messages. Could you help me fix this problem? Thank you so much.

Socket error: %s\nIs the proxy certificate %s valid?"%(e, self.certFile)) esgcet.exceptions.ESGPublishError: Socket error: SSLError(MaxRetryError('HTTPSConnectionPool(host=\'esgf-data.dkrz.de\', port=443): Max retries exceeded with url: /esg-search/ws/harvest?metadataRepositoryType=THREDDS&uri=http%3A%2F%2Fesgf.rcec.sinica.edu.tw%2Fthredds%2Fcatalog%2Fesgcet%2F101%2FCMIP6.PAMIP.AS-RCEC.TaiESM1.pdSST-piArcSIC.r22i1p1f1.6hrPlev.pr.gn.v20211028.xml (Caused by SSLError(SSLError("bad handshake: Error([(\'SSL routines\', \'tls_process_server_certificate\', \'certificate verify failed\')],)",),))',),) Is the proxy certificate /home/dadm/.globus/certificate-file valid?

tkettenba commented 2 years ago

The original problem mentioned here still seems to exist. The issue is not the 'proxy certificate' but the server certificate on your side. Try browsing to https://esgf.rcec.sinica.edu.tw and inspecting the certificate in a regular web browser. Most modern browsers warn about an 'insecure connection' because you're using a self-signed certificate.

As mentioned earlier you have basically two options:

Here is one example from our side: https://esgf-chat.slack.com/archives/C9UGMHP3K/p1636536673000500

BR, Thomas

lamaliang commented 2 years ago

Hello Thomas @tkettenbach-dwd, sorry for the late reply. Please let me join your Slack group and channel. My email is lama@gate.sinica.edu.tw and my OpenID is https://esgf-node.llnl.gov/esgf-idp/openid/lama_liang

BTW, I cannot see the example because I have not joined yet.

blcc commented 1 year ago

Hi, I took over this issue from lamaliang.

The certificate is fixed with Let's Encrypt (for now), but that does not seem to be the problem. I printed the error from blocking_connection.py (with Python 3, esgcet==5.1.0b11):

Connection workflow failed: AMQPConnectionWorkflowFailed: 2 exceptions in all; last exception - AMQPConnectorSocketConnectError: OSError(101, 'Network is unreachable'); first exception - AMQPConnectorSocketConnectError: timeout("TCP connection attempt timed out: 'handle-esgf-trusted.dkrz.de'/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('136.172.11.46', 5671))")

However, openssl is able to connect:

openssl s_client -connect handle-esgf-trusted.dkrz.de:5671
CONNECTED(00000003)
depth=2 C = US, ST = New Jersey, L = Jersey City, O = The USERTRUST Network, CN = USERTrust RSA Certification Authority
verify return:1
depth=1 C = NL, O = GEANT Vereniging, CN = GEANT OV RSA CA 4
verify return:1
depth=0 C = DE, ST = Hamburg, O = Deutsches Klimarechenzentrum GmbH, CN = handle-esgf-trusted.dkrz.de
verify return:1
---
Certificate chain
(skip)

Since this server has no IPv6 gateway, I guess it somehow raises an error when trying to connect via IPv6, but I am not sure. Any ideas?
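One way to test that guess is to resolve the host and try every returned address separately, so that IPv4 and IPv6 outcomes show up side by side. The following is only a sketch with standard-library calls; `try_each_address` is a made-up name and nothing here comes from pika or esgfpid.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def try_each_address(host, port, timeout=5.0):
    """Return (family, address, outcome) for every address host resolves to."""
    results = []
    # getaddrinfo raises socket.gaierror if the name cannot be resolved at all.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        label = FAMILY_NAMES.get(family, str(family))
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "connected"
        except OSError as exc:
            # Includes timeouts and errno 101 "Network is unreachable".
            outcome = "failed: %s" % exc
        results.append((label, sockaddr[0], outcome))
    return results


# Example call (real network connections, so it is left commented out):
# for row in try_each_address("handle-esgf-trusted.dkrz.de", 5671):
#     print(row)
```

If an IPv6 row fails with "Network is unreachable" while the IPv4 row times out, that would match the two exceptions in the pika message above.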

Thanks in advance, Ping-Gin Chiu

tkettenba commented 1 year ago

Hi, to me it looks like the Python version is choosing the IPv4 address. This should be correct and it should work, imho.

Can you please repeat the openssl command from the same environment as the Python command? (I'm just guessing here that the environments were different.) Additionally, please attempt the openssl command using the IPv4 address directly, e.g.

openssl s_client -connect 136.172.11.46:5671

As a simple check, you could also try to connect directly to the AMQP service using curl or netcat.

curl --head https://handle-esgf-trusted.dkrz.de:5671

or netcat:

nc handle-esgf-trusted.dkrz.de 5671    

BR, Thomas Kettenbach

lamaliang commented 1 year ago

Hello Everyone

I think we have solved this problem. Thanks, everyone. Maybe we can close this issue.

blcc commented 1 year ago

Hi all, I do not really know where exactly the problem was. For reference, here is how it was solved.

  1. Overwrite the ESGF node by reinstalling with ESGF-ansible. Some errors occurred; I fixed them until the THREDDS server worked.
  2. Update the truststore as in @soay 's first suggestion.

Then it worked. I guess there were some unrevealed errors on this node which caused the first error. Anyway, problem solved. Thanks.