DUNE / dist-comp

Action items for DUNE distributed computing, and common scripts that are used.

All AWT is red as of 09:31 Fermilab time--Bad proxy #131

Closed: StevenCTimm closed this issue 4 months ago

StevenCTimm commented 4 months ago

AWT jobs are all erroring out with messages like this:

ES_CIEMAT SURFSARA davs root://penguin12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt
Run: [FATAL] Auth failed: No protocols left to try (source)
'xrdcp --force --nopbar --verbose root://penguin12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt downloaded.txt' returns 52
GFAL_CONFIG_DIR:
GFAL_PLUGIN_DIR:
justin-rucio-upload attempt 1
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
2024-02-27 02:33:59,739 ERROR ConnectionError: HTTPSConnectionPool(host='dune-rucio.fnal.gov', port=443): Max retries exceeded with url: /auth/x509_proxy (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_EXPIRED] sslv3 alert certificate expired (_ssl.c:1129)')))
ERROR:baseclient:ConnectionError: HTTPSConnectionPool(host='dune-rucio.fnal.gov', port=443): Max retries exceeded with url: /auth/x509_proxy (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_EXPIRED] sslv3 alert certificate expired (_ssl.c:1129)')))
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (2): dune-rucio.fnal.gov:443
2024-02-27 02:34:00,288 ERROR ConnectionError: HTTPSConnectionPool(host='dune-rucio.fnal.gov', port=443): Max retries exceeded with url: /auth/x509_proxy (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_EXPIRED] sslv3 alert certificate expired (_ssl.c:1129)')))
ERROR:baseclient:ConnectionError: HTTPSConnectionPool(host='dune-rucio.fnal.gov', port=443): Max retries exceeded with url: /auth/x509_proxy (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_EXPIRED] sslv3 alert certificate expired (_ssl.c:1129)')))
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (3): dune-rucio.fnal.gov:443
2024-02-27 02:34:00,724 ERROR ConnectionError: HTTPSConnectionPool(host='dune-rucio.fnal.gov', port=443): Max retries exceeded with url: /auth/x509_proxy (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_EXPIRED] sslv3 alert certificate expired (_ssl.c:1129)')))
ERROR:baseclient:ConnectionError: HTTPSConnectionPool(host='dune-rucio.fnal.gov', port=443): Max retries exceeded with url: /auth/x509_proxy (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_EXPIRED] sslv3 alert certificate expired (_ssl.c:1129)')))
justin-rucio-upload fails: Cannot connect to the Rucio server.
'justin-rucio-upload --rse SURFSARA --protocol davs --scope testpro --dataset awt-uploads-202409 awt-1708997606-ukbfuZ2ypb --timeout 1200' returns 99

subject   : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=3206290657/CN=170899760364
issuer    : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=3206290657
identity  : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=3206290657
type      : RFC compliant proxy
strength  : 2048 bits
path      : /home/awt-proxy.pem
timeleft  : 167:59:23
key usage : Digital Signature, Key Encipherment, Key Agreement
=== VO dune extension information ===
VO        : dune
subject   : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk
issuer    : /DC=org/DC=incommon/C=US/ST=Illinois/O=Fermi Research Alliance/CN=voms1.fnal.gov
attribute : /dune/Role=Production/Capability=NULL
attribute : /dune/Role=NULL/Capability=NULL
timeleft  : 0:00:00
uri       : voms1.fnal.gov:15042
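One quick thing to read off the proxy dump above is the `timeleft` of each section: the proxy itself reports 167:59:23 left, while the dune VO extension reports 0:00:00. The following is a minimal Python sketch of pulling those values out automatically, assuming the `key : value` layout shown above (which looks like `voms-proxy-info --all` output). `parse_timeleft` and `timeleft_by_section` are hypothetical helpers written for illustration, not part of justIN, Rucio or the VOMS clients.

```python
import re

# Hypothetical helpers for illustration; not part of any DUNE, justIN or VOMS tool.


def parse_timeleft(value: str) -> int:
    """Convert an 'H:MM:SS' timeleft string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def timeleft_by_section(info_text: str) -> dict[str, int]:
    """Map 'proxy' and each 'VO <name>' section to its timeleft in seconds.

    Assumes 'key : value' lines, with a '=== VO <name> extension information ==='
    line starting each VO section, as in the proxy dump above.
    """
    section = "proxy"
    result: dict[str, int] = {}
    for line in info_text.splitlines():
        header = re.fullmatch(r"=== VO (\S+) extension information ===", line.strip())
        if header:
            section = f"VO {header.group(1)}"
            continue
        # Split on the first ':' only, so values such as 'host:port' stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "timeleft":
            result[section] = parse_timeleft(value)
    return result


# Abridged copy of the proxy dump above.
sample = """\
timeleft  : 167:59:23
=== VO dune extension information ===
VO        : dune
timeleft  : 0:00:00
uri       : voms1.fnal.gov:15042
"""

for name, secs in timeleft_by_section(sample).items():
    print(f"{name}: {'EXPIRED' if secs == 0 else f'{secs} s left'}")
# proxy: 604763 s left
# VO dune: EXPIRED
```

Splitting on the first colon only is what lets the `uri` line and the `H:MM:SS` values survive parsing; a monitoring script could raise an alert whenever any section reports zero.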

Andrew-McNab-UK commented 4 months ago

Thanks: this was my mistake. Last week I installed new certificates to replace ones that had been revoked, but I used the wrong filename, so the new certificates did not overwrite the old ones and the old ones stayed in place. Since the old certificates still had about a month before expiry, anything that did not check certificate revocation lists (CRLs) carried on working with them.
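Why a revoked certificate can keep working for some services and not others can be shown with a simplified model, assuming a certificate is reduced to a serial number plus a validity window and a CRL is reduced to a set of revoked serial numbers. `Cert`, `valid_by_dates` and `valid_with_crl` are hypothetical names, and the dates and serials are made-up illustrative values, not the real certificates from this incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Cert:
    """Hypothetical, heavily simplified stand-in for an X.509 certificate."""
    serial: int
    not_before: datetime
    not_after: datetime


def valid_by_dates(cert: Cert, now: datetime) -> bool:
    # Checks only the validity window; revocation is never consulted.
    return cert.not_before <= now <= cert.not_after


def valid_with_crl(cert: Cert, now: datetime, revoked_serials: set[int]) -> bool:
    # Checks the validity window AND the CRL (modelled as a set of revoked serials).
    return valid_by_dates(cert, now) and cert.serial not in revoked_serials


# Illustrative values only: the old certificate is revoked but has ~a month left.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
old_cert = Cert(serial=1001, not_before=now - timedelta(days=335), not_after=now + timedelta(days=30))
new_cert = Cert(serial=1002, not_before=now - timedelta(days=1), not_after=now + timedelta(days=395))
crl = {1001}  # the old certificate has been revoked

print(valid_by_dates(old_cert, now))       # True: a peer that skips CRLs still accepts it
print(valid_with_crl(old_cert, now, crl))  # False: a CRL-checking peer rejects it
print(valid_with_crl(new_cert, now, crl))  # True
```

Real validation (for example in OpenSSL) also verifies signatures, the issuing chain and CRL freshness; the sketch keeps only the date-versus-revocation distinction described above.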

The AWT results are going green again now.