Closed Andrew-McNab-UK closed 1 year ago
Andrew, Which sites now fail under rucio 1.29 vs which sites used to work under rucio 1.26?
Filed GGUS 160778 to BNL starting with the BNL to BNL transfer
It may be that part of the reason AWT files are failing to download is that those uploads themselves are failing and the files are not there.
We have identified two big causes thus far, one or both of the voms dns not being updated, and the problem that for some cases we had to change to davs protocol from https. Neither of these explain the failures at RAL_ECHO or DUNE_CERN_EOS we continue to investigate there. The known VOMS problems and corresponding open GGUS tickets are tracked in #39 . There is one other issue which Andrew has adjusted for, namely doing one more unset of the GFAL config and library paths just before doing the rucio upload. This can be explained by the fact that the rucio v1_29 client as now in CVMFS now comes with its own gfal2 libraries and configuration settings such that Rucio upload can be run outside of singularity if needed.
Three big sites--(1) DUNE_CERN_EOS. (rucio upload is not making the directory) (2) NIKHEF likewise, (3) SURFSARA both upload and download fails.
INC3421404 filed at CERN
SURFSara ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=160690
We have individual issues open for the remaining sites, Nikhef, surfsara, and CCIN2P3 Closing this.
Several storages which were ok last year have errors mentioning GFAL in the Automated Workflow Tests.
For example, running at CERN, the rucio upload to the LANCASTER RSE fails like this:
https://landscape.fnal.gov/lens/view/job/43729668.0@jobsub03.fnal.gov/
CERN LANCASTER https root://fal-pygrid-30.lancs.ac.uk//dpm/lancs.ac.uk/home/dune/testpro/d6/e1/awt-download-2023-01-13-01.txt 'xrdcp --force --nopbar --verbose root://fal-pygrid-30.lancs.ac.uk//dpm/lancs.ac.uk/home/dune/testpro/d6/e1/awt-download-2023-01-13-01.txt awt-download-2023-01-13-01.txt' returns 0 [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py No trace_host passed. Using rucio_host instead[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py No creds passed. Trying to get it from the config file.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py HTTPS is required, but no ca_cert was passed. Trying to get it from X509_CERT_DIR.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py HTTPS is required, but no ca_cert was passed and X509_CERT_DIR is not defined. Trying to get it from the config file.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py No account passed. Trying to get it from the RUCIO_ACCOUNT environment variable or the config file.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py No VO passed. Trying to get it from environment variable RUCIO_VO.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py No VO found. Trying to get it from the config file.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py No VO found. Using default VO.[0m [36;1m2023-01-30 14:36:16,123 DEBUG baseclient.py got token from file[0m [36;1m2023-01-30 14:36:16,127 DEBUG uploadclient.py Num. of files that upload client is processing: 1[0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP request: GET https://dune-rucio.fnal.gov/rses/?expression=LANCASTER[0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP header: X-Rucio-Auth-Token: [hidden][0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP header: X-Rucio-VO: def[0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP header: Connection: Keep-Alive[0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP header: User-Agent: rucio-clients/1.29.10[0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP header: X-Rucio-Script: rucio::--verbose[0m [36;1m2023-01-30 14:36:16,128 DEBUG baseclient.py HTTP header: X-Rucio-Account: dunepro[0m [36;1m2023-01-30 14:36:17,286 DEBUG baseclient.py HTTP Response: 200 OK[0m [36;1m2023-01-30 14:36:17,904 DEBUG uploadclient.py Input validation done.[0m [32;1m2023-01-30 14:36:17,904 INFO Preparing upload for file awt-1675089309-XRpTOQxuXf[0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP request: GET https://dune-rucio.fnal.gov/rses/LANCASTER/attr/[0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP header: X-Rucio-Auth-Token: [hidden][0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP header: X-Rucio-VO: def[0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP header: Connection: Keep-Alive[0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP header: User-Agent: rucio-clients/1.29.10[0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP header: X-Rucio-Script: rucio::--verbose[0m [36;1m2023-01-30 14:36:17,904 DEBUG baseclient.py HTTP header: X-Rucio-Account: dunepro[0m [36;1m2023-01-30 14:36:18,071 DEBUG baseclient.py HTTP Response: 200 OK[0m [36;1m2023-01-30 14:36:18,071 DEBUG uploadclient.py wan domain is used for the upload[0m [36;1m2023-01-30 14:36:18,071 DEBUG uploadclient.py Registering file[0m [36;1m2023-01-30 14:36:18,071 DEBUG baseclient.py HTTP request: GET https://dune-rucio.fnal.gov/accounts/dunepro/scopes/[0m [36;1m2023-01-30 14:36:18,072 DEBUG baseclient.py HTTP header: X-Rucio-Auth-Token: [hidden][0m [36;1m2023-01-30 14:36:18,072 DEBUG baseclient.py HTTP header: X-Rucio-VO: def[0m [36;1m2023-01-30 14:36:18,072 DEBUG baseclient.py HTTP header: Connection: Keep-Alive[0m [36;1m2023-01-30 14:36:18,072 DEBUG baseclient.py HTTP header: User-Agent: rucio-clients/1.29.10[0m [36;1m2023-01-30 14:36:18,072 DEBUG baseclient.py HTTP header: X-Rucio-Script: rucio::--verbose[0m [36;1m2023-01-30 14:36:18,072 DEBUG baseclient.py HTTP header: X-Rucio-Account: dunepro[0m [36;1m2023-01-30 14:36:18,232 DEBUG baseclient.py HTTP Response: 200 OK[0m [36;1m2023-01-30 14:36:18,232 DEBUG uploadclient.py Skipping dataset registration[0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP request: GET https://dune-rucio.fnal.gov/dids/testpro/awt-1675089309-XRpTOQxuXf/meta[0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP header: X-Rucio-Auth-Token: [hidden][0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP header: X-Rucio-VO: def[0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP header: Connection: Keep-Alive[0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP header: User-Agent: rucio-clients/1.29.10[0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP header: X-Rucio-Script: rucio::--verbose[0m [36;1m2023-01-30 14:36:18,233 DEBUG baseclient.py HTTP header: X-Rucio-Account: dunepro[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP Response: 404 NOT FOUND[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py Response text (length=129): [{"ExceptionClass": "DataIdentifierNotFound", "ExceptionMessage": "Data identifier 'testpro:awt-1675089309-XRpTOQxuXf' not found"}][0m [36;1m2023-01-30 14:36:18,391 DEBUG uploadclient.py File DID does not exist[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP request: POST https://dune-rucio.fnal.gov/replicas[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP header: X-Rucio-Auth-Token: [hidden][0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP header: X-Rucio-VO: def[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP header: Connection: Keep-Alive[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP header: User-Agent: rucio-clients/1.29.10[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP header: X-Rucio-Script: rucio::--verbose[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py HTTP header: X-Rucio-Account: dunepro[0m [36;1m2023-01-30 14:36:18,391 DEBUG baseclient.py Request data (length=266): [{"rse": "LANCASTER", "files": [{"scope": "testpro", "name": "awt-1675089309-XRpTOQxuXf", "bytes": 26, "adler32": "624 ... ty": true}][0m [36;1m2023-01-30 14:36:18,605 DEBUG baseclient.py HTTP Response: 201 CREATED[0m [32;1m2023-01-30 14:36:18,605 INFO Successfully added replica in Rucio catalogue at LANCASTER[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP request: POST https://dune-rucio.fnal.gov/rules/[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP header: X-Rucio-Auth-Token: [hidden][0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP header: X-Rucio-VO: def[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP header: Connection: Keep-Alive[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP header: User-Agent: rucio-clients/1.29.10[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP header: X-Rucio-Script: rucio::--verbose[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py HTTP header: X-Rucio-Account: dunepro[0m [36;1m2023-01-30 14:36:18,606 DEBUG baseclient.py Request data (length=445): [{"dids": [{"scope": "testpro", "name": "awt-1675089309-XRpTOQxuXf"}], "copies": 1, "rse_expression": "LANCASTER", "we ... ta": null}][0m [36;1m2023-01-30 14:36:18,875 DEBUG baseclient.py HTTP Response: 201 CREATED[0m [32;1m2023-01-30 14:36:18,875 INFO Successfully added replication rule at LANCASTER[0m [36;1m2023-01-30 14:36:18,887 DEBUG logging.py gfal.Default: connecting to storage[0m [36;1m2023-01-30 14:36:18,924 DEBUG logging.py gfal.Default: checking if file exists None[0m [36;1m2023-01-30 14:36:18,925 DEBUG rsemanager.py Checking if https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf exists[0m [36;1m2023-01-30 14:36:18,925 DEBUG logging.py gfal.Default: checking if file exists https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf[0m [36;1m2023-01-30 14:36:19,371 DEBUG logging.py gfal.Default: closing protocol connection[0m [36;1m2023-01-30 14:36:19,372 DEBUG uploadclient.py [{'hostname': 'fal-pygrid-30.lancs.ac.uk', 'scheme': 'https', 'port': 0, 'prefix': '/dpm/lancs.ac.uk/home/dune', 'impl': 'rucio.rse.protocols.gfal.Default', 'domains': {'lan': {'read': 0, 'write': 0, 'delete': 0}, 'wan': {'read': 1, 'write': 1, 'delete': 1, 'third_party_copy_read': 1, 'third_party_copy_write': 1}}, 'extended_attributes': None}][0m [32;1m2023-01-30 14:36:19,372 INFO Trying upload with https to LANCASTER[0m [36;1m2023-01-30 14:36:19,372 DEBUG uploadclient.py Processing upload with the domain: wan[0m [36;1m2023-01-30 14:36:19,372 DEBUG logging.py gfal.Default: connecting to storage[0m [36;1m2023-01-30 14:36:19,375 DEBUG uploadclient.py The PFN created from the LFN: https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf[0m [36;1m2023-01-30 14:36:19,375 DEBUG logging.py gfal.Default: checking if file exists https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf[0m [36;1m2023-01-30 14:36:19,709 DEBUG logging.py gfal.Default: checking if file exists https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf.rucio.upload[0m [36;1m2023-01-30 14:36:19,848 DEBUG utils.py put: Attempt 1[0m [36;1m2023-01-30 14:36:19,848 DEBUG logging.py gfal.Default: uploading file from awt-1675089309-XRpTOQxuXf to https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf.rucio.upload[0m error in request of checking file QoS: is not a valid HTTP or Webdav URL
(process:1479): GLib-WARNING **: 14:36:19.849: GError set over the top of a previous GError or uninitialized memory. This indicates a bug in someone's code. You must ensure an error is NULL before it's set. The overwriting error message was: \x01 is not a valid HTTP or Webdav URL [36;1m2023-01-30 14:36:20,160 DEBUG utils.py put: Attempt failed 1[0m [36;1m2023-01-30 14:36:20,160 DEBUG utils.py Access to local destination denied. Details: Source file not found. Details: TRANSFER ERROR: Copy failed (streamed). Last attempt: HTTP 404 : File not found[0m [36;1m2023-01-30 14:36:20,160 DEBUG logging.py gfal.Default: uploading file from awt-1675089309-XRpTOQxuXf to https://fal-pygrid-30.lancs.ac.uk/dpm/lancs.ac.uk/home/dune/testpro/7f/b2/awt-1675089309-XRpTOQxuXf.rucio.upload[0m error in request of checking file QoS: is not a valid HTTP or Webdav URL
(process:1479): GLib-WARNING **: 14:36:20.161: GError set over the top of a previous GError or uninitialized memory. This indicates a bug in someone's code. You must ensure an error is NULL before it's set. The overwriting error message was: \x01 is not a valid HTTP or Webdav URL [33;1m2023-01-30 14:36:20,430 WARNING Upload attempt failed[0m [32;1m2023-01-30 14:36:20,430 INFO Exception: RSE does not support requested operation. Details: Access to local destination denied. Details: Source file not found. Details: TRANSFER ERROR: Copy failed (streamed). Last attempt: HTTP 404 : File not found[0m Traceback (most recent call last): File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/rse/protocols/gfal.py", line 462, in __gfal2_copy ret = ctx.filecopy(params, str(src), str(dest)) gfal2.GError: TRANSFER ERROR: Copy failed (streamed). Last attempt: HTTP 404 : File not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/rse/protocols/gfal.py", line 279, in put status = self.__gfal2_copy(str(source_url), str(target), None, space_token, transfer_timeout=transfer_timeout) File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/rse/protocols/gfal.py", line 470, in __gfal2_copy raise exception.SourceNotFound(error) rucio.common.exception.SourceNotFound: Source file not found. Details: TRANSFER ERROR: Copy failed (streamed). Last attempt: HTTP 404 : File not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/client/uploadclient.py", line 649, in _upload_item retry(protocol_write.put, base_name, pfn_tmp, source_dir, transfer_timeout=transfer_timeout)(mtries=2, logger=logger) File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/common/utils.py", line 1575, in call return self.func(*self.args, **self.kwargs) File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/rse/protocols/gfal.py", line 285, in put raise exception.DestinationNotAccessible(str(error)) rucio.common.exception.DestinationNotAccessible: Access to local destination denied. Details: Source file not found. Details: TRANSFER ERROR: Copy failed (streamed). Last attempt: HTTP 404 : File not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/client/uploadclient.py", line 269, in upload sign_service=sign_service) File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/rucio/client/uploadclient.py", line 652, in _upload_item raise RSEOperationNotSupported(str(error)) rucio.common.exception.RSEOperationNotSupported: RSE does not support requested operation. Details: Access to local destination denied. Details: Source file not found. Details: TRANSFER ERROR: Copy failed (streamed). Last attempt: HTTP 404 : File not found /cvmfs/dune.opensciencegrid.org/products/dune/rucio/v1_29_10/NULL/lib/python3.6/site-packages/urllib3/connectionpool.py:1050: InsecureRequestWarning: Unverified HTTPS request is being made to host 'dune-rucio.fnal.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings InsecureRequestWarning, [31;1m2023-01-30 14:36:20,917 ERROR Failed to upload file awt-1675089309-XRpTOQxuXf[0m [31;1m2023-01-30 14:36:20,917 ERROR None of the given files have been uploaded.[0m Completed in 4.7950 sec. 'rucio --verbose upload --rse LANCASTER --protocol https --scope testpro --lifetime 86400 --name awt-1675089309-XRpTOQxuXf awt-1675089309-XRpTOQxuXf' returns 79