I wanted to run this workflow in this directory on this input file with Toil. The workflow gets some images out of the public bucket s3://spacenet-dataset/ and operates on them. However, it didn't work; when I ran it, I got this error:
[2022-05-11T14:25:57-0700] [MainThread] [D] [toil.statsAndLogging] Suppressing the following loggers: {'pymesos', 'google', 'rsa', 'botocore', 'dill', 'charset_normalizer', 'pyasn1', 'salad', 'docker', 'boto3', 'websocket', 'kubernetes', 'galaxy', 'pkg_resources', 'bcdocs', 'urllib3', 'boto', 'prov', 'cachecontrol', 'requests', 'requests_oauthlib', 'oauthlib', 'humanfriendly', 'rdflib'}
[2022-05-11T14:25:57-0700] [MainThread] [D] [toil.statsAndLogging] Root logger is at level 'DEBUG', 'toil' logger at level 'DEBUG'.
[2022-05-11T14:25:57-0700] [MainThread] [D] [toil.lib.threading] Total machine size: 64 cores
[2022-05-11T14:25:57-0700] [MainThread] [D] [toil.lib.threading] CPU quota: -1
[2022-05-11T14:25:57-0700] [MainThread] [D] [toil.jobStores.fileJobStore] Path to job store directory is '/public/groups/cgl/graph-genomes/anovak/build/amazon-genomics-cli/examples/demo-cwl-project/workflows/s3demo/tree'.
[2022-05-11T14:25:57-0700] [MainThread] [D] [toil.jobStores.abstractJobStore] The workflow ID is: '41dab19c-6577-42a1-9eb4-7c74add2a306'
[2022-05-11T14:25:57-0700] [MainThread] [I] [cwltool] Resolved './s3demo.cwl' to 'file:///public/groups/cgl/graph-genomes/anovak/build/amazon-genomics-cli/examples/demo-cwl-project/workflows/s3demo/s3demo.cwl'
[2022-05-11T14:26:09-0700] [MainThread] [D] [toil.cwl.cwltoil] Importing files for ordereddict([('image_file', ordereddict([('class', 'File'), ('location', 's3://spacenet-dataset/AOIs/AOI_1_Rio/PS-RGB/PS-RGB_mosaic_013022223112.tif'), ('basename', 'PS-RGB_mosaic_013022223112.tif'), ('nameroot', 'PS-RGB_mosaic_013022223112'), ('nameext', '.tif'), ('streamable', False)])), ('image_directory', ordereddict([('class', 'Directory'), ('location', 's3://spacenet-dataset/Hosted-Datasets/fmow/fmow-rgb/val/lighthouse/lighthouse_8'), ('basename', 'lighthouse_8')])), ('image_filename', 'lighthouse_8_0_rgb.jpg')])
[2022-05-11T14:26:11-0700] [MainThread] [E] [toil.lib.retry] Got a <class 'botocore.exceptions.ClientError'>: An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied which is not retriable according to <function retryable_s3_errors at 0x7fcf777e5f70>
[2022-05-11T14:26:11-0700] [MainThread] [E] [toil.cwl.cwltoil] Got exception 'An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied' while copying 's3://spacenet-dataset/AOIs/AOI_1_Rio/PS-RGB/PS-RGB_mosaic_013022223112.tif'
[2022-05-11T14:26:11-0700] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/public/groups/cgl/graph-genomes/anovak/build/amazon-genomics-cli/examples/demo-cwl-project/workflows/s3demo/tree)
Traceback (most recent call last):
  File "/public/home/anovak/build/toil/venv/bin/toil-cwl-runner", line 33, in <module>
    sys.exit(load_entry_point('toil', 'console_scripts', 'toil-cwl-runner')())
  File "/public/home/anovak/build/toil/src/toil/cwl/cwltoil.py", line 3447, in main
    import_files(
  File "/public/home/anovak/build/toil/src/toil/cwl/cwltoil.py", line 1527, in import_files
    visit_cwl_class_and_reduce(
  File "/public/home/anovak/build/toil/src/toil/cwl/utils.py", line 123, in visit_cwl_class_and_reduce
    for result in visit_cwl_class_and_reduce(rec[key], classes, op_down, op_up):
  File "/public/home/anovak/build/toil/src/toil/cwl/utils.py", line 127, in visit_cwl_class_and_reduce
    results.append(op_up(rec, down_result, child_results))
  File "/public/home/anovak/build/toil/src/toil/cwl/cwltoil.py", line 1488, in visit_file_or_directory_up
    upload_file(
  File "/public/home/anovak/build/toil/src/toil/cwl/cwltoil.py", line 1623, in upload_file
    file_metadata["location"] = write_file(uploadfunc, fileindex, existing, location)
  File "/public/home/anovak/build/toil/src/toil/cwl/cwltoil.py", line 1328, in write_file
    index[file_uri] = "toilfile:" + writeFunc(rp).pack()
  File "/public/home/anovak/build/toil/src/toil/lib/compatibility.py", line 12, in call
    return func(*args, **kwargs)
  File "/public/home/anovak/build/toil/src/toil/common.py", line 1135, in importFile
    return self.import_file(srcUrl, sharedFileName, symlink)
  File "/public/home/anovak/build/toil/src/toil/common.py", line 1149, in import_file
    return self._jobStore.import_file(src_uri, shared_file_name=shared_file_name, symlink=symlink)
  File "/public/home/anovak/build/toil/src/toil/jobStores/abstractJobStore.py", line 390, in import_file
    return self._import_file(otherCls,
  File "/public/home/anovak/build/toil/src/toil/jobStores/fileJobStore.py", line 310, in _import_file
    return super()._import_file(otherCls, uri, shared_file_name=shared_file_name)
  File "/public/home/anovak/build/toil/src/toil/jobStores/abstractJobStore.py", line 420, in _import_file
    size, executable = otherCls._read_from_url(uri, writable)
  File "/public/home/anovak/build/toil/src/toil/jobStores/aws/jobStore.py", line 464, in _read_from_url
    srcObj = get_object_for_url(url, existing=True)
  File "/public/home/anovak/build/toil/src/toil/lib/aws/utils.py", line 253, in get_object_for_url
    region = get_bucket_region(bucketName, endpoint_url=endpoint_url)
  File "/public/home/anovak/build/toil/src/toil/lib/aws/utils.py", line 217, in get_bucket_region
    loc = s3_client.get_bucket_location(Bucket=bucket_name)
  File "/public/home/anovak/build/toil/venv/lib/python3.9/site-packages/botocore/client.py", line 395, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/public/home/anovak/build/toil/venv/lib/python3.9/site-packages/botocore/client.py", line 725, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetBucketLocation operation: Access Denied
Toil looks up the bucket's location (region) before reading from it, to save a redirect. But not all public buckets grant permission to do that; some grant permission only to read the data.
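For context, the lookup the traceback dies in boils down to something like this (a minimal sketch, not Toil's actual code; the `s3_client` parameter is introduced here for illustration, so the S3 call can be stubbed out):

```python
def get_bucket_region(bucket_name, s3_client):
    """Ask S3 which region a bucket lives in.

    This needs the s3:GetBucketLocation permission on the bucket, which a
    public bucket does not necessarily grant alongside s3:GetObject.
    """
    loc = s3_client.get_bucket_location(Bucket=bucket_name)["LocationConstraint"]
    # S3 reports buckets in us-east-1 with a null LocationConstraint.
    return loc or "us-east-1"
```

In practice the client would come from `boto3.client("s3")`, and on a bucket that denies GetBucketLocation this call raises the `botocore.exceptions.ClientError` shown above.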
Toil's S3 access code (get_object_for_url) should handle the case where we don't have permission to get the bucket location (our get_bucket_region utility raises a botocore.exceptions.ClientError with an AccessDenied error code), and fall back to fetching the data without knowing the region.
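One possible shape for that fallback (a hedged sketch, not a patch against Toil: `get_region_or_none` is a hypothetical helper name, and `ClientError` here is a stdlib stand-in mirroring the `.response` shape of `botocore.exceptions.ClientError`, which the real code would catch instead):

```python
class ClientError(Exception):
    # Stand-in with the same .response shape as botocore.exceptions.ClientError.
    def __init__(self, error_response, operation_name):
        super().__init__(error_response, operation_name)
        self.response = error_response
        self.operation_name = operation_name


def get_region_or_none(bucket_name, get_bucket_region):
    """Return the bucket's region, or None if we lack permission to ask.

    get_object_for_url could pass a None region through to boto3, which then
    talks to the default endpoint and follows S3's redirect, at the cost the
    region lookup was meant to save.
    """
    try:
        return get_bucket_region(bucket_name)
    except ClientError as e:
        if e.response.get("Error", {}).get("Code") == "AccessDenied":
            # Public buckets may allow GetObject while denying
            # GetBucketLocation; proceed without a known region.
            return None
        raise
```

Only AccessDenied is swallowed; other errors (e.g. NoSuchBucket) still propagate, so genuine failures surface as before.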
┆Issue is synchronized with this Jira Story
┆friendlyId: TOIL-1166