cytomining / CytoTable

Transform CellProfiler and DeepProfiler data for processing image-based profiling readouts with Pycytominer and other Cytomining tools.
https://cytomining.github.io/CytoTable/
BSD 3-Clause "New" or "Revised" License

no_sign_request=True does not skip AWS config ID #61

Open gwaybio opened 1 year ago

gwaybio commented 1 year ago

I'm receiving an InvalidAccessKeyId error after deleting my AWS access key ID (set with aws configure) from maple. I wanted to try the no_sign_request=True option before adding credentials back to maple, but I received the error outlined below.

As suggested in https://github.com/cytomining/CytoTable/issues/52#issuecomment-1553125402, I ran cytotable.convert(..., no_sign_request=True). The full command is below:

import cytotable

# parsl_config is defined earlier in the session (not shown here)
cytotable.convert(
    source_path="s3://cellpainting-gallery/cpg0016-jump/source_1/workspace/backend/Batch1_20221004/UL001643",
    dest_path="test2.parquet",
    dest_datatype="parquet",
    chunk_size=150000,
    parsl_config=parsl_config,
    no_sign_request=True,
    preset="cellprofiler_sqlite_pycytominer"
)

But I received the following error:

exception of type <class 'botocore.exceptions.ClientError'>
1685550430.488742 2023-05-31 10:27:10 MainProcess-248907 HTEX-Queue-Management-Thread-139723934787136 parsl.dataflow.dflow:304 handle_exec_update DEBUG: Task 3 try 0 failed
1685550430.488833 2023-05-31 10:27:10 MainProcess-248907 HTEX-Queue-Management-Thread-139723934787136 parsl.dataflow.dflow:350 handle_exec_update ERROR: Task 3 failed after 0 retry attempts
Traceback (most recent call last):
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/parsl/dataflow/dflow.py", line 301, in handle_exec_update
    res = self._unwrap_remote_exception_wrapper(future)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/parsl/dataflow/dflow.py", line 567, in _unwrap_remote_exception_wrapper
    result.reraise()
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/parsl/app/errors.py", line 122, in reraise
    reraise(t, v, v.__traceback__)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/parsl/app/errors.py", line 160, in wrapper
    return func(*args, **kwargs)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/cytotable/sources.py", line 81, in _get_source_filepaths
    if AnyPath(path).is_file()
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/cloudpathlib/s3/s3path.py", line 39, in is_file
    return self.client._is_file_or_dir(self) == "file"
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/cloudpathlib/s3/s3client.py", line 164, in _is_file_or_dir
    return self._s3_file_query(cloud_path)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/cloudpathlib/s3/s3client.py", line 197, in _s3_file_query
    return next(
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/cloudpathlib/s3/s3client.py", line 198, in <genexpr>
    (
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/boto3/resources/collection.py", line 81, in __iter__
    for page in self.pages():
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/boto3/resources/collection.py", line 171, in pages
    for page in pages:
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/gway/miniconda3/envs/jump_sc/lib/python3.10/site-packages/botocore/client.py", line 964, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.
1685550430.490109 2023-

parsl.log

d33bs commented 1 year ago

Thank you for opening this, @gwaybio! Sorry to hear this is happening. I'm working on reproducing this error and wanted to follow up in the meantime. Based on the log and error messages, my understanding so far is that the problem may lie in how Boto3 (which cloudpathlib uses to access remote files on AWS S3) interprets the client configuration. At first glance, I wonder whether all the configurations are truly gone, or whether partial remnants could be overriding the no_sign_request parameter and causing this error.

The Boto3 credentials documentation describes a cascade of configuration locations, including various files and environment variables. Could I verify with you that these are removed or non-existent on the device you attempted this on?

Specifically, I believe we need to check the following:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, or AWS_SESSION_TOKEN)
    • In bash or zsh, you can check whether these are set with: export | grep "AWS_ACCESS_KEY_ID\|AWS_SECRET_ACCESS_KEY\|AWS_SESSION_TOKEN"
  2. Shared credential file (~/.aws/credentials)
  3. AWS config file (~/.aws/config)
  4. Boto2 config file (/etc/boto.cfg and ~/.boto)
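The four checks above can also be scripted. The following is a minimal sketch that uses only the Python standard library. The function name find_aws_credential_sources is hypothetical and is not part of CytoTable, boto3, or cloudpathlib. It only reports whether the listed locations are present. It does not cover every provider in the Boto3 credential chain (for example, instance metadata), and it ignores path overrides such as AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Environment variables checked in step 1 of the list above
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def find_aws_credential_sources(
    environ: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> List[str]:
    """Return which of the checklist locations are present (hypothetical helper).

    Only presence is checked: files are not parsed and keys are not validated.
    Results follow the order of the checklist.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else home

    # Step 1: environment variables set to a non-empty value
    found = [f"env:{name}" for name in AWS_ENV_VARS if environ.get(name)]

    candidate_files = [
        home / ".aws" / "credentials",  # step 2: shared credential file
        home / ".aws" / "config",  # step 3: AWS config file
        Path("/etc/boto.cfg"),  # step 4: Boto2 config files
        home / ".boto",
    ]
    found += [f"file:{path}" for path in candidate_files if path.is_file()]
    return found


if __name__ == "__main__":
    sources = find_aws_credential_sources()
    print("\n".join(sources) if sources else "No AWS credential sources found.")
```

An empty result suggests that none of these four locations is supplying credentials on that machine.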

gwaybio commented 1 year ago

Could I verify with you that these are removed or non-existent on the device you attempted this on?

Sorry for not following up on this. I see how knowing the credential and config checks you list could help with the documentation in #62. Unfortunately, I cannot verify these checks.

Are you able to reproduce the error? If not, those four checks are a good place to start for future troubleshooting.
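A complementary step is to rule out leftover configuration without deleting anything. You can rerun the conversion in a subprocess whose environment hides the stored credential sources. The following is a sketch under stated assumptions, not a CytoTable feature. It drops the AWS credential environment variables and AWS_PROFILE. It also points AWS_SHARED_CREDENTIALS_FILE, AWS_CONFIG_FILE, and BOTO_CONFIG at a path that should not exist. To our understanding, botocore reads BOTO_CONFIG in place of /etc/boto.cfg and ~/.boto. The helper names and the script name run_convert.py are hypothetical. If the same InvalidAccessKeyId error appears under these conditions, local configuration is unlikely to be the cause.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

# Credential-related variables to drop (AWS_PROFILE selects a profile from the files)
DROP_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN", "AWS_PROFILE")

# File-based sources are redirected here; the path is assumed not to exist
MISSING_FILE = "/nonexistent/aws-credentials-check"


def credential_free_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with stored AWS credential sources masked."""
    env = dict(os.environ if base is None else base)  # copy; the input is not modified
    for name in DROP_VARS:
        env.pop(name, None)
    env["AWS_SHARED_CREDENTIALS_FILE"] = MISSING_FILE  # instead of ~/.aws/credentials
    env["AWS_CONFIG_FILE"] = MISSING_FILE  # instead of ~/.aws/config
    env["BOTO_CONFIG"] = MISSING_FILE  # instead of /etc/boto.cfg and ~/.boto
    return env


def run_with_masked_credentials(script: str) -> int:
    """Run a Python script (e.g. one calling cytotable.convert) with masked credentials."""
    completed = subprocess.run([sys.executable, script], env=credential_free_env())
    return completed.returncode
```

For example, run_with_masked_credentials("run_convert.py") would run a script containing the cytotable.convert(..., no_sign_request=True) call from the original report. The rest of the environment, such as PATH and the conda environment, is left as it is.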