datalad-datasets / hcp-functional-connectivity

Preprocessed functional connectivity MR data from the WU-Minn HCP1200 dataset. **More info in README.**
https://github.com/datalad-datasets/hcp-functional-connectivity/blob/master/README.md

"s3 key refused" error on `datalad get` #4

Closed mckenziephagen closed 1 year ago

mckenziephagen commented 1 year ago

Hi! I've successfully cloned the dataset following the instructions in the readme, but when I go to "get" a participant's data, I get an error that the s3 key was refused.

get(error): 171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt (file) [Failed to download from any of 1 locations ['S3 refused to provide the key for HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt from url s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt?versionId=xKe91c2Yr0GgJ6laGI0y9VEPTX1jIcfl -caused by- S3ResponseError: 403 Forbidden\n']

If I try to download it directly using the AWS CLI with the S3 link from that output, I also get an error.

(fc) mphagen@nid005744:/pscratch/sd/m/mphagen/hcp-functional-connectivity> aws s3 cp s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt?versionId=xKe91c2Yr0GgJ6laGI0y9VEPTX1jIcfl . 

fatal error: An error occurred (404) when calling the HeadObject operation: Key "HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt?versionId=xKe91c2Yr0GgJ6laGI0y9VEPTX1jIcfl" does not exist 

But, if I trim off the trailing alphanumeric junk, then I AM able to download it. So it doesn't seem to be an issue with my S3 credentials being set incorrectly.

aws s3 cp s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt . 

download: s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt to ./rfMRI_REST1_LR_Atlas_stats.txt
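For reference, the "trailing junk" being trimmed is the `?versionId=...` query string; a generic shell sketch (not a DataLad command) for stripping it from any such URL:

```shell
# Strip the ?versionId=... query string from an S3 URL before handing it
# to `aws s3 cp` (versionId value taken from the error output above).
url='s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt?versionId=xKe91c2Yr0GgJ6laGI0y9VEPTX1jIcfl'
# ${url%%\?*} removes everything from the first '?' onward
echo "${url%%\?*}"
# -> s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt
```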

My version of datalad is 0.19.2.

I was also able to replicate this in the full HCP datalad dataset, so it's not something that's specific to this dataset.

edit:

output of datalad wtf ``` [WARNING] Could not determine filesystem types due to KeyError(None) # WTF ## configuration ## credentials - keyring: - active_backends: - PlaintextKeyring with no encyption v.1.0 at /global/homes/m/mphagen/.local/share/python_keyring/keyring_pass.cfg - config_file: /global/homes/m/mphagen/.config/python_keyring/keyringrc.cfg - data_root: /global/homes/m/mphagen/.local/share/python_keyring ## datalad - version: 0.19.2 ## dependencies - annexremote: 1.6.0 - boto: 2.49.0 - cmd:7z: 16.02 - cmd:annex: 10.20230408-g5b1e8ba77 - cmd:bundled-git: 2.40.1 - cmd:git: 2.40.1 - cmd:ssh: 8.4p1 - cmd:system-git: 2.40.1 - cmd:system-ssh: 8.4p1 - humanize: 4.7.0 - iso8601: 2.0.0 - keyring: 24.2.0 - keyrings.alt: 4.2.0 - msgpack: 1.0.3 - platformdirs: 3.10.0 - requests: 2.31.0 ## environment - GIT_PYTHON_REFRESH: quiet - LANG: en_US.UTF-8 - PATH: /global/common/software/nersc/current/jupyter/ex/23-06/nersc-utils:/global/homes/m/mphagen/miniconda3/envs/fc/bin:/global/homes/m/mphagen/miniconda3/condabin:/global/common/software/nersc/bin:/global/common/software/nersc/pm-2021q4/easybuild/software/Nsight-Systems/2022.2.1:/global/common/software/nersc/pm-2021q4/easybuild/software/Nsight-Systems/2022.2.1/bin:/global/common/software/nersc/pm-2021q4/easybuild/software/Nsight-Compute/2022.1.1:/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/compute-sanitizer:/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/bin:/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/cuda/11.7/libnvvp:/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/profilers/Nsight_Compute:/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/profilers/Nsight_Systems/bin:/opt/cray/pe/perftools/23.03.0/bin:/opt/cray/pe/papi/7.0.0.1/bin:/opt/cray/pe/gcc/11.2.0/bin:/opt/cray/pe/craype/2.7.20/bin:/opt/cray/pe/mpich/8.1.25/ofi/gnu/9.1/bin:/opt/cray/pe/mpich/8.1.25/bin:/opt/cray/libfabric/1.15.2.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/opt/cray/pe/bin - PYTHONFAULTHANDLER: ## extensions ## git-annex - build flags: - Assistant - Webapp - 
Pairing - Inotify - DBus - DesktopNotify - TorrentParser - MagicMime - Benchmark - Feeds - Testsuite - S3 - WebDAV - dependency versions: - aws-0.22.1 - bloomfilter-2.0.1.0 - cryptonite-0.29 - DAV-1.3.4 - feed-1.3.2.1 - ghc-9.0.2 - http-client-0.7.13.1 - persistent-sqlite-2.13.1.0 - torrent-10000.1.1 - uuid-1.3.15 - yesod-1.6.2.1 - key/value backends: - SHA256E - SHA256 - SHA512E - SHA512 - SHA224E - SHA224 - SHA384E - SHA384 - SHA3_256E - SHA3_256 - SHA3_512E - SHA3_512 - SHA3_224E - SHA3_224 - SHA3_384E - SHA3_384 - SKEIN256E - SKEIN256 - SKEIN512E - SKEIN512 - BLAKE2B256E - BLAKE2B256 - BLAKE2B512E - BLAKE2B512 - BLAKE2B160E - BLAKE2B160 - BLAKE2B224E - BLAKE2B224 - BLAKE2B384E - BLAKE2B384 - BLAKE2BP512E - BLAKE2BP512 - BLAKE2S256E - BLAKE2S256 - BLAKE2S160E - BLAKE2S160 - BLAKE2S224E - BLAKE2S224 - BLAKE2SP256E - BLAKE2SP256 - BLAKE2SP224E - BLAKE2SP224 - SHA1E - SHA1 - MD5E - MD5 - WORM - URL - X* - operating system: linux x86_64 - remote types: - git - gcrypt - p2p - S3 - bup - directory - rsync - web - bittorrent - webdav - adb - tahoe - glacier - ddar - git-lfs - httpalso - borg - hook - external - supported repository versions: - 8 - 9 - 10 - upgrade supported from repository versions: - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 - version: 10.20230408-g5b1e8ba77 ## location - path: /global/u1/m/mphagen - type: directory ## metadata.extractors ## metadata.filters ## metadata.indexers ## python - implementation: CPython - version: 3.9.17 ## system - distribution: sles/15.4/n/a - encoding: - default: utf-8 - filesystem: utf-8 - locale.prefered: UTF-8 - filesystem: - CWD: - path: /global/u1/m/mphagen - HOME: - path: /global/homes/m/mphagen - TMP: - path: /tmp - max_path_length: 276 - name: Linux - release: 5.14.21-150400.24.46_12.0.73-cray_shasta_c - type: posix - version: #1 SMP Tue Jun 13 16:43:10 UTC 2023 (9c4698c) ```
yarikoptic commented 1 year ago

Smells like something happened to the ACLs in that bucket... Needs a closer look

mckenziephagen commented 1 year ago

So this seems like the exact issue reported further down the thread here by jajcayn.

I have tried the datalad update --merge -r that adswa suggested, but that didn't change anything.

yarikoptic commented 1 year ago

Someone should check with the HCP folks on what they have done with the bucket. On that sample key:

❯ datalad ls -aL s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt
Connecting to bucket: hcp-openaccess
[INFO   ] S3 session: Connecting to the bucket hcp-openaccess with authentication 
Bucket info:
  Versioning: {'Versioning': 'Enabled'}
     Website: hcp-openaccess.s3-website-us-east-1.amazonaws.com
         ACL: S3ResponseError: 403 Forbidden
HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt 2018-08-18T15:24:40.000Z 1693 ver:None                              acl:<Policy: ccf-aws (owner) = FULL_CONTROL>  http://hcp-openaccess.s3.amazonaws.com/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt [E: 403]
❯ datalad download-url s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt
[INFO   ] Downloading 's3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt' into '/tmp/' 
[INFO   ] S3 session: Connecting to the bucket hcp-openaccess with authentication 
download_url(ok): /tmp/rfMRI_REST1_LR_Atlas_stats.txt (file)  

> I have tried the datalad update --merge -r adswa suggested, but that didn't change anything.

We would need to do some magic pixie dust sprinkling to fix things up and extend the list of URLs for each file, after we first determine what has happened, so we get a clue about the longevity of the current setup. In general, S3 URLs without versionIds are inferior, since the content behind the same URL could then change.

mckenziephagen commented 1 year ago

What would the timeframe on a fix be? Or is there any workaround so that I'd be able to download the data?

Thanks!

yarikoptic commented 1 year ago

Looked into it again and realized that we are now looking at hcp-functional-connectivity and not the original https://github.com/datalad-datasets/human-connectome-project-openaccess/ . I looked at how the issue was "worked around" in that original one: @mih just provided URLs without versionIds for all the keys. That does provide an immediate remedy, but I would prefer to avoid it. That is why I ran

git annex whereis --json | jq -r '.key + " " + (.whereis[] | select(.urls[] | startswith("s3://hcp-openaccess")) | .urls[])'| sed -e 's,\?.*,,g' | git annex registerurl --batch

which populates git-annex with those URLs while stripping the versionIds. Note that it will take a while. I am running it too and might later share the adjusted git-annex branch in a clone.
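To sanity-check what the sed step of that one-liner does, it can be replayed on a single mock "key url" line (the key and path below are shortened placeholders, not real dataset content):

```shell
# Replay just the sed step of the pipeline above: it removes the
# ?versionId=... query string from each whereis URL before the line
# is fed to `git annex registerurl --batch`.
printf '%s\n' 'MD5E-s1693--abc.txt s3://hcp-openaccess/HCP_1200/171330/rfMRI_REST1_LR_Atlas_stats.txt?versionId=xKe91c2Yr0GgJ6laGI0y9VEPTX1jIcfl' \
  | sed -e 's,\?.*,,g'
# -> MD5E-s1693--abc.txt s3://hcp-openaccess/HCP_1200/171330/rfMRI_REST1_LR_Atlas_stats.txt
```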

mckenziephagen commented 1 year ago

Amazing, thank you! How long is "awhile" - hours or days?

yarikoptic commented 1 year ago

On my laptop it is still running after an hour or so. It is 72113 files at about 6 files per second, so about 200 minutes, thus about 3 hours.
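That back-of-the-envelope estimate as shell arithmetic, using the figures above:

```shell
# 72113 URLs at ~6 registrations per second (figures from the comment above)
files=72113
rate=6
echo "$(( files / rate / 60 )) minutes"   # -> 200 minutes
echo "$(( files / rate / 3600 )) hours"   # -> 3 hours
```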

mckenziephagen commented 1 year ago

So, a few things.

You noted that datalad download-url works for you with the s3 key without the versionid, but I'm not able to replicate that. I also get an error informing me that datalad ls belongs to the package datalad-deprecated, so that might be related. I AM able to download this file fine with aws s3 cp.

(fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> datalad download-url s3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz
[INFO   ] Downloading 's3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz' into '/pscratch/sd/m/mphagen/data/hcp-functional-connectivity/' 
[INFO   ] S3 session: Connecting to the bucket hcp-openaccess with authentication 
download_url(error): /pscratch/sd/m/mphagen/data/hcp-functional-connectivity/ (file) [TargetFileAbsent(S3 refused to provide the key for HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz from url s3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz)] [S3 refused to provide the key for HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz from url s3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz]

When I tried to run the jq command listed above in the functional connectivity dataset, I got output that seemed like a success, but didn't lead to me being able to datalad get anything.

Here's a chunk of the output from the jq command - something to note is that these are ALL text or json files. No other filetype.

(fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> git annex whereis --json | jq -r '.key + " " + (.whereis[] | select(.urls[] | startswith("s3://hcp-openaccess")) | .urls[])'| sed -e 's,\?.*,,g' | git annex registerurl --batch

registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok

I tried in a fresh clone of both the functional connectivity and full HCP datalad dataset, but got the same results - in the below chunk of code I try to get one of the files that supposedly had the URL fixed in git-annex.

(fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> datalad get 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt
get(error): 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt (file) [Failed to download from any of 2 locations ['S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt -caused by- S3ResponseError: 403 Forbidden\n', 'S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR -caused by- S3ResponseError: 403 Forbidden\n']
Failed to download from any of 2 locations ['S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt -caused by- S3ResponseError: 403 Forbidden\n', 'S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR -caused by- S3ResponseError: 403 Forbidden\n']
Failed to download from any of 2 locations ['S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt -caused by- S3ResponseError: 403 Forbidden\n', 'S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR -caused by- S3ResponseError: 403 Forbidden\n']]

and the output of git-annex whereis for that file:

(fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> git annex whereis 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt
whereis 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt (1 copy) 
        5435893f-9dce-4098-98ec-82d825391bbd -- [datalad]

  datalad: s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt
  datalad: s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
ok

And last, the HCP folks responded in the Google Group. They said that they had fixed the S3 policy to allow versionIds. I'm still getting an S3 error when I try to download keys that have versionIds, so I've followed up with them on that. Posting a link here for posterity.

In sum: my datalad download-url doesn't download S3 keys without versionIds like you demonstrated above, the versionIds seemingly aren't being properly stripped by the jq command, and the HCP folks have done something to the S3 policy that gives me a new error.

Since it seems like the S3 policy issue might be resolved, the jq issue probably isn't something that needs to be solved. But if you have any intuition about why datalad download-url isn't working for me, that would be great. I'm making the assumption that datalad download-url is somehow implicated in datalad get, and that it's therefore important to get to the bottom of why it's misbehaving.

Thanks for bearing with me - I appreciate your expertise!

yarikoptic commented 1 year ago

Indeed, they seem to have fixed the permissions, and I was able to initiate downloading a sample subject just fine:

$> datalad clone https://github.com/datalad-datasets/hcp-functional-connectivity hcp-functional-connectivity-2
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore 
[INFO   ] https://github.com/datalad-datasets/hcp-functional-connectivity/config download failed: Not Found 
install(ok): /mnt/btrfs/datasets/datalad/crawl-misc/hcp/hcp-functional-connectivity-2 (dataset)
datalad clone https://github.com/datalad-datasets/hcp-functional-connectivity  5.43s user 3.95s system 20% cpu 46.769 total

$> cd hcp-functional-connectivity-2 

$> datalad get -J4 176037
Total:  42%|██████████████████████████████████████████████████████████████████████████████████████▎                                                                                                                      | 3.33G/7.90G [06:00<08:14, 9.24M Bytes/s]

So the permissions issue seems to be addressed. Do you observe something different?

yarikoptic commented 1 year ago

@mckenziephagen please check and report back on whether it works for you now.

mckenziephagen commented 1 year ago

It's working on my local laptop, but still erroring on my HPC (Perlmutter at NERSC, if that's helpful information). I've tried uninstalling and reinstalling Datalad and aws in a new conda environment and recloning both the functional data sub-dataset and the whole HCP dataset. The results are consistent - S3 forbidden error when trying datalad get a file, and "TargetFileAbsent" when I try datalad download-url.

I have the same aws credentials set on my local and HPC, and the same datalad version. The git-annex versions are slightly different (8.20211231 on my local, and 10.20230626 on HPC). One additional difference is that I didn't try to download the dataset on my laptop until after the key policy issue was fixed.

yarikoptic commented 1 year ago

On the HPC -- were you asked for the credential by datalad get or datalad download-url?

What is the output of datalad -l 5 download-url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR ? (Be mindful not to share it if you spot your secret in there -- though AFAIK it should not be made visible.) You might also want to try setting the key and secret via the env vars DATALAD_hcp_s3_key_id and DATALAD_hcp_s3_secret_id in the next attempt.
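For concreteness, setting those env vars before retrying might look like this (the values are obvious placeholders, not real credentials):

```shell
# Placeholder values -- substitute your real HCP AWS credentials here.
DATALAD_hcp_s3_key_id='AKIAXXXXXXXXXXXXXXXX'
DATALAD_hcp_s3_secret_id='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
export DATALAD_hcp_s3_key_id DATALAD_hcp_s3_secret_id
# ...then rerun the failing datalad command in the same shell session,
# so it picks the credentials up from the environment.
echo "credentials exported"
```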

mckenziephagen commented 1 year ago

I think I was probably asked the very first time I tried a couple of days ago, but not since. I know I was asked the first time I ran datalad get on my laptop, because that was more recent. I have reset the credentials once or twice while trying to debug, but the ones currently set in the aws config file are correct.

The error before and after setting DATALAD_hcp_s3_key_id and DATALAD_hcp_s3_secret_id with my credentials is the same:

```
(lad-env) mphagen@perlmutter:login31:/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200> datalad -l 5 download-url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[DEBUG  ] Command line args 1st pass for DataLad 0.19.3. Parsed: Namespace() Unparsed: ['download-url', 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR']
[Level 5] Importing module datalad.local.download_url
[DEBUG  ] Building doc for <class 'datalad.core.local.status.Status'>
[DEBUG  ] Building doc for <class 'datalad.core.local.save.Save'>
[DEBUG  ] Building doc for <class 'datalad.local.download_url.DownloadURL'>
[Level 5] Finished setup_parser
[DEBUG  ] Parsing known args among ['/global/homes/m/mphagen/miniconda3/envs/lad-env/bin/datalad', '-l', '5', 'download-url', 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR']
[DEBUG  ] Determined class of decorated function: <class 'datalad.local.download_url.DownloadURL'>
[Level 5] Parsed ri /pscratch/sd/m/mphagen/human-connectome-project-openaccess into fields {'path': '/pscratch/sd/m/mphagen/human-connectome-project-openaccess', 'scheme': '', 'netloc': '', 'username': '', 'password': '', 'hostname': '', 'port': '', 'query': '', 'fragment': ''}
[Level 5] Detected file ri
[DEBUG  ] Resolved dataset to download urls: /pscratch/sd/m/mphagen/human-connectome-project-openaccess
[Level 5] Requested to provide version for cmd:git
[Level 5] Requested to provide version for cmd:git
[Level 5] Found 0 previous connections
[DEBUG  ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/pscratch/sd/m/mphagen/human-connectome-project-openaccess)
[Level 8] Process 1555111 started
[Level 5] STDERR: git -c diff.ignoreSu (Thread<(STDERR: git -c diff.ignoreSu, 6)>) started
[Level 5] STDOUT: git -c diff.ignoreSu (Thread<(STDOUT: git -c diff.ignoreSu, 4)>) started
[Level 5] process_waiter (Thread<process_waiter>) started
[Level 5] STDOUT: git -c diff.ignoreSu (Thread<(STDOUT: git -c diff.ignoreSu, 4)>) exiting (exit_requested: False, last data: None)
[Level 5] STDERR: git -c diff.ignoreSu (Thread<(STDERR: git -c diff.ignoreSu, 6)>) exiting (exit_requested: False, last data: None)
[Level 5] process_waiter (Thread<process_waiter>) exiting
[Level 5] Detected <class 'datalad.support.gitrepo.GitRepo'> at /pscratch/sd/m/mphagen/human-connectome-project-openaccess
[DEBUG  ] Assigning credentials into 21 providers
[DEBUG  ] Returning provider Provider(authenticator=<<S3Authenticato++27 chars++one)>>, credential=<<AWS_S3(ds=<<Da++82 chars++'>>)>>, name='hcp-s3', url_res=['s3://hcp-openaccess.*']) for url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[Level 5] Requested to provide version for boto
[INFO   ] Downloading 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR' into '/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/'
[DEBUG  ] set credential context as s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[DEBUG  ] Acquiring a currently existing lock to establish download session. If stalls - check which process holds /global/homes/m/mphagen/.cache/datalad/locks/downloader-auth.lck
[DEBUG  ] S3 session: Reconnecting to the bucket
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_key_id
[DEBUG  ] Importing keyring
[DEBUG  ] Loading KWallet
[DEBUG  ] Loading SecretService
[DEBUG  ] Loading Windows
[DEBUG  ] Loading chainer
[DEBUG  ] Loading libsecret
[DEBUG  ] Loading macOS
[DEBUG  ] Loading Gnome
[DEBUG  ] Loading Google
[DEBUG  ] Loading Windows (alt)
[DEBUG  ] Loading file
[DEBUG  ] Loading keyczar
[DEBUG  ] Loading multi
[DEBUG  ] Loading pyfs
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_secret_id
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_session
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_expiration
[INFO   ] S3 session: Connecting to the bucket hcp-openaccess with authentication
[Level 8] S3ResponseError: 403 Forbidden
 [s3.py:get_bucket:134,utils.py:_wrap_try_multiple_dec:2181,connection.py:get_bucket:509,connection.py:head_bucket:542]
[DEBUG  ] Cannot access bucket hcp-openaccess by name with validation: S3ResponseError(S3ResponseError: 403 Forbidden
)
[Level 5] Calling out into <bound method BaseDownloader._download of S3Downloader(authenticator=<<S3Authenticato++56 chars++com)>>, credential=<<AWS_S3(auth_ur++98 chars++'>>)>>)> for s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_expiration
[Level 8] S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR [download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231]
download_url(error): /pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/ (file) [TargetFileAbsent(S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR)] [S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR]
[Level 8] Command did not complete successfully. 1 failed:
[{'action': 'download_url',
  'error_message': 'S3 refused to provide the key for '
                   'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
                   'from url '
                   's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR',
  'exception': S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR [download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231],
  'exception_traceback': '[download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231]',
  'message': 'TargetFileAbsent(S3 refused to provide the key for '
             'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
             'from url '
             's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR)',
  'path': '/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/',
  'status': 'error',
  'type': 'file'}] [main.py:_run_with_exception_handler:190,exec.py:call_from_parser:107,base.py:_execute_command_:940]
[DEBUG  ] could not perform all requested actions: IncompleteResultsError(Command did not complete successfully. 1 failed:
[{'action': 'download_url',
  'error_message': 'S3 refused to provide the key for '
                   'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
                   'from url '
                   's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR',
  'exception': S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR [download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231],
  'exception_traceback': '[download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231]',
  'message': 'TargetFileAbsent(S3 refused to provide the key for '
             'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
             'from url '
             's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR)',
  'path': '/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/',
  'status': 'error',
  'type': 'file'}])
[Level 5] Exiting
```
yarikoptic commented 1 year ago

Please check the boto versions on the HPC and locally; datalad wtf -S dependencies could tell those.

Is the HPC in about the same geo location? I wonder if somehow crossing some boundaries/zones would matter...

mckenziephagen commented 1 year ago

I'm in the same city as the HPC.

datalad wtf -S dependencies returns boto version 2.49.0 for both.

local:

  - annexremote: 1.2.1
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 8.20211231
  - cmd:bundled-git: UNKNOWN
  - cmd:git: 2.41.0
  - cmd:ssh: 9.0p1
  - cmd:system-git: 2.41.0
  - cmd:system-ssh: 9.0p1
  - humanize: 4.8.0
  - iso8601: 2.0.0
  - keyring: 24.2.0
  - keyrings.alt: UNKNOWN
  - msgpack: 1.0.5
  - platformdirs: 3.10.0
  - requests: 2.31.0

HPC:

  - annexremote: 1.6.0
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 10.20230408-g5b1e8ba77
  - cmd:bundled-git: 2.40.1
  - cmd:git: 2.40.1
  - cmd:ssh: 8.4p1
  - cmd:system-git: 2.41.0
  - cmd:system-ssh: 8.4p1
  - humanize: 4.8.0
  - iso8601: 2.0.0
  - keyring: 24.2.0
  - keyrings.alt: UNKNOWN
  - msgpack: 1.0.5
  - platformdirs: 3.10.0
  - requests: 2.31.0

I tried on another HPC I have access to (also geographically near to me) and had no problems downloading.
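One way to compare the two listings mechanically (the files below contain just the lines from the lists above that differ, as a generic sketch):

```shell
# Compare dependency listings from the two machines; diff highlights the
# annexremote and git-annex version mismatches noted above.
cat > /tmp/deps-local.txt <<'EOF'
annexremote: 1.2.1
cmd:annex: 8.20211231
EOF
cat > /tmp/deps-hpc.txt <<'EOF'
annexremote: 1.6.0
cmd:annex: 10.20230408-g5b1e8ba77
EOF
# diff exits non-zero when the files differ, so guard it with || true
diff /tmp/deps-local.txt /tmp/deps-hpc.txt || true
```

In practice one would redirect `datalad wtf -S dependencies` from each machine into those files instead of pasting by hand.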

mckenziephagen commented 1 year ago

SOLVED IT.

There were old credentials hanging around in ~/.local/share/python_keyring/keyring_pass.cfg

I deleted them, and the next time I ran datalad download-url I got prompted for new credentials; after that, everything ran flawlessly.
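For anyone else hitting this: the wtf output earlier in the thread shows the active backend is a PlaintextKeyring storing credentials in .local/share/python_keyring/keyring_pass.cfg, so stale entries live in that file. A sketch of the cleanup on a scratch copy (the section name below is a made-up stand-in for whatever entry is actually in your file):

```shell
# Sketch: inspect and retire a plaintext keyring file so the next datalad
# call re-prompts for credentials. This uses a scratch copy; the real file
# (per the wtf output above) is ~/.local/share/python_keyring/keyring_pass.cfg
cfg=$(mktemp)
printf '[mock_section]\nkey_id = OLDKEY\n' > "$cfg"   # mock stale entry, format illustrative
grep -q 'OLDKEY' "$cfg" && mv "$cfg" "$cfg.bak"       # confirm it is there, then move it aside
test ! -e "$cfg" && echo "stale credentials removed"
```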

Thanks for all of your help!