Closed: mckenziephagen closed this issue 1 year ago.
Smells like something happened to the ACLs in that bucket... Needs a closer look.
Someone should check with the HCP folks on what they have done with the bucket. On that sample key:
❯ datalad ls -aL s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt
Connecting to bucket: hcp-openaccess
[INFO ] S3 session: Connecting to the bucket hcp-openaccess with authentication
Bucket info:
Versioning: {'Versioning': 'Enabled'}
Website: hcp-openaccess.s3-website-us-east-1.amazonaws.com
ACL: S3ResponseError: 403 Forbidden
HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt 2018-08-18T15:24:40.000Z 1693 ver:None acl:<Policy: ccf-aws (owner) = FULL_CONTROL> http://hcp-openaccess.s3.amazonaws.com/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt [E: 403]
But a plain datalad download-url works:
❯ datalad download-url s3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt
[INFO ] Downloading 's3://hcp-openaccess/HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt' into '/tmp/'
[INFO ] S3 session: Connecting to the bucket hcp-openaccess with authentication
download_url(ok): /tmp/rfMRI_REST1_LR_Atlas_stats.txt (file)
I have tried the datalad update --merge -r that adswa suggested, but that didn't change anything.
We would need to do some magic pixie-dust sprinkling to fix up and extend the list of URLs for each file, after we first determine what has happened, so we get a clue about the longevity of the current setup. In general, S3 URLs without versionIds are inferior, since the content behind the same URL could change.
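To illustrate that point, a small sketch (the versionId string below is a made-up placeholder, not a real one for this key): an S3 URL carrying a versionId query pins one immutable object version, while the bare URL resolves to whatever the key currently holds.

```python
from urllib.parse import urlparse, parse_qs

key = "HCP_1200/171330/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_Atlas_stats.txt"
bare = f"s3://hcp-openaccess/{key}"
# placeholder versionId, purely illustrative
pinned = bare + "?versionId=EXAMPLEVERSION123"

def pinned_version(url):
    """Return the versionId an s3:// URL pins, or None for a bare URL."""
    query = parse_qs(urlparse(url).query)
    return query.get("versionId", [None])[0]

print(pinned_version(bare))    # None: fetches whatever the key currently holds
print(pinned_version(pinned))  # 'EXAMPLEVERSION123': always the same bytes
```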
What would the timeframe on a fix be? Or is there any workaround so that I'd be able to download the data?
Thanks!
Looked into it again and realized that we are now looking at hcp-functional-connectivity, and not the original https://github.com/datalad-datasets/human-connectome-project-openaccess/ . I looked at how the issue was "worked around" in that original one: @mih just provided URLs without versionIds for all the keys. That does provide an immediate remedy, but I would prefer to avoid it here. So instead, install jq and run this command in your terminal in that fresh clone:
git annex whereis --json | jq -r '.key + " " + (.whereis[] | select(.urls[] | startswith("s3://hcp-openaccess")) | .urls[])' | sed -e 's,\?.*,,g' | git annex registerurl --batch
which would populate git-annex with the URLs while stripping the versionIds. Note that it will take a while. I am running it too and might later share the adjusted git-annex branch in a clone.
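For anyone who wants to see what that pipeline does without jq, here is a rough Python equivalent. The field names are the subset of `git annex whereis --json` output that the jq program touches, and the sample record is fabricated for illustration.

```python
import json

def registerurl_lines(whereis_json_lines):
    """Approximate the jq+sed stage: emit '<key> <url>' pairs for
    s3://hcp-openaccess URLs, with any ?versionId=... suffix stripped.
    These lines are what gets fed to `git annex registerurl --batch`."""
    out = []
    for line in whereis_json_lines:
        rec = json.loads(line)
        for remote in rec.get("whereis", []):
            for url in remote.get("urls", []):
                if url.startswith("s3://hcp-openaccess"):
                    out.append(rec["key"] + " " + url.split("?", 1)[0])
    return out

# fabricated single-record example
record = json.dumps({
    "key": "MD5E-s1693--0123456789abcdef.txt",
    "whereis": [{"urls": [
        "s3://hcp-openaccess/HCP_1200/100206/file.txt?versionId=abc123"
    ]}],
})
print(registerurl_lines([record]))
# ['MD5E-s1693--0123456789abcdef.txt s3://hcp-openaccess/HCP_1200/100206/file.txt']
```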
Amazing, thank you! How long is "awhile" - hours or days?
On my laptop it has been running for an hour or so. It is 72113 files at about 6 files per second, so about 200 minutes, i.e. roughly 3 hours.
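The back-of-envelope arithmetic, for anyone checking:

```python
files = 72113
rate = 6.0                    # observed throughput, files per second
minutes = files / rate / 60
print(round(minutes))         # prints 200 (minutes)
print(round(minutes / 60, 1)) # prints 3.3 (hours)
```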
So, a few things.
You noted that datalad download-url works for you with the s3 key without the versionId, but I'm not able to replicate that. I also get an error informing me that datalad ls belongs to the package datalad-deprecated, so that might be related. I AM able to download this file fine with aws s3 cp.
fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> datalad download-url s3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz
[INFO ] Downloading 's3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz' into '/pscratch/sd/m/mphagen/data/hcp-functional-connectivity/'
[INFO ] S3 session: Connecting to the bucket hcp-openaccess with authentication
download_url(error): /pscratch/sd/m/mphagen/data/hcp-functional-connectivity/ (file) [TargetFileAbsent(S3 refused to provide the key for HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz from url s3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz)] [S3 refused to provide the key for HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz from url s3://hcp-openaccess/HCP_1200/299154/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz]
When I tried to run the jq command listed above in the functional-connectivity dataset, I got output that seemed like a success, but it didn't lead to me being able to datalad get anything.
Here's a chunk of the output from the jq command. Something to note: these are ALL text or JSON files, no other filetype.
(fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> git annex whereis --json | jq -r '.key + " " + (.whereis[] | select(.urls[] | startswith("s3://hcp-openaccess")) | .urls[])'| sed -e 's,\?.*,,g' | git annex registerurl --batch
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
registerurl s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS_mean.txt ok
I tried a fresh clone of both the functional-connectivity and the full HCP datalad dataset, but got the same results. In the chunk below I try to get one of the files that supposedly had its URL fixed in git-annex.
fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> datalad get 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt
get(error): 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt (file) [Failed to download from any of 2 locations ['S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt -caused by- S3ResponseError: 403 Forbidden\n', 'S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR -caused by- S3ResponseError: 403 Forbidden\n']
Failed to download from any of 2 locations ['S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt -caused by- S3ResponseError: 403 Forbidden\n', 'S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR -caused by- S3ResponseError: 403 Forbidden\n']
Failed to download from any of 2 locations ['S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt -caused by- S3ResponseError: 403 Forbidden\n', 'S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR -caused by- S3ResponseError: 403 Forbidden\n']]
And here is the output of git annex whereis for that file:
(fc) mphagen@login05:/pscratch/sd/m/mphagen/data/hcp-functional-connectivity> git annex whereis 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt
whereis 100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt (1 copy)
5435893f-9dce-4098-98ec-82d825391bbd -- [datalad]
datalad: s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt
datalad: s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
ok
And last, the HCP folks responded in the Google Group. They said that they had fixed the s3 policy to allow versionids. I'm still getting an s3 error when I try to download keys that have versionids, so I've followed up with them on that. Posting a link here for posterity.
In sum: my datalad download-url doesn't download s3 keys without versionIds like you demonstrated above, the versionIds seemingly aren't being properly stripped by the jq command, and the HCP folks have done something to the s3 policy that gives me a new error.
Since it seems like the s3 policy issue might be resolved, the jq problem probably isn't something that needs to be solved. But if you have any intuition about why datalad download-url isn't working for me, that would be great. I'm assuming that datalad download-url is somehow implicated in datalad get, and is therefore important for getting to the bottom of why it's misbehaving.
Thanks for bearing with me - I appreciate your expertise!
Indeed, they seem to have fixed the permissions, and I was able to start downloading a sample subject just fine:
$> datalad clone https://github.com/datalad-datasets/hcp-functional-connectivity hcp-functional-connectivity-2
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://github.com/datalad-datasets/hcp-functional-connectivity/config download failed: Not Found
install(ok): /mnt/btrfs/datasets/datalad/crawl-misc/hcp/hcp-functional-connectivity-2 (dataset)
datalad clone https://github.com/datalad-datasets/hcp-functional-connectivity 5.43s user 3.95s system 20% cpu 46.769 total
$> cd hcp-functional-connectivity-2
$> datalad get -J4 176037
Total: 42%|██████████████████████████████████████████████████████████████████████████████████████▎ | 3.33G/7.90G [06:00<08:14, 9.24M Bytes/s]
So the permissions issue seems to be addressed. Do you observe something different?
@mckenziephagen please check and report back on whether it works for you now.
It's working on my local laptop, but still erroring on my HPC (Perlmutter at NERSC, if that's helpful information). I've tried uninstalling and reinstalling DataLad and aws in a new conda environment, and recloning both the functional-connectivity sub-dataset and the whole HCP dataset. The results are consistent: an S3 forbidden error when trying to datalad get a file, and "TargetFileAbsent" when I try datalad download-url.
I have the same aws credentials set on my local and HPC, and the same datalad version. The git-annex versions are slightly different (8.20211231 on my local, and 10.20230626 on HPC). One additional difference is that I didn't try to download the dataset on my laptop until after the key policy issue was fixed.
On the HPC -- were you asked for the credential by datalad get or by datalad download-url?
What is the output of datalad -l 5 download-url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
? (But be mindful not to share it if you spot your secret in there -- though AFAIK it should not be made visible.) You might also want to try setting the key and secret ids via the env vars DATALAD_hcp_s3_key_id and DATALAD_hcp_s3_secret_id in the next attempt.
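For reference, setting the credential through the environment would look something like this. The values are placeholders; the variable names are the ones the hcp-s3 provider looks up, as the debug log shows.

```shell
# placeholders, not real credentials
export DATALAD_hcp_s3_key_id="AKIAEXAMPLEKEYID"
export DATALAD_hcp_s3_secret_id="example-secret-value"

# then retry in the same shell, e.g.:
# datalad -l 5 download-url 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR'
echo "$DATALAD_hcp_s3_key_id"
```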
I think I was probably asked the very first time I tried, a couple of days ago, but not since. I know I was asked the first time I ran datalad get on my laptop, because that was more recent. I have reset the credentials once or twice while trying to debug, but the ones currently set in the aws config file are correct.
The error before and after setting DATALAD_hcp_s3_key_id and DATALAD_hcp_s3_secret_id with my credentials is the same:
(lad-env) mphagen@perlmutter:login31:/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200> datalad -l 5 download-url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[**DEBUG** ] Command line args 1st pass for DataLad 0.19.3. Parsed: Namespace() Unparsed: ['download-url', 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR']
[Level 5] Importing module datalad.local.download_url
[**DEBUG** ] Building doc for <class 'datalad.core.local.status.Status'>
[**DEBUG** ] Building doc for <class 'datalad.core.local.save.Save'>
[**DEBUG** ] Building doc for <class 'datalad.local.download_url.DownloadURL'>
[Level 5] Finished setup_parser
[**DEBUG** ] Parsing known args among ['/global/homes/m/mphagen/miniconda3/envs/lad-env/bin/datalad', '-l', '5', 'download-url', 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR']
[**DEBUG** ] Determined class of decorated function: <class 'datalad.local.download_url.DownloadURL'>
[Level 5] Parsed ri /pscratch/sd/m/mphagen/human-connectome-project-openaccess into fields {'path': '/pscratch/sd/m/mphagen/human-connectome-project-openaccess', 'scheme': '', 'netloc': '', 'username': '', 'password': '', 'hostname': '', 'port': '', 'query': '', 'fragment': ''}
[Level 5] Detected file ri
[**DEBUG** ] Resolved dataset to download urls: /pscratch/sd/m/mphagen/human-connectome-project-openaccess
[Level 5] Requested to provide version for cmd:git
[Level 5] Requested to provide version for cmd:git
[Level 5] Found 0 previous connections
[**DEBUG** ] Run ['git', '-c', 'diff.ignoreSubmodules=none', '-c', 'core.quotepath=false', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] (protocol_class=GeneratorStdOutErrCapture) (cwd=/pscratch/sd/m/mphagen/human-connectome-project-openaccess)
[Level 8] Process 1555111 started
[Level 5] STDERR: git -c diff.ignoreSu (Thread<(STDERR: git -c diff.ignoreSu, 6)>) started
[Level 5] STDOUT: git -c diff.ignoreSu (Thread<(STDOUT: git -c diff.ignoreSu, 4)>) started
[Level 5] process_waiter (Thread<process_waiter>) started
[Level 5] STDOUT: git -c diff.ignoreSu (Thread<(STDOUT: git -c diff.ignoreSu, 4)>) exiting (exit_requested: False, last data: None)
[Level 5] STDERR: git -c diff.ignoreSu (Thread<(STDERR: git -c diff.ignoreSu, 6)>) exiting (exit_requested: False, last data: None)
[Level 5] process_waiter (Thread<process_waiter>) exiting
[Level 5] Detected <class 'datalad.support.gitrepo.GitRepo'> at /pscratch/sd/m/mphagen/human-connectome-project-openaccess
[**DEBUG** ] Assigning credentials into 21 providers
[**DEBUG** ] Returning provider Provider(authenticator=<<S3Authenticato++27 chars++one)>>, credential=<<AWS_S3(ds=<<Da++82 chars++'>>)>>, name='hcp-s3', url_res=['s3://hcp-openaccess.*']) for url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[Level 5] Requested to provide version for boto
[INFO ] Downloading 's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR' into '/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/'
[**DEBUG** ] set credential context as s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[**DEBUG** ] Acquiring a currently existing lock to establish download session. If stalls - check which process holds /global/homes/m/mphagen/.cache/datalad/locks/downloader-auth.lck
[**DEBUG** ] S3 session: Reconnecting to the bucket
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_key_id
[**DEBUG** ] Importing keyring
[**DEBUG** ] Loading KWallet
[**DEBUG** ] Loading SecretService
[**DEBUG** ] Loading Windows
[**DEBUG** ] Loading chainer
[**DEBUG** ] Loading libsecret
[**DEBUG** ] Loading macOS
[**DEBUG** ] Loading Gnome
[**DEBUG** ] Loading Google
[**DEBUG** ] Loading Windows (alt)
[**DEBUG** ] Loading file
[**DEBUG** ] Loading keyczar
[**DEBUG** ] Loading multi
[**DEBUG** ] Loading pyfs
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_secret_id
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_session
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_expiration
[INFO ] S3 session: Connecting to the bucket hcp-openaccess with authentication
[Level 8] S3ResponseError: 403 Forbidden
[s3.py:get_bucket:134,utils.py:_wrap_try_multiple_dec:2181,connection.py:get_bucket:509,connection.py:head_bucket:542]
[**DEBUG** ] Cannot access bucket hcp-openaccess by name with validation: S3ResponseError(S3ResponseError: 403 Forbidden
)
[Level 5] Calling out into <bound method BaseDownloader._download of S3Downloader(authenticator=<<S3Authenticato++56 chars++com)>>, credential=<<AWS_S3(auth_ur++98 chars++'>>)>>)> for s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR
[Level 5] Credentials lookup attempt via env var DATALAD_hcp_s3_expiration
[Level 8] S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR [download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231]
**download_url**(**error**): /pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/ (**file**) [TargetFileAbsent(S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR)] **[S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR]**
[Level 8] Command did not complete successfully. 1 failed:
[{'action': 'download_url',
'error_message': 'S3 refused to provide the key for '
'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
'from url '
's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR',
'exception': S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR [download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231],
'exception_traceback': '[download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231]',
'message': 'TargetFileAbsent(S3 refused to provide the key for '
'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
'from url '
's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR)',
'path': '/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/',
'status': 'error',
'type': 'file'}] [main.py:_run_with_exception_handler:190,exec.py:call_from_parser:107,base.py:_execute_command_:940]
[**DEBUG** ] could not perform all requested actions: IncompleteResultsError(Command did not complete successfully. 1 failed:
[{'action': 'download_url',
'error_message': 'S3 refused to provide the key for '
'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
'from url '
's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR',
'exception': S3 refused to provide the key for HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt from url s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR [download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231],
'exception_traceback': '[download_url.py:__call__:202,base.py:download:533,base.py:access:174,base.py:_download:433,s3.py:get_downloader_session:354,s3.py:get_downloader_session:350,utils.py:_wrap_try_multiple_dec:2181,s3.py:_get_key:246,bucket.py:get_key:193,bucket.py:_get_key_internal:231]',
'message': 'TargetFileAbsent(S3 refused to provide the key for '
'HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt '
'from url '
's3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Results/rfMRI_REST1_LR/Movement_AbsoluteRMS.txt?versionId=62BxN3kGUa7qki1A4zbjiCEyKbMnVITR)',
'path': '/pscratch/sd/m/mphagen/human-connectome-project-openaccess/HCP1200/',
'status': 'error',
'type': 'file'}])
[Level 5] Exiting
Please check the boto versions on the HPC and locally; datalad wtf -S dependencies can tell you those.
Is the HPC in about the same geographic location? I wonder if somehow crossing some boundaries/zones would matter...
I'm in the same city as the HPC. datalad wtf -S dependencies returns boto version 2.49.0 for both.
local:
- annexremote: 1.2.1
- boto: 2.49.0
- cmd:7z: 16.02
- cmd:annex: 8.20211231
- cmd:bundled-git: UNKNOWN
- cmd:git: 2.41.0
- cmd:ssh: 9.0p1
- cmd:system-git: 2.41.0
- cmd:system-ssh: 9.0p1
- humanize: 4.8.0
- iso8601: 2.0.0
- keyring: 24.2.0
- keyrings.alt: UNKNOWN
- msgpack: 1.0.5
- platformdirs: 3.10.0
- requests: 2.31.0
HPC:
- annexremote: 1.6.0
- boto: 2.49.0
- cmd:7z: 16.02
- cmd:annex: 10.20230408-g5b1e8ba77
- cmd:bundled-git: 2.40.1
- cmd:git: 2.40.1
- cmd:ssh: 8.4p1
- cmd:system-git: 2.41.0
- cmd:system-ssh: 8.4p1
- humanize: 4.8.0
- iso8601: 2.0.0
- keyring: 24.2.0
- keyrings.alt: UNKNOWN
- msgpack: 1.0.5
- platformdirs: 3.10.0
- requests: 2.31.0
I tried on another HPC I have access to (also geographically near to me) and had no problems downloading.
SOLVED IT.
There were old credentials hanging around in ~/.local/share/python_keyring/keyring_pass.cfg. I deleted them, and the next time I ran datalad download-url I was prompted for new credentials, after which everything ran flawlessly.
Thanks for all of your help!
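For anyone hitting the same thing: that file is the plaintext store used by the keyrings.alt backend, and it is plain INI, so you can inspect it before deleting anything. The section and option names below are fabricated to show the idea, not the exact encoding keyring uses on disk.

```python
import configparser

# fabricated stand-in for ~/.local/share/python_keyring/keyring_pass.cfg
stale = """\
[datalad-hcp-s3]
key_id = OLDKEYID
secret_id = OLDSECRET
"""

cfg = configparser.ConfigParser()
cfg.read_string(stale)
print(cfg.sections())                 # shows which credentials are cached
cfg.remove_section("datalad-hcp-s3")  # drop the stale entry
print(cfg.sections())                 # prints []
```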
Hi! I've successfully cloned the dataset following the instructions in the readme, but when I go to "get" a participant's data, I get an error that the s3 key was refused.
If I try to just download it using the aws cli with the s3 link from that output, I also get an error.
But if I trim off the trailing alphanumeric junk (the versionId), then I AM able to download it. So it doesn't seem to be an issue with my s3 keys being set incorrectly.
My version of datalad is 0.19.2.
I was also able to replicate this in the full HCP datalad dataset, so it's not something that's specific to this dataset.
Edit: output of datalad wtf:
```
[WARNING] Could not determine filesystem types due to KeyError(None)
# WTF
## configuration
```