datalad-datasets / human-connectome-project-openaccess

WU-Minn HCP1200 Data: 3T/7T MR scans from young healthy adults twins and non-twin siblings (ages 22-35) [T1w, T2w, resting-state and task fMRI, high angular resolution dMRI]
https://db.humanconnectome.org/data/projects/HCP_1200

Potentially unavailable files after November 2021 update #32

Open: adswa opened this issue 2 years ago

adswa commented 2 years ago

I have been updating the subsampled datasets that derive from this large dataset and can also be found under this organization. This brought to light a number of files in the dataset that can't be retrieved. The number is small compared to the total number of files, but worth investigating. We should verify that these files were indeed removed from the bucket and, if so, remove them from the datasets too; if they are actually available, we should figure out what went wrong when their URLs were added.

There is a single such file in https://github.com/datalad-datasets/hcp_smoothedmyelin/pull/2 (a newly added file):

549757/MNINonLinear/fsaverage_LR32k/549757.R.SmoothedMyelinMap.32k_fs_LR.func.gii

In https://github.com/datalad-datasets/hcp-functional-connectivity there seem to be some systematic failures:

118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_AbsoluteRMS.txt
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_AbsoluteRMS_mean.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_Regressors.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_Regressors_dt.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_RelativeRMS.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_RelativeRMS_mean.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/rfMRI_REST1_7T_PA_Atlas_stats.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/rfMRI_REST1_7T_PA_CSF.txt 
118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/rfMRI_REST1_7T_PA_WM.txt
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_AbsoluteRMS.txt 
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_AbsoluteRMS_mean.txt
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_Regressors.txt
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_Regressors_dt.txt 
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_RelativeRMS.txt
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_RelativeRMS_mean.txt
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/rfMRI_REST2_7T_AP_Atlas_stats.txt 
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/rfMRI_REST2_7T_AP_CSF.txt
118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/rfMRI_REST2_7T_AP_WM.txt
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/Movement_AbsoluteRMS.txt
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/Movement_AbsoluteRMS_mean.txt 
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/Movement_Regressors.txt
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/Movement_Regressors_dt.txt
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/Movement_RelativeRMS.txt 
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/Movement_RelativeRMS_mean.txt
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/rfMRI_REST3_7T_PA_Atlas_stats.txt 
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/rfMRI_REST3_7T_PA_CSF.txt
118225/MNINonLinear/Results/rfMRI_REST3_7T_PA/rfMRI_REST3_7T_PA_WM.txt 
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/Movement_AbsoluteRMS.txt 
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/Movement_AbsoluteRMS_mean.txt 
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/Movement_Regressors.txt
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/Movement_Regressors_dt.txt
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/Movement_RelativeRMS.txt 
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/Movement_RelativeRMS_mean.txt 
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/rfMRI_REST4_7T_AP_Atlas_stats.txt 
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/rfMRI_REST4_7T_AP_CSF.txt
118225/MNINonLinear/Results/rfMRI_REST4_7T_AP/rfMRI_REST4_7T_AP_WM.txt 
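The list has a regular structure: each of the four 7T resting-state runs of subject 118225 is missing the same nine text files (six Movement_* files plus the run's Atlas_stats, CSF and WM files). A small Python sketch like the one below can make that kind of pattern visible by grouping unavailable paths by subject and run and replacing the run name with a placeholder. The function name group_by_run and the assumed path layout <subject>/.../<run>/<file> are illustrative only, not part of any DataLad tooling.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_run(paths):
    """Group paths by (subject, run directory).

    Assumes the layout <subject>/.../<run>/<file>. The run name inside each
    file name is replaced by "{run}" so that different runs can be compared.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path.strip()).parts
        subject, run, name = parts[0], parts[-2], parts[-1]
        groups[(subject, run)].append(name.replace(run, "{run}"))
    return {key: sorted(names) for key, names in groups.items()}


# A few entries from the list above (two runs, three files each).
unavailable = [
    "118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/Movement_Regressors.txt",
    "118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/rfMRI_REST1_7T_PA_CSF.txt",
    "118225/MNINonLinear/Results/rfMRI_REST1_7T_PA/rfMRI_REST1_7T_PA_WM.txt",
    "118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/Movement_Regressors.txt",
    "118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/rfMRI_REST2_7T_AP_CSF.txt",
    "118225/MNINonLinear/Results/rfMRI_REST2_7T_AP/rfMRI_REST2_7T_AP_WM.txt",
]

groups = group_by_run(unavailable)
# A single distinct tuple of file names means every run fails the same way.
distinct_patterns = {tuple(names) for names in groups.values()}
print(len(distinct_patterns))  # 1
```

If every (subject, run) group maps to the same tuple of names, the failures look systematic rather than random, which is what the full list above suggests.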

I see this pattern of files without a known copy for a few, but not all, subjects in the dataset. Two example subjects with the failure are 118225 and 782561. The subjects where I don't see it do not seem to contain these files in the first place; one example is subject 987074. Does any of this ring a bell? Ping for awareness @mih @loj

loj commented 2 years ago

I feel like this might be a problem specifically with the newly added 7T data. I'm seeing the same thing for some other randomly selected subjects with 7T data (100610, 148133):

<sub-id>/MNINonLinear/Results/tfMRI_MOVIE1_7T_AP/*.txt
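The <sub-id> placeholder in that pattern stands for a subject ID. As a hedged sketch, the pattern can be expanded per subject and matched with Python's fnmatch module. The helper name matches_any_subject and the example file paths are hypothetical. Note that fnmatch's "*" also matches "/", so this is looser than a shell glob.

```python
from fnmatch import fnmatchcase

# Pattern as written in the comment above; "<sub-id>" is a placeholder.
PATTERN = "<sub-id>/MNINonLinear/Results/tfMRI_MOVIE1_7T_AP/*.txt"


def matches_any_subject(path, subject_ids, pattern=PATTERN):
    """Return True if `path` matches the pattern for at least one subject ID.

    fnmatchcase's "*" also matches "/", so nested .txt files would match too.
    """
    return any(
        fnmatchcase(path, pattern.replace("<sub-id>", sub)) for sub in subject_ids
    )


subjects = ["100610", "148133"]
# Hypothetical example paths for illustration.
print(matches_any_subject(
    "100610/MNINonLinear/Results/tfMRI_MOVIE1_7T_AP/Movement_Regressors.txt", subjects
))  # True
print(matches_any_subject(
    "118225/MNINonLinear/Results/tfMRI_MOVIE1_7T_AP/Movement_Regressors.txt", subjects
))  # False
```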

adswa commented 2 years ago

This is unrelated to the 7T data, but we just found 7 subjects with a similar error pattern:

{"url":"HCP_1200/212419/MNINonLinear/Results/tfMRI_RELATIONAL_LR/tfMRI_RELATIONAL_LR.nii.gz","size":194996822,"lastmodified":"2018-08-21T01:13:13.000Z","target":"MNINonLinear//Results/tfMRI_RELATIONAL_LR/tfMRI_RELATIONAL_LR.nii.gz"}
{"url":"HCP_1200/401422/MNINonLinear/Results/tfMRI_SOCIAL_RL/tfMRI_SOCIAL_RL_Atlas.dtseries.nii","size":100679392,"lastmodified":"2018-08-22T09:35:39.000Z","target":"MNINonLinear//Results/tfMRI_SOCIAL_RL/tfMRI_SOCIAL_RL_Atlas.dtseries.nii"}
{"url":"HCP_1200/638049/MNINonLinear/Results/rfMRI_REST1_RL/rfMRI_REST1_RL.R.native.func.gii","size":731096961,"lastmodified":"2018-08-23T13:44:52.000Z","target":"MNINonLinear//Results/rfMRI_REST1_RL/rfMRI_REST1_RL.R.native.func.gii"}
{"url":"HCP_1200/884064/unprocessed/3T/tfMRI_MOTOR_RL/884064_3T_tfMRI_MOTOR_RL.nii.gz","size":298915217,"lastmodified":"2018-08-25T01:07:22.000Z","target":"unprocessed//3T/tfMRI_MOTOR_RL/884064_3T_tfMRI_MOTOR_RL.nii.gz"}
{"url":"HCP_1200/886674/MNINonLinear/Results/tfMRI_LANGUAGE_LR/tfMRI_LANGUAGE_LR.R.native.func.gii","size":226305504,"lastmodified":"2018-08-25T01:24:59.000Z","target":"MNINonLinear//Results/tfMRI_LANGUAGE_LR/tfMRI_LANGUAGE_LR.R.native.func.gii"}
{"url":"HCP_1200/894067/MNINonLinear/Results/tfMRI_EMOTION_RL/tfMRI_EMOTION_RL.R.native.func.gii","size":121684503,"lastmodified":"2018-08-25T02:29:23.000Z","target":"MNINonLinear//Results/tfMRI_EMOTION_RL/tfMRI_EMOTION_RL.R.native.func.gii"}
{"url":"HCP_1200/901139/MNINonLinear/Results/rfMRI_REST2_RL/rfMRI_REST2_RL.L.native.func.gii","size":735529336,"lastmodified":"2018-08-25T04:13:08.000Z","target":"MNINonLinear//Results/rfMRI_REST2_RL/rfMRI_REST2_RL.L.native.func.gii"}
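Each line above appears to be one of the JSON records fed to addurls, with url, size, lastmodified and target fields. As an illustrative sketch (the helper name summarize is hypothetical), the records can be parsed to list the affected subjects and the data volume involved. PurePosixPath also collapses the doubled slash in the target values, so they compare cleanly against ordinary paths; whether that slash played any role is not stated in this thread.

```python
import json
from pathlib import PurePosixPath

# Two of the records above, one JSON object per line.
RECORDS = """\
{"url":"HCP_1200/212419/MNINonLinear/Results/tfMRI_RELATIONAL_LR/tfMRI_RELATIONAL_LR.nii.gz","size":194996822,"lastmodified":"2018-08-21T01:13:13.000Z","target":"MNINonLinear//Results/tfMRI_RELATIONAL_LR/tfMRI_RELATIONAL_LR.nii.gz"}
{"url":"HCP_1200/401422/MNINonLinear/Results/tfMRI_SOCIAL_RL/tfMRI_SOCIAL_RL_Atlas.dtseries.nii","size":100679392,"lastmodified":"2018-08-22T09:35:39.000Z","target":"MNINonLinear//Results/tfMRI_SOCIAL_RL/tfMRI_SOCIAL_RL_Atlas.dtseries.nii"}
"""


def summarize(jsonl_text):
    """Return (subject, normalized target, size in bytes) for each record."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # url has the form HCP_1200/<subject>/..., so the subject is part 1
        subject = PurePosixPath(record["url"]).parts[1]
        # PurePosixPath collapses the doubled slash in "MNINonLinear//Results"
        target = str(PurePosixPath(record["target"]))
        rows.append((subject, target, record["size"]))
    return rows


rows = summarize(RECORDS)
for subject, target, size in rows:
    print(subject, target, size)
print("total bytes:", sum(size for _, _, size in rows))  # total bytes: 295676214
```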

The problem was a silent, undetected failure of addurls (more of the story is in #33).
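Because addurls failed without reporting an error, gaps like these only show up through a separate check afterwards. A minimal sketch of such a check is a set difference between the expected targets and the paths known to be retrievable. It assumes you already have that set of confirmed paths; how to obtain it (for example from git-annex) is left open here. The function name missing_targets and the second sample record are hypothetical.

```python
from pathlib import PurePosixPath


def missing_targets(records, available_paths):
    """Return the records whose target has no confirmed copy.

    `available_paths` is a hypothetical input: paths that a separate check
    (not shown here) confirmed as retrievable. Both sides are normalized with
    PurePosixPath so that "a//b" and "a/b" compare equal.
    """
    available = {str(PurePosixPath(p)) for p in available_paths}
    return [r for r in records if str(PurePosixPath(r["target"])) not in available]


# Example data shaped like the addurls records in this thread; the second
# record is made up for illustration.
records = [
    {"url": "HCP_1200/212419/MNINonLinear/Results/tfMRI_RELATIONAL_LR/tfMRI_RELATIONAL_LR.nii.gz",
     "target": "MNINonLinear//Results/tfMRI_RELATIONAL_LR/tfMRI_RELATIONAL_LR.nii.gz"},
    {"url": "HCP_1200/212419/MNINonLinear/Results/tfMRI_RELATIONAL_RL/tfMRI_RELATIONAL_RL.nii.gz",
     "target": "MNINonLinear//Results/tfMRI_RELATIONAL_RL/tfMRI_RELATIONAL_RL.nii.gz"},
]
confirmed = {"MNINonLinear/Results/tfMRI_RELATIONAL_RL/tfMRI_RELATIONAL_RL.nii.gz"}

for record in missing_targets(records, confirmed):
    print("no known copy:", record["url"])
```

Running a check like this right after an addurls call would surface missing entries immediately instead of when users later try to retrieve them.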

mattcieslak commented 2 months ago

Sorry to reply to an old issue: do you know whether all the data in these subdatasets have available copies somewhere?

mih commented 2 months ago

Yes. For example at https://hub.datalad.org/hcp-openaccess

mattcieslak commented 2 months ago

I meant the data within these datasets. I'm seeing a lot of

Results/tfMRI_RETCW_7T_PA/Movement_AbsoluteRMS.txt (file) [not available; (Note that these git remotes have annex-ignore set: origin)]

messages when trying to datalad get some of these files.
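To collect the affected paths from a long datalad get run, the reported lines can be filtered. The sketch below assumes every such line follows the format of the single example quoted above, <path> (file) [not available; ...]. Other DataLad versions or message types may be worded differently, and the helper name unavailable_paths is hypothetical.

```python
import re

# Assumed format, taken from the single message quoted above:
# <path> (file) [not available; ...]
NOT_AVAILABLE = re.compile(r"^(?P<path>.+?) \(file\) \[not available")


def unavailable_paths(output):
    """Extract the paths reported as not available from datalad get output."""
    paths = []
    for line in output.splitlines():
        match = NOT_AVAILABLE.match(line.strip())
        if match:
            paths.append(match["path"])
    return paths


log = (
    "Results/tfMRI_RETCW_7T_PA/Movement_AbsoluteRMS.txt (file) [not available; "
    "(Note that these git remotes have annex-ignore set: origin)]\n"
)
print(unavailable_paths(log))  # ['Results/tfMRI_RETCW_7T_PA/Movement_AbsoluteRMS.txt']
```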