prioux closed this issue 3 years ago
@emmetaobrien @prioux I just removed the old URLs from the dataset using a script I created. The old web URLs should be removed now.
(datalad_python3) BigBrain $ git annex whereis 3D_Volumes/Histological_Space/mnc/full16_1000um_optbal.mnc
whereis 3D_Volumes/Histological_Space/mnc/full16_1000um_optbal.mnc (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: ftp://bigbrain.loris.ca/BigBrainRelease.2015/3D_Volumes/Histological_Space/mnc/full16_1000um_optbal.mnc
ok
I also marked Emmet's remotes as dead using:
git annex dead 9a137376-4816-4301-afc5-63f5d0ecd36f
git annex dead df4f69b6-5f50-4558-a466-4c0b8419de52
So in theory, the only URL you should see now is the correct web URL pointing to the FTP site.
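For anyone repeating this cleanup on another dataset, here is a rough Python sketch of the idea; it is not the script I actually ran. It assumes the correct web URL for a file is always the FTP base plus the file's path in the repository (true for the file shown above, not checked for every file), and the names BASE_URL, expected_url, stale_urls and remove_stale_urls are made up for illustration. It reads `git annex whereis --json` and calls `git annex rmurl` for every other URL.

```python
import json
import subprocess

# Assumption: every file's correct URL is this base plus its repo-relative path.
BASE_URL = "ftp://bigbrain.loris.ca/BigBrainRelease.2015/"


def expected_url(path, base_url=BASE_URL):
    """Build the URL we expect for a file (assumes no URL quoting is needed)."""
    return base_url + path


def stale_urls(record, base_url=BASE_URL):
    """Return (file, url) pairs from one `git annex whereis --json` record
    whose URL differs from the expected one."""
    good = expected_url(record["file"], base_url)
    pairs = []
    for remote in record.get("whereis", []):
        for url in remote.get("urls", []):
            if url != good:
                pairs.append((record["file"], url))
    return pairs


def remove_stale_urls(repo_dir, dry_run=True):
    """Walk all annexed files and drop (or just print) the unexpected URLs."""
    # No check=True here: whereis may exit non-zero if some file has no copies.
    result = subprocess.run(
        ["git", "annex", "whereis", "--json"],
        cwd=repo_dir, capture_output=True, text=True,
    )
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        for path, url in stale_urls(json.loads(line)):
            cmd = ["git", "annex", "rmurl", path, url]
            if dry_run:
                print(" ".join(cmd))  # show what would be removed
            else:
                subprocess.run(cmd, cwd=repo_dir, check=True)
```

Running remove_stale_urls(".") with the default dry_run=True only prints the rmurl commands, so the list can be reviewed before anything is changed.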
@emmetaobrien I suspect that for many, if not all, of our CONP datasets, our own remotes show up in git annex whereis
...
I will create a separate ticket for clean up of the datasets and we can split them amongst us.
https://github.com/CONP-PCNO/conp-dataset/pull/524 removes unnecessary local setups for Brainspan, celltypes, the 3 Khanlab datasets and the 3 refseq datasets.
mcgill-emc-rna-seq-experiment still needs updating, but that is in @zxenia's repository, to which I do not have access. Also, visual-working-memory has a non-standard set of locations:
(base) eaobrien@datalad-dev:/data/temp-datasets/emmetaobrien/conp-dataset/projects/visual-working-memory$ git annex whereis sub-01/anat/sub-01_T1w.nii.gz
whereis sub-01/anat/sub-01_T1w.nii.gz (3 copies)
70b2cac8-9c0e-4a11-91d7-6a42162b00cd -- root@af43181574ca:/datalad/ds001634
b152df9c-eb7e-4b80-9311-b021f018fa8a -- [s3-PUBLIC]
bead5592-156d-4587-a634-69e6a75986e0 -- s3-PRIVATE
s3-PUBLIC: http://openneuro.org.s3.amazonaws.com/ds001634/sub-01/anat/sub-01_T1w.nii.gz?versionId=3tBK9WlrojB9h_6CMILUvp3BX7Sa_aSr
ok
This has been fixed so closing the issue.
The datalad registration of the files in the BigBrain dataset contains erroneous remotes and paths. This causes datalad to attempt multiple connections that fail before it can finally get the file contents.
This can be seen by using the 'git annex whereis' command on some files. E.g.:
This shows there are two source entries referring to Emmet's own local setups (which are irrelevant to the outside world and should not be published), but also that two paths for the content on the web remote are specified: first with the old WRONG path and then with the new path under mnc/.
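To spot such local setups across many files, a small Python sketch like the one below could scan the plain 'git annex whereis' output. The rule it uses (a description that looks like user@host:/path is a local clone) is only a heuristic I am assuming here, and the function name local_remotes is hypothetical; special remotes such as web or S3 are left alone.

```python
import re

# One remote line of whereis output: "<uuid> -- <description>"
REMOTE_LINE = re.compile(
    r"^\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) -- (.*)$"
)
# Heuristic (assumption): local clones are described as user@host:/path
LOCAL_CLONE = re.compile(r"^[^@\s\[]+@[^:\s]+:/")


def local_remotes(whereis_text):
    """Return (uuid, description) for remotes that look like local clones."""
    found = []
    for line in whereis_text.splitlines():
        match = REMOTE_LINE.match(line)
        if match and LOCAL_CLONE.match(match.group(2)):
            found.append((match.group(1), match.group(2)))
    return found


sample = """whereis sub-01/anat/sub-01_T1w.nii.gz (3 copies)
  70b2cac8-9c0e-4a11-91d7-6a42162b00cd -- root@af43181574ca:/datalad/ds001634
  b152df9c-eb7e-4b80-9311-b021f018fa8a -- [s3-PUBLIC]
  bead5592-156d-4587-a634-69e6a75986e0 -- s3-PRIVATE
ok"""

# Each reported UUID is a candidate for `git annex dead <uuid>`.
for uuid, description in local_remotes(sample):
    print(uuid, description)
```

The reported UUIDs still need a manual check before being marked dead, since a legitimate remote could also have a user@host style description.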