datalad / datalad-ukbiobank

Resources for working with UKBiobank as a DataLad dataset
MIT License

AnnexBatchCommandError: 'addurl' [Error, annex reported failure for addurl #75

Closed m-petersen closed 2 years ago

m-petersen commented 3 years ago

Hi,

When I try to download and bidsify a subsample of UKB subjects with datalad ukb on the head node of our HPC cluster (which has an internet connection), an error occurs. Interestingly, executing the same set of commands locally works flawlessly. I also ran into a similar error when trying to set up datalad-hirni (https://github.com/psychoinformatics-de/datalad-hirni/issues/201), so maybe something is wrong with my environment.

The error:

[INFO   ] Initiating special remote datalad-archives 
AnnexBatchCommandError: 'addurl' [Error, annex reported failure for addurl (url='dl+archive:MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip#path=fMRI/unusable/rfMRI_SBREF.nii.gz&size=801230'): {'command': 'addurl', 'note': 'from datalad-archives\nto 20227_2_0/fMRI/unusable/rfMRI_SBREF.nii.gz', 'success': False, 'input': ['dl+archive:MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip#path=fMRI/unusable/rfMRI_SBREF.nii.gz&size=801230 20227_2_0/fMRI/unusable/rfMRI_SBREF.nii.gz'], 'error-messages': ["  Failed to fetch any archive containing URL-s801230--dl,43archive:MD5E-s348562673--4-d47d48693f84afad33301a3ae2467f14. Tried: ['MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip', 'MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip', 'MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip'] [archives.py:_transfer:407]"], 'file': '20227_2_0/fMRI/unusable/rfMRI_SBREF.nii.gz'}]
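
For readers trying to decode the error above: the failing URL appears to pack three pieces of information, the git-annex key of the archive, the member path inside it, and the member size. The following is a minimal Python sketch that splits such a URL, assuming it always has the shape seen in this report (`dl+archive:<key>#path=<member>&size=<bytes>`). The function names are hypothetical and not part of DataLad. Note that the size encoded in the key (348562673) equals the "348562673 bytes written" reported by ukbfetch for 5088058_20227_2_0.zip, which suggests the archive being looked up is the zip that had just been downloaded.

```python
import re
from urllib.parse import parse_qs

# Shape of the key seen in the error: BACKEND-s<size>--<name>
KEY_RE = re.compile(r"^(?P<backend>[A-Z0-9_]+)-s(?P<size>\d+)--(?P<name>.+)$")


def parse_dl_archive_url(url):
    """Split a dl+archive: URL into archive key, member path and member size.

    Hypothetical helper for reading error messages; only handles the URL
    shape shown in this issue.
    """
    prefix = "dl+archive:"
    if not url.startswith(prefix):
        raise ValueError(f"not a dl+archive URL: {url!r}")
    key, _, fragment = url[len(prefix):].partition("#")
    props = parse_qs(fragment)
    return {
        "archive_key": key,
        "path": props["path"][0] if "path" in props else None,
        "size": int(props["size"][0]) if "size" in props else None,
    }


def parse_annex_key(key):
    """Extract backend, recorded size and name from a key like MD5E-s123--abc.zip."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return {
        "backend": match.group("backend"),
        "size": int(match.group("size")),
        "name": match.group("name"),
    }


if __name__ == "__main__":
    url = (
        "dl+archive:MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip"
        "#path=fMRI/unusable/rfMRI_SBREF.nii.gz&size=801230"
    )
    info = parse_dl_archive_url(url)
    print(info)
    print(parse_annex_key(info["archive_key"]))
```

Comparing the key size against the size of the downloaded zip on disk is one quick way to check whether the archive content is what the special remote expects.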

The whole output:

+ '[' -d 'sub-5088058/ses*' ']'
+ datalad create sub-5088058
[INFO   ] Creating a new annex repo at /work/fatx405/projects/BIDS_UKB/sub-5088058 
[INFO   ] scanning for unlocked files (this may take some time) 
create(ok): /work/fatx405/projects/BIDS_UKB/sub-5088058 (dataset)
+ pushd sub-5088058
/work/fatx405/projects/BIDS_UKB/sub-5088058 /work/fatx405/projects/BIDS_UKB /work/fatx405/projects/BIDS_UKB
+ datalad ukb-init --bids 5088058 20227_2_0 20227_3_0 20250_2_0 20250_3_0 20252_2_0 20252_3_0 20253_2_0 20253_3_0
ukb_init(ok): . (dataset)                          
+ datalad ukb-update --keyfile /work/fatx405/projects/BIDS_UKB/k71359r46151.key --merge --drop extracted
[INFO   ] == Command start (output follows) ===== 

ukbfetch on unx - ver Jan 30 2019 15:39:51 - using Glibc2.17(stable)
Run start : 2021-08-11T20:43:52 
Verbose mode activated
Registering repository "biota.ndph.ox.ac.uk"
Registering repository "chest.ndph.ox.ac.uk"
UsrNm: fatx405
AppID: 71359
Loaded 8 lines from ".ukbbatch"
Request(1) for EncID:5088058, Field:20227, Instance:2, Array:0
Contacting "chest.ndph.ox.ac.uk"
348672958 bytes fetched
Download has been logged against IP address 134.100.32.114
Unpacking 348672346 -> 348562673 ... done 348562673 bytes
Opening output file "ukb1_1628707432_1227.tmp_bulk"...
348562673 bytes written
Renaming tmp file "ukb1_1628707432_1227.tmp_bulk" to output file "5088058_20227_2_0.zip"...
Opening output listfile ".git/tmp/ukb.lis"
Created 5088058_20227_2_0.zip
Request(2) for EncID:5088058, Field:20227, Instance:3, Array:0
Contacting "chest.ndph.ox.ac.uk"
323 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20227/Instance=3/Array=0

Contacting "biota.ndph.ox.ac.uk"
343 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20227/Instance=3/Array=0

Download failure
Request(3) for EncID:5088058, Field:20250, Instance:2, Array:0
Contacting "chest.ndph.ox.ac.uk"
323 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20250/Instance=2/Array=0

Contacting "biota.ndph.ox.ac.uk"
343 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20250/Instance=2/Array=0

Download failure
Request(4) for EncID:5088058, Field:20250, Instance:3, Array:0
Contacting "biota.ndph.ox.ac.uk"
343 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20250/Instance=3/Array=0

Contacting "chest.ndph.ox.ac.uk"
323 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20250/Instance=3/Array=0

Download failure
Request(5) for EncID:5088058, Field:20252, Instance:2, Array:0
Contacting "biota.ndph.ox.ac.uk"
50668109 bytes fetched
Download has been logged against IP address 134.100.32.114
Unpacking 50667481 -> 50659551 ... done 50659551 bytes
Opening output file "ukb5_1628707485_1227.tmp_bulk"...
50659551 bytes written
Renaming tmp file "ukb5_1628707485_1227.tmp_bulk" to output file "5088058_20252_2_0.zip"...
Created 5088058_20252_2_0.zip
Request(6) for EncID:5088058, Field:20252, Instance:3, Array:0
Contacting "biota.ndph.ox.ac.uk"
343 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20252/Instance=3/Array=0

Contacting "chest.ndph.ox.ac.uk"
323 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20252/Instance=3/Array=0

Download failure
Request(7) for EncID:5088058, Field:20253, Instance:2, Array:0
Contacting "biota.ndph.ox.ac.uk"
34576101 bytes fetched
Download has been logged against IP address 134.100.32.114
Unpacking 34575473 -> 34564840 ... done 34564840 bytes
Opening output file "ukb7_1628707503_1227.tmp_bulk"...
34564840 bytes written
Renaming tmp file "ukb7_1628707503_1227.tmp_bulk" to output file "5088058_20253_2_0.zip"...
Created 5088058_20253_2_0.zip
Request(8) for EncID:5088058, Field:20253, Instance:3, Array:0
Contacting "biota.ndph.ox.ac.uk"
343 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20253/Instance=3/Array=0

Contacting "chest.ndph.ox.ac.uk"
323 bytes fetched
Download has been logged against IP address 134.100.32.114
Error: Bulk data not present for  Encoded_id=5088058 Field=20253/Instance=3/Array=0

Download failure
Fetched 3/8 datafiles
Run end : 2021-08-11T20:45:26
[INFO   ] == Command exit (modification check follows) ===== 
[INFO   ] Adding content of the archive MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip into annex AnnexRepo(/work/fatx405/projects/BIDS_UKB/sub-5088058) 
[INFO   ] Initiating special remote datalad-archives 
AnnexBatchCommandError: 'addurl' [Error, annex reported failure for addurl (url='dl+archive:MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip#path=fMRI/unusable/rfMRI_SBREF.nii.gz&size=801230'): {'command': 'addurl', 'note': 'from datalad-archives\nto 20227_2_0/fMRI/unusable/rfMRI_SBREF.nii.gz', 'success': False, 'input': ['dl+archive:MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip#path=fMRI/unusable/rfMRI_SBREF.nii.gz&size=801230 20227_2_0/fMRI/unusable/rfMRI_SBREF.nii.gz'], 'error-messages': ["  Failed to fetch any archive containing URL-s801230--dl,43archive:MD5E-s348562673--4-d47d48693f84afad33301a3ae2467f14. Tried: ['MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip', 'MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip', 'MD5E-s348562673--4e8652e17e5570f4dc4da0722e0bd53e.zip'] [archives.py:_transfer:407]"], 'file': '20227_2_0/fMRI/unusable/rfMRI_SBREF.nii.gz'}]
+ popd
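
The ukbfetch part of the output above is long, and most of the "Error" lines in it only say that no bulk data exists for a requested record (5 of the 8 requests), which is separate from the addurl failure at the end. The following is a small Python sketch that condenses such a log into one status per request. It assumes the line formats shown in this report ("Request(N) for EncID:...", "Created ...", "Download failure"); the function name and the sample excerpt are illustrative only.

```python
import re

REQUEST_RE = re.compile(
    r"Request\((\d+)\) for EncID:(\d+), Field:(\d+), Instance:(\d+), Array:(\d+)"
)

# Shortened excerpt of the log above, used only to demonstrate the parser.
SAMPLE_LOG = """\
Request(1) for EncID:5088058, Field:20227, Instance:2, Array:0
Contacting "chest.ndph.ox.ac.uk"
348672958 bytes fetched
Created 5088058_20227_2_0.zip
Request(2) for EncID:5088058, Field:20227, Instance:3, Array:0
Contacting "chest.ndph.ox.ac.uk"
Error: Bulk data not present for  Encoded_id=5088058 Field=20227/Instance=3/Array=0
Download failure
"""


def summarize_ukbfetch_log(text):
    """Return one dict per request with its record ID and final status."""
    results = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = REQUEST_RE.match(line)
        if match:
            # A new request starts; its status is unknown until we see the outcome.
            current = {
                "request": int(match.group(1)),
                "record": "_".join(match.group(3, 4, 5)),
                "status": "unknown",
            }
            results.append(current)
        elif current is not None:
            if line.startswith("Created "):
                current["status"] = "fetched"
            elif line == "Download failure":
                current["status"] = "failed"
    return results


if __name__ == "__main__":
    for entry in summarize_ukbfetch_log(SAMPLE_LOG):
        print(entry["request"], entry["record"], entry["status"])
```

Run against the full log, this kind of summary makes it easier to see that the three records that were fetched (20227_2_0, 20252_2_0, 20253_2_0) are the ones that matter for the subsequent archive extraction.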

The script I am executing:

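#!/bin/bash
# Download and bidsify UKB bulk data, one DataLad dataset per subject.

#source activate datalad
ROOT_DIR=$(realpath .)
KEY="$ROOT_DIR/key"
# prepend the project directory to PATH
export PATH="$ROOT_DIR/:$PATH"

pushd "$ROOT_DIR" || exit 1
for sub in $(cat ukb_matched_subjects.txt); do
    # skip subjects that already have at least one session directory
    # (a bare [ -d sub-X/ses* ] breaks when the glob matches several directories)
    ls -d "sub-${sub}"/ses*/ > /dev/null 2>&1 && continue
    datalad create "sub-${sub}"
    # do not run the ukb commands in the wrong directory if pushd fails
    pushd "sub-${sub}" || continue
    datalad ukb-init --bids "$sub" 20227_2_0 20227_3_0 20250_2_0 20250_3_0 20252_2_0 20252_3_0 20253_2_0 20253_3_0
    datalad ukb-update --keyfile "$KEY" --merge --drop extracted
    popd
done
popd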
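
When debugging a loop like the one above, it can help to know exactly which command failed for which subject instead of letting the loop carry on silently. The following is a hedged Python sketch of the same three steps using `subprocess`. It only calls the CLI commands that appear in the script; the function name, the `run` parameter (there so the function can be exercised without DataLad installed) and the return convention are assumptions for illustration.

```python
import subprocess
from pathlib import Path

# Record IDs taken from the shell script above.
RECORDS = [
    "20227_2_0", "20227_3_0", "20250_2_0", "20250_3_0",
    "20252_2_0", "20252_3_0", "20253_2_0", "20253_3_0",
]


def process_subject(root, subject, keyfile, run=subprocess.run):
    """Run create, ukb-init and ukb-update for one subject.

    Returns None on success, or the command (as a list) that failed.
    """
    dataset = Path(root) / f"sub-{subject}"
    steps = [
        # (working directory, command)
        (Path(root), ["datalad", "create", dataset.name]),
        (dataset, ["datalad", "ukb-init", "--bids", subject, *RECORDS]),
        (dataset, ["datalad", "ukb-update", "--keyfile", str(keyfile),
                   "--merge", "--drop", "extracted"]),
    ]
    for cwd, cmd in steps:
        try:
            run(cmd, cwd=cwd, check=True)
        except subprocess.CalledProcessError:
            # Stop at the first failing step and report it to the caller.
            return cmd
    return None


if __name__ == "__main__":
    root = Path.cwd()
    subjects = (root / "ukb_matched_subjects.txt").read_text().split()
    for subject in subjects:
        failed = process_subject(root, subject, root / "key")
        if failed is not None:
            print(f"sub-{subject}: failed at {' '.join(failed[:2])}")
```

Collecting the failing step per subject makes it easier to tell whether every subject breaks at ukb-update, as in this report, or only some of them.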

DataLad wtf output:

datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials 
  - keyring: 
    - active_backends: 
      - PlaintextKeyring with no encyption v.1.0 at /home/fatx405/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/fatx405/.config/python_keyring/keyringrc.cfg
    - data_root: /home/fatx405/.local/share/python_keyring
## datalad 
  - full_version: 0.14.6
  - version: 0.14.6
## dependencies 
  - annexremote: 1.5.0
  - appdirs: 1.4.4
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 8.20201104-g13bab4f2c
  - cmd:bundled-git: 2.29.2
  - cmd:git: 2.29.2
  - cmd:system-git: 2.29.2
  - cmd:system-ssh: 7.4p1
  - humanize: 3.10.0
  - iso8601: 0.1.14
  - keyring: 23.0.1
  - keyrings.alt: 4.0.2
  - msgpack: 1.0.2
  - requests: 2.25.1
  - wrapt: 1.12.1
## environment 
  - LANG: en_US.UTF-8
  - PATH: /work/fatx405/miniconda3/envs/datalad/bin:/work/fatx405/miniconda3/condabin:/work/fatx405/miniconda3/bin:/sw/link/git/2.32.0/bin:/sw/env/system-gcc/singularity/3.5.2-overlayfix/bin:/sw/batch/slurm/19.05.6/bin:/sw/rrz/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
## extensions 
  - container: 
    - description: Containerized environments
    - entrypoints: 
      - datalad_container.containers_add.ContainersAdd: 
        - class: ContainersAdd
        - load_error: None
        - module: datalad_container.containers_add
        - names: 
          - containers-add
          - containers_add
      - datalad_container.containers_list.ContainersList: 
        - class: ContainersList
        - load_error: None
        - module: datalad_container.containers_list
        - names: 
          - containers-list
          - containers_list
      - datalad_container.containers_remove.ContainersRemove: 
        - class: ContainersRemove
        - load_error: None
        - module: datalad_container.containers_remove
        - names: 
          - containers-remove
          - containers_remove
      - datalad_container.containers_run.ContainersRun: 
        - class: ContainersRun
        - load_error: None
        - module: datalad_container.containers_run
        - names: 
          - containers-run
          - containers_run
    - load_error: None
    - module: datalad_container
    - version: 1.1.5
  - metalad: 
    - description: DataLad semantic metadata command suite
    - entrypoints: 
      - datalad_metalad.aggregate.Aggregate: 
        - class: Aggregate
        - load_error: None
        - module: datalad_metalad.aggregate
        - names: 
          - meta-aggregate
          - meta_aggregate
      - datalad_metalad.dump.Dump: 
        - class: Dump
        - load_error: None
        - module: datalad_metalad.dump
        - names: 
          - meta-dump
          - meta_dump
      - datalad_metalad.extract.Extract: 
        - class: Extract
        - load_error: None
        - module: datalad_metalad.extract
        - names: 
          - meta-extract
          - meta_extract
    - load_error: None
    - module: datalad_metalad
    - version: 0.2.1
  - neuroimaging: 
    - description: Neuroimaging tools
    - entrypoints: 
      - datalad_neuroimaging.bids2scidata.BIDS2Scidata: 
        - class: BIDS2Scidata
        - load_error: None
        - module: datalad_neuroimaging.bids2scidata
        - names: 
          - bids2scidata
    - load_error: None
    - module: datalad_neuroimaging
    - version: 0.3.1
  - ukbiobank: 
    - description: UKBiobank dataset support
    - entrypoints: 
      - datalad_ukbiobank.init.Init: 
        - class: Init
        - load_error: None
        - module: datalad_ukbiobank.init
        - names: 
          - ukb-init
          - ukb_init
      - datalad_ukbiobank.update.Update: 
        - class: Update
        - load_error: None
        - module: datalad_ukbiobank.update
        - names: 
          - ukb-update
          - ukb_update
    - load_error: None
    - module: datalad_ukbiobank
    - version: 0.3.3
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions: 
    - aws-0.22
    - bloomfilter-2.0.1.0
    - cryptonite-0.26
    - DAV-1.3.4
    - feed-1.3.0.1
    - ghc-8.8.4
    - http-client-0.6.4.1
    - persistent-sqlite-2.10.6.2
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.1.0
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - hook
    - external
  - supported repository versions: 
    - 8
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
  - version: 8.20201104-g13bab4f2c
## location 
  - path: /work/fatx405/projects/BIDS_UKB
  - type: directory
## metadata_extractors 
  - annex (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: None
    - module: datalad.metadata.extractors.annex
    - version: None
  - audio (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: No module named 'mutagen' [audio.py:<module>:17]
    - module: datalad.metadata.extractors.audio
  - bids (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.bids
    - version: None
  - datacite (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: None
    - module: datalad.metadata.extractors.datacite
    - version: None
  - datalad_core (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: None
    - module: datalad.metadata.extractors.datalad_core
    - version: None
  - datalad_rfc822 (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: None
    - module: datalad.metadata.extractors.datalad_rfc822
    - version: None
  - dicom (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.dicom
    - version: None
  - exif (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: No module named 'exifread' [exif.py:<module>:16]
    - module: datalad.metadata.extractors.exif
  - frictionless_datapackage (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: None
    - module: datalad.metadata.extractors.frictionless_datapackage
    - version: None
  - image (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: No module named 'PIL' [image.py:<module>:16]
    - module: datalad.metadata.extractors.image
  - metalad_annex (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.annex
    - version: None
  - metalad_core (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.core
    - version: None
  - metalad_custom (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.custom
    - version: None
  - metalad_runprov (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.runprov
    - version: None
  - nidm (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.nidm
    - version: None
  - nifti1 (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.nifti1
    - version: None
  - xmp (datalad 0.14.6): 
    - distribution: datalad 0.14.6
    - load_error: No module named 'libxmp' [xmp.py:<module>:20]
    - module: datalad.metadata.extractors.xmp
## metadata_indexers 
## python 
  - implementation: CPython
  - version: 3.8.1
## system 
  - distribution: centos/7/Core
  - encoding: 
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 287
  - name: Linux
  - release: 4.14.240-1.0.33.el7.rrz.x86_64
  - type: posix
  - version: #1 SMP Thu Jul 22 18:29:43 CEST 2021

As always, I am grateful for any input and happy to provide further details.

Cheers, Marvin

mih commented 2 years ago

Oh hey! I just saw that this post never got an answer. Sorry!

Is this still an issue?

m-petersen commented 2 years ago

Hi Michael,

No worries. No, it's not, as we aren't using DataLad anymore for the UKB download and bidsification.

Best, Marvin