datalad / datalad-ukbiobank

Resources for working with UKBiobank as a DataLad dataset
MIT License

[ERROR] dataset containing given paths is not underneath the reference dataset #64

Closed fredrmag closed 3 years ago

fredrmag commented 3 years ago

What is the problem?

I am not able to run ukb-update on a dataset that sits inside a directory tree containing a git repository (i.e. a parent directory of the dataset is a git repository). I get the following error:

[ERROR] dataset containing given paths is not underneath the reference dataset

However, if the dataset is not located under a directory that contains a git repository, it works.
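The triggering condition can be checked with a short script: does any directory above the dataset contain a git repository? This is a minimal sketch and not part of DataLad or datalad-ukbiobank. The function name `enclosing_git_repo` is hypothetical, and it assumes a repository is recognizable by a `.git` entry (directory or file) in one of the parent directories.

```python
import tempfile
from pathlib import Path
from typing import Optional


def enclosing_git_repo(ds_path) -> Optional[Path]:
    """Hypothetical helper: return the nearest ancestor of ds_path that
    contains a .git entry, or None if no parent directory is a repository.

    The dataset's own .git is skipped because iteration starts at .parents.
    """
    for parent in Path(ds_path).resolve().parents:
        # .git is a directory in a normal repo and a file in submodules/worktrees
        if (parent / ".git").exists():
            return parent
    return None


if __name__ == "__main__":
    # Recreate the layout from the report: test/.git and test/data/1005393/.git
    with tempfile.TemporaryDirectory() as tmp:
        top = Path(tmp, "test")
        ds = top / "data" / "1005393"
        (top / ".git").mkdir(parents=True)
        (ds / ".git").mkdir(parents=True)
        print(enclosing_git_repo(ds) == top.resolve())  # True
```

If this returns a path for your dataset, you are in the situation described in this issue.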

What steps will reproduce the problem?

```shell
# Create a test structure with git initialized at the top level
mkdir test && cd test && git init && mkdir data && cd data
# DataLad commands
datalad create 1005393 && cd 1005393
datalad ukb-init -f --bids 1005393 20252_2_0
datalad -c datalad.ukbiobank.keyfile=keyfile ukb-update
```

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?

Version: DataLad 0.14; OS: Ubuntu 20.04

Is there anything else that would be useful to know in this context?

No

Have you had any success using DataLad before? (to assess your expertise/prior luck. We would welcome your testimonial additions to https://github.com/datalad/datalad/wiki/Testimonials as well)

Yes. ukb-update works if the directories above do not have a git repo.

fredrmag commented 3 years ago

Related to: https://github.com/nipy/heudiconv/issues/456 and https://github.com/datalad/datalad/issues/3928

mih commented 3 years ago

Thx for the report. A traceback would be helpful; otherwise a replication attempt needs an environment that has the ukb tooling available. I checked without it, and your issue seems to happen later in the code of ukb-update.

fredrmag commented 3 years ago

The command datalad -c datalad.ukbiobank.keyfile=keyfile ukb-update gives the following output:

```
[ERROR] dataset containing given paths is not underneath the reference dataset Dataset(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459): [PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459')] [status(/tsd/p23/home/p23-fredrmag/no-backup/test-2)]
[INFO] == Command start (output follows) =====
sending incremental file list
1010459_20252_2_0.zip

         32.77K   0%    0.00kB/s    0:00:00
         52.10M 100%  144.78MB/s    0:00:00 (xfr#1, to-chk=0/1)

sent 52.12M bytes  received 35 bytes  34.74M bytes/sec
total size is 52.10M  speedup is 1.00
[INFO] == Command exit (modification check follows) =====
```

It manages to download the .zip file, but fails to convert it to BIDS.

As the above command does not give a lot of information, I also ran the command with the -l debug flag:

[DEBUG] Command line args 1st pass for DataLad 0.14.0. Parsed: Namespace(cfg_overrides=['datalad.ukbiobank.keyfile=/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key']) Unparsed: ['ukb-update'] 
[DEBUG] Discovering plugins 
[DEBUG] Loading entrypoints 
[DEBUG] Loading entrypoint ukbiobank from datalad.extensions for docs building 
[DEBUG] Loaded entrypoint ukbiobank 
[DEBUG] Building doc for <class 'datalad.interface.add_archive_content.AddArchiveContent'> 
[DEBUG] Building doc for <class 'datalad_ukbiobank.update.Update'> 
[DEBUG] Parsing known args among ['/usr/local/bin/datalad', '-l', 'debug', '-c', 'datalad.ukbiobank.keyfile=/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key', 'ukb-update'] 
[DEBUG] Async run ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin'] 
[DEBUG] Launching process ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin'] 
[DEBUG] Process 60511 started 
[DEBUG] Waiting for process 60511 to complete 
[DEBUG] Process 60511 exited with return code 0 
[DEBUG] Determined class of decorated function: <class 'datalad_ukbiobank.update.Update'> 
[DEBUG] Resolved dataset for update: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] Async run ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG] Process 60513 started 
[DEBUG] Waiting for process 60513 to complete 
[DEBUG] Process 60513 exited with return code 0 
[DEBUG] Async run ['git', 'status', '--porcelain', '--untracked-files=normal', '--ignore-submodules=none'] 
[DEBUG] Launching process ['git', 'status', '--porcelain', '--untracked-files=normal', '--ignore-submodules=none'] 
[DEBUG] Process 60515 started 
[DEBUG] Waiting for process 60515 to complete 
[DEBUG] Process 60515 exited with return code 0 
[DEBUG] Async run ['git', 'symbolic-ref', 'HEAD'] 
[DEBUG] Launching process ['git', 'symbolic-ref', 'HEAD'] 
[DEBUG] Process 60526 started 
[DEBUG] Waiting for process 60526 to complete 
[DEBUG] Process 60526 exited with return code 0 
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'incoming^{commit}'] 
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'incoming^{commit}'] 
[DEBUG] Process 60528 started 
[DEBUG] Waiting for process 60528 to complete 
[DEBUG] Process 60528 exited with return code 0 
[DEBUG] Async run ['git', 'checkout', 'incoming'] 
[DEBUG] Launching process ['git', 'checkout', 'incoming'] 
[DEBUG] Process 60530 started 
[DEBUG] Waiting for process 60530 to complete 
[DEBUG] Process 60530 exited with return code 0 
[Level 11] stderr| Already on 'incoming' 
[DEBUG] Discovering plugins 
[DEBUG] Building doc for <class 'datalad.core.local.status.Status'> 
[DEBUG] Building doc for <class 'datalad.core.local.save.Save'> 
[DEBUG] Building doc for <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG] Building doc for <class 'datalad.core.distributed.clone.Clone'> 
[DEBUG] Building doc for <class 'datalad.distribution.get.Get'> 
[DEBUG] Building doc for <class 'datalad.distribution.install.Install'> 
[DEBUG] Building doc for <class 'datalad.interface.annotate_paths.AnnotatePaths'> 
[DEBUG] Building doc for <class 'datalad.distribution.drop.Drop'> 
[DEBUG] Building doc for <class 'datalad.distribution.uninstall.Uninstall'> 
[DEBUG] Building doc for <class 'datalad.distribution.remove.Remove'> 
[DEBUG] Building doc for <class 'datalad.interface.unlock.Unlock'> 
[DEBUG] Building doc for <class 'datalad.core.local.run.Run'> 
[DEBUG] Found matching interface ('datalad.core.local.run', 'Run', 'run') for run 
[DEBUG] Determined class of decorated function: <class 'datalad.core.local.run.Run'> 
[DEBUG] Resolved dataset for tracking outcomes of a command: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] tracking command output underneath Dataset(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459) 
[DEBUG] chdir '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' -> '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459'  
[DEBUG] Determined class of decorated function: <class 'datalad.interface.unlock.Unlock'> 
[DEBUG] Resolved dataset for unlocking: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] Resolved dataset for path resolution: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] Determined class of decorated function: <class 'datalad.core.local.status.Status'> 
[DEBUG] Resolved dataset for status reporting: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] Resolved dataset for path resolution: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[ERROR] dataset containing given paths is not underneath the reference dataset Dataset(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459): [PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459')] [status(/tsd/p23/home/p23-fredrmag/no-backup/test-2)] 
[DEBUG] chdir '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' -> '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' (coming back) 
[INFO] == Command start (output follows) ===== 
[DEBUG] Async run ['/usr/bin/python3', '--version'] 
[DEBUG] Launching process ['/usr/bin/python3', '--version'] 
[DEBUG] Process 60542 started 
[DEBUG] Waiting for process 60542 to complete 
[DEBUG] Process 60542 exited with return code 0 
[DEBUG] Async run ukbfetch -v -a/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key -b.ukbbatch -o.git/tmp/ukb 
[DEBUG] Launching process ukbfetch -v -a/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key -b.ukbbatch -o.git/tmp/ukb 
[DEBUG] Process 60544 started 
[DEBUG] Waiting for process 60544 to complete 
sending incremental file list
1010459_20252_2_0.zip

         32.77K   0%    0.00kB/s    0:00:00  
         52.10M 100%  154.70MB/s    0:00:00 (xfr#1, to-chk=0/1)

sent 52.12M bytes  received 35 bytes  34.74M bytes/sec
total size is 52.10M  speedup is 1.00
[DEBUG] Process 60544 exited with return code 0 
[INFO] == Command exit (modification check follows) ===== 
[DEBUG] chdir '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' -> '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459'  
[DEBUG] Determined class of decorated function: <class 'datalad.core.local.save.Save'> 
[DEBUG] Resolved dataset for saving: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] Determined class of decorated function: <class 'datalad.core.local.status.Status'> 
[DEBUG] Resolved dataset for status reporting: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] Resolved dataset for path resolution: /tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459 
[DEBUG] query AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459).diffstatus() for paths: [] 
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Process 60554 started 
[DEBUG] Waiting for process 60554 to complete 
[DEBUG] Process 60554 exited with return code 0 
[DEBUG] AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459).get_content_info(...) 
[DEBUG] Query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o'] 
[DEBUG] Async run ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o'] 
[DEBUG] Launching process ['git', 'ls-files', '--stage', '-z', '--exclude-standard', '-o'] 
[DEBUG] Process 60556 started 
[DEBUG] Waiting for process 60556 to complete 
[DEBUG] Process 60556 exited with return code 0 
[DEBUG] Done query repo: ['ls-files', '--stage', '-z', '--exclude-standard', '-o'] 
[DEBUG] Done AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459).get_content_info(...) 
[DEBUG] Async run ['git', 'ls-files', '-z', '-m'] 
[DEBUG] Launching process ['git', 'ls-files', '-z', '-m'] 
[DEBUG] Process 60558 started 
[DEBUG] Waiting for process 60558 to complete 
[DEBUG] Process 60558 exited with return code 0 
[DEBUG] AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459).get_content_info(...) 
[DEBUG] Query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l'] 
[DEBUG] Async run ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l'] 
[DEBUG] Launching process ['git', 'ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l'] 
[DEBUG] Process 60560 started 
[DEBUG] Waiting for process 60560 to complete 
[DEBUG] Process 60560 exited with return code 0 
[DEBUG] Done query repo: ['ls-tree', 'HEAD', '-z', '-r', '--full-tree', '-l'] 
[DEBUG] Done AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459).get_content_info(...) 
[DEBUG] Determined 1 datasets for saving from input arguments 
[DEBUG] Initiating ThreadPoolExecutor with 1 jobs 
[DEBUG] Submitting worker future for ('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', {PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459/.ukbbatch'): {'type': 'file', 'gitshasum': 'fc533c62e7bb1dcc9e87b530feb2a4a4b5e5458f', 'bytesize': 18, 'prev_gitshasum': 'fc533c62e7bb1dcc9e87b530feb2a4a4b5e5458f', 'state': 'clean'}, PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459/1010459_20252_2_0.zip'): {'type': 'symlink', 'gitshasum': '1678bb03bf46b7f40f96871ff225c40c5ab30bd6', 'prev_gitshasum': '1678bb03bf46b7f40f96871ff225c40c5ab30bd6', 'state': 'modified'}}) 
[DEBUG] Got consumer worker which returned a generator <generator object Save.__call__.<locals>.save_ds at 0x7fdcb6fd8cf0> 
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Process 60564 started 
[DEBUG] Waiting for process 60564 to complete 
[DEBUG] Process 60564 exited with return code 0 
[DEBUG] Async run ['git', 'diff', '--name-only', '--staged'] 
[DEBUG] Launching process ['git', 'diff', '--name-only', '--staged'] 
[DEBUG] Process 60566 started 
[DEBUG] Waiting for process 60566 to complete 
[DEBUG] Process 60566 exited with return code 0 
[DEBUG] 1 path(s) to add to AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459) {'1010459_20252_2_0.zip': {'type': 'symlink', 'gitshasum': '1678bb03bf46b7f40f96871ff225c40c5ab30bd6', 'prev_gitshasum': '1678bb03bf46b7f40f96871ff225c40c5ab30bd6', 'state': 'modified'}} 
[DEBUG] Async run ['git', 'annex', 'version', '--raw'] 
[DEBUG] Launching process ['git', 'annex', 'version', '--raw'] 
[DEBUG] Process 60568 started 
[DEBUG] Waiting for process 60568 to complete 
[DEBUG] Process 60568 exited with return code 0 
[DEBUG] Async run ['git', 'annex', 'add', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3', '--', '1010459_20252_2_0.zip'] 
[DEBUG] Launching process ['git', 'annex', 'add', '--json', '--json-error-messages', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3', '--', '1010459_20252_2_0.zip'] 
[DEBUG] Process 60576 started 
[DEBUG] Waiting for process 60576 to complete 
[DEBUG] Process 60576 exited with return code 0 
[DEBUG] Adding {'action': 'add', 'path': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459/1010459_20252_2_0.zip', 'type': 'file', 'logger': <Logger datalad.annex (DEBUG)>, 'refds': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', 'status': 'ok', 'message': '', 'key': 'MD5E-s52103140--3038d22e7f3af987b2dd1e3430b1c827.zip'} to queue 
[DEBUG] Committing via direct call of git: ['git', 'commit', '-m', '[DATALAD RUNCMD] Update from UKBiobank\n\n=== Do not change lines below ===\n{\n "chain": [],\n "cmd": "ukbfetch -v -a/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key -b.ukbbatch -o.git/tmp/ukb",\n "exit": 0,\n "extra_inputs": [],\n "inputs": [],\n "outputs": [\n  "."\n ],\n "pwd": "."\n}\n^^^ Do not change lines above ^^^\n'] 
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Process 60601 started 
[DEBUG] Waiting for process 60601 to complete 
[DEBUG] Process 60601 exited with return code 0 
[DEBUG] Async run ['git', 'commit', '-m', '[DATALAD RUNCMD] Update from UKBiobank\n\n=== Do not change lines below ===\n{\n "chain": [],\n "cmd": "ukbfetch -v -a/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key -b.ukbbatch -o.git/tmp/ukb",\n "exit": 0,\n "extra_inputs": [],\n "inputs": [],\n "outputs": [\n  "."\n ],\n "pwd": "."\n}\n^^^ Do not change lines above ^^^\n', '--'] 
[DEBUG] Launching process ['git', 'commit', '-m', '[DATALAD RUNCMD] Update from UKBiobank\n\n=== Do not change lines below ===\n{\n "chain": [],\n "cmd": "ukbfetch -v -a/cluster/projects/p23/data/open_datasets/ukb/scripts/zip2bids/extra/ukb_key -b.ukbbatch -o.git/tmp/ukb",\n "exit": 0,\n "extra_inputs": [],\n "inputs": [],\n "outputs": [\n  "."\n ],\n "pwd": "."\n}\n^^^ Do not change lines above ^^^\n', '--'] 
[DEBUG] Process 60603 started 
[DEBUG] Waiting for process 60603 to complete 
[DEBUG] Got {'action': 'add', 'path': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459/1010459_20252_2_0.zip', 'type': 'file', 'logger': <Logger datalad.annex (DEBUG)>, 'refds': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', 'status': 'ok', 'message': '', 'key': 'MD5E-s52103140--3038d22e7f3af987b2dd1e3430b1c827.zip'} from consumer_queue 
[DEBUG] Process 60603 exited with return code 1 
[DEBUG] nothing to commit in AnnexRepo(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459). Ignored. 
[DEBUG] Async run ['git', 'symbolic-ref', 'HEAD'] 
[DEBUG] Launching process ['git', 'symbolic-ref', 'HEAD'] 
[DEBUG] Process 60618 started 
[DEBUG] Waiting for process 60618 to complete 
[DEBUG] Process 60618 exited with return code 0 
[DEBUG] No sync necessary, no corresponding branch detected 
[DEBUG] Async run ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Launching process ['git', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG] Process 60620 started 
[DEBUG] Waiting for process 60620 to complete 
[DEBUG] Process 60620 exited with return code 0 
[DEBUG] Adding {'action': 'save', 'type': 'dataset', 'path': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', 'refds': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', 'status': 'notneeded', 'logger': <Logger datalad.core.local.save (DEBUG)>} to queue 
[DEBUG] Got {'action': 'save', 'type': 'dataset', 'path': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', 'refds': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459', 'status': 'notneeded', 'logger': <Logger datalad.core.local.save (DEBUG)>} from consumer_queue 
[DEBUG] Future for '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' is done 
[DEBUG] Shutting down <concurrent.futures.thread.ThreadPoolExecutor object at 0x7fdcb6cc0220> with 0 futures. Reason: None 
[DEBUG] Finished shutdown with force=True due to exception=None 
[DEBUG] Shutting down None with 0 futures. Reason: None 
[DEBUG] Finished shutdown with force=True due to exception=None 
[DEBUG] chdir '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' -> '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459' (coming back) 
[DEBUG] could not perform all requested actions: Command did not complete successfully. 1 failed:
[{'action': 'status',
  'message': ('dataset containing given paths is not underneath the reference '
              'dataset %s: %s',
              Dataset('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459'),
              [PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459')]),
  'path': '/tsd/p23/home/p23-fredrmag/no-backup/test-2',
  'refds': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459',
  'status': 'error'}] [utils.py:generator_func:455] 

Is there anything more I could provide?

mih commented 3 years ago

Thx. If you run it with datalad --dbg ..., it will show the exact traceback and pinpoint the failure.

fredrmag commented 3 years ago

Ah, cool. Here are the results with the --dbg flag:

```
[ERROR] dataset containing given paths is not underneath the reference dataset Dataset(/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459): [PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459')] [status(/tsd/p23/home/p23-fredrmag/no-backup/test-2)]
[INFO] == Command start (output follows) =====
sending incremental file list
1010459_20252_2_0.zip

         32.77K   0%    0.00kB/s    0:00:00
         52.10M 100%  132.07MB/s    0:00:00 (xfr#1, to-chk=0/1)

sent 52.12M bytes  received 35 bytes  104.23M bytes/sec
total size is 52.10M  speedup is 1.00
[INFO] == Command exit (modification check follows) =====
Traceback (most recent call last):
  File "/usr/local/bin/datalad", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/datalad/cmdline/main.py", line 498, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/usr/local/lib/python3.8/dist-packages/datalad/interface/base.py", line 786, in call_from_parser
    ret = list(ret)
  File "/usr/local/lib/python3.8/dist-packages/datalad/interface/utils.py", line 389, in generator_func
    for r in _process_results(
  File "/usr/local/lib/python3.8/dist-packages/datalad/interface/utils.py", line 557, in _process_results
    for res in results:
  File "/usr/local/lib/python3.8/dist-packages/datalad_ukbiobank/update.py", line 168, in __call__
    ds.run(
  File "/usr/local/lib/python3.8/dist-packages/wrapt/wrappers.py", line 605, in __call__
    return self._self_wrapper(self.__wrapped__, self._self_instance,
  File "/usr/local/lib/python3.8/dist-packages/datalad/distribution/dataset.py", line 503, in apply_func
    return f(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/wrapt/wrappers.py", line 566, in __call__
    return self._self_wrapper(self.__wrapped__, self._self_instance,
  File "/usr/local/lib/python3.8/dist-packages/datalad/interface/utils.py", line 482, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/wrapt/wrappers.py", line 566, in __call__
    return self._self_wrapper(self.__wrapped__, self._self_instance,
  File "/usr/local/lib/python3.8/dist-packages/datalad/interface/utils.py", line 470, in return_func
    results = list(results)
  File "/usr/local/lib/python3.8/dist-packages/datalad/interface/utils.py", line 455, in generator_func
    raise IncompleteResultsError(
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
[{'action': 'status',
  'message': ('dataset containing given paths is not underneath the reference '
              'dataset %s: %s',
              Dataset('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459'),
              [PosixPath('/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459')]),
  'path': '/tsd/p23/home/p23-fredrmag/no-backup/test-2',
  'refds': '/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459',
  'status': 'error'}]
```

bpoldrack commented 3 years ago

Reproduced. Looking into it.

Dumping some info from digging into it:

The issue emerges from the call chain run -> unlock -> status. Within status, we check whether or not we are in a subdataset. This check yields the wrong result here: https://github.com/datalad/datalad/blob/master/datalad/core/local/status.py#L348

We explicitly check for orig_path == '.' (which it should be, as per the call in run). But it isn't, because unlock resolved it to an absolute path before passing it on to status. As a result, we consider the git repository upstairs a superdataset, and that's just wrong here.
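A simplified illustration of this failure mode in plain Python follows. It is not DataLad's actual code in `status.py`, and both function names are hypothetical. A literal comparison against `'.'` stops recognizing the dataset root as soon as the same location arrives as an absolute path, whereas normalizing both sides first treats the two spellings alike.

```python
import os


def naive_is_root(orig_path: str) -> bool:
    # Literal comparison: only the relative spelling '.' is recognized
    return orig_path == "."


def is_root(orig_path: str, ds_path: str) -> bool:
    # os.path.join discards ds_path when orig_path is already absolute,
    # so '.' and the absolute dataset path normalize to the same string
    candidate = os.path.normpath(os.path.join(ds_path, orig_path))
    return candidate == os.path.normpath(ds_path)


ds = "/tsd/p23/home/p23-fredrmag/no-backup/test-2/data/1010459"
print(naive_is_root("."), naive_is_root(ds))  # True False
print(is_root(".", ds), is_root(ds, ds))      # True True
```

The naive check returns False for the absolute path, which is the situation that makes the parent repository look like a superdataset.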

So, as far as I can currently tell, this needs to be solved in DataLad core, not in datalad-ukbiobank.

bpoldrack commented 3 years ago

Fixed in PR https://github.com/datalad/datalad/pull/5458. Will be part of next minor release of datalad.

loj commented 3 years ago

This error has shown up again for me, but is now failing in a different spot.

Following the same example as above...

```shell
mkdir test && cd test && git init && mkdir data && cd data
datalad create 0001234 && cd 0001234
datalad ukb-init -f --bids 0001234 20252_2_0
datalad -c datalad.ukbiobank.keyfile=keyfile ukb-update
```

now results in

```
❱ datalad -c datalad.ukbiobank.keyfile=keyfile ukb-update
[INFO   ] == Command start (output follows) =====
sending incremental file list
0001234_20252_2_0.zip
         47.98M 100%    1.40GB/s    0:00:00 (xfr#1, to-chk=0/1)

sent 47.99M bytes  received 35 bytes  95.98M bytes/sec
total size is 47.98M  speedup is 1.00
[INFO   ] == Command exit (modification check follows) =====
[INFO   ] Adding content of the archive MD5E-s47978541--6ea3dfeca7895e37c470d02a06c0853f.zip into annex AnnexRepo(/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234)
[INFO   ] Initiating special remote datalad-archives
[INFO   ] Finished adding MD5E-s47978541--6ea3dfeca7895e37c470d02a06c0853f.zip: Files processed: 45, renamed: 45, +annex: 45
ukb_update(ok): . (dataset)
[ERROR  ] dataset containing given paths is not underneath the reference dataset Dataset(/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234): [PosixPath('/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234')] [status(/data/project/rehab_biobank/dl_update/ingestion/test)]
[ERROR  ] '/data/project/rehab_biobank/dl_update/ingestion/test' does not start with '/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234' [pathlib.py:relative_to:876] (ValueError)
```
With --dbg:

```
❱ datalad --dbg -c datalad.ukbiobank.keyfile=keyfile ukb-update
[INFO   ] == Command start (output follows) =====
sending incremental file list
0001234_20252_2_0.zip
         47.98M 100%    1.31GB/s    0:00:00 (xfr#1, to-chk=0/1)

sent 47.99M bytes  received 35 bytes  95.98M bytes/sec
total size is 47.98M  speedup is 1.00
[INFO   ] == Command exit (modification check follows) =====
[INFO   ] Adding content of the archive MD5E-s47978541--6ea3dfeca7895e37c470d02a06c0853f.zip into annex AnnexRepo(/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234)
[INFO   ] Initiating special remote datalad-archives
[INFO   ] Finished adding MD5E-s47978541--6ea3dfeca7895e37c470d02a06c0853f.zip: Files processed: 45, renamed: 45, +annex: 45
ukb_update(ok): . (dataset)
[ERROR  ] dataset containing given paths is not underneath the reference dataset Dataset(/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234): [PosixPath('/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234')] [status(/data/project/rehab_biobank/dl_update/ingestion/test)]
Traceback (most recent call last):
  File "/home/loj/.venvs/ukbiobank/bin/datalad", line 33, in <module>
    sys.exit(load_entry_point('datalad', 'console_scripts', 'datalad')())
  File "/home/loj/datalad/datalad/cmdline/main.py", line 440, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/home/loj/datalad/datalad/interface/base.py", line 783, in call_from_parser
    ret = list(ret)
  File "/home/loj/datalad/datalad/interface/utils.py", line 405, in generator_func
    allkwargs):
  File "/home/loj/datalad/datalad/interface/utils.py", line 563, in _process_results
    for res in results:
  File "/home/loj/.venvs/ukbiobank/lib/python3.7/site-packages/datalad_ukbiobank/update.py", line 285, in __call__
    base_path=repo.pathobj,
  File "/home/loj/.venvs/ukbiobank/lib/python3.7/site-packages/datalad_ukbiobank/ukb2bids.py", line 52, in restructure_ukb2bids
    rp_parts = list(Path(fp['path']).relative_to(base_path or ds.pathobj).parts)
  File "/usr/lib/python3.7/pathlib.py", line 876, in relative_to
    .format(str(self), str(formatted)))
ValueError: '/data/project/rehab_biobank/dl_update/ingestion/test' does not start with '/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234'
> /usr/lib/python3.7/pathlib.py(876)relative_to()
-> .format(str(self), str(formatted)))
```
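The traceback ends in `pathlib`'s `relative_to()`, which raises `ValueError` whenever a path is not underneath the given base. The path being converted here is the parent git repository (`.../ingestion/test`), the same location reported in the `status(...)` part of the `[ERROR]` line, and it does not lie under the dataset. The sketch below only illustrates that behavior and one possible defensive pattern; `relative_parts` is a hypothetical helper, not code from `ukb2bids.py`, and silently skipping out-of-tree paths could hide the underlying DataLad bug rather than fix it.

```python
from pathlib import PurePosixPath
from typing import Iterable, Iterator, List


def relative_parts(paths: Iterable[str], base: str) -> Iterator[List[str]]:
    """Hypothetical guard: yield each path's parts relative to base,
    skipping paths outside base instead of letting relative_to() raise.

    Uses try/except rather than PurePath.is_relative_to(), which only
    exists from Python 3.9 on (the traceback above is from Python 3.7).
    """
    base_path = PurePosixPath(base)
    for p in paths:
        try:
            yield list(PurePosixPath(p).relative_to(base_path).parts)
        except ValueError:
            # e.g. the parent repository reported by status: not under base
            continue


ds = "/data/project/rehab_biobank/dl_update/ingestion/test/data/0001234"
paths = [ds + "/0001234_20252_2_0.zip",
         "/data/project/rehab_biobank/dl_update/ingestion/test"]
print(list(relative_parts(paths, ds)))  # [['0001234_20252_2_0.zip']]
```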

ping @bpoldrack

mih commented 3 years ago

I am closing this one here. It was found to be a general datalad issue ^^^