ReproNim / reproman

ReproMan (AKA NICEMAN, AKA ReproNim TRD3)
https://reproman.readthedocs.io

datalad-pair fails with fatal: 'origin' does not appear to be a git repository #499

Closed yarikoptic closed 4 years ago

yarikoptic commented 4 years ago

On #438, using datalad-pair (in RM_ORC) instead of datalad-pair-run results in:

...
2020-01-08 12:53:27,341 [INFO   ] Fetching results for 20200108-124152-a708 
2020-01-08 12:53:27,716 [INFO   ] Updating local dataset with changes from 'smaug' 
2020-01-08 12:53:28,151 [ERROR  ] Cmd('/usr/lib/git-annex.linux/git') failed due to: exit code(1)                                                                                                                                  
|   cmdline: /usr/lib/git-annex.linux/git fetch --progress -v smaug refs/reproman/20200108-124152-a708:refs/reproman/20200108-124152-a708
|   stderr: 'fatal: 'origin' does not appear to be a git repository
| fatal: Could not read from remote repository.
| 
| Please make sure you have the correct access rights
| and the repository exists.' [cmd.py:wait:417] (GitCommandError) 
FS_LICENSE=~/.freesurfer-license RUNNER=reproman CONTAINERS_REPO= =    17.62s user 6.82s system 3% cpu 11:51.57 total

my invocation was

$> FS_LICENSE=~/.freesurfer-license RUNNER=reproman CONTAINERS_REPO=~/proj/repronim/containers INPUT_DATASET_REPO=$PWD/bids-fmriprep-workflow-NP/ds000003-demo ./bids-fmriprep-workflow-NP.sh bids-fmriprep-workflow-NP/out7-datalad-pair

NB you have ~/.freesurfer-license on smaug now @kyleam . My reproman is v0.2.1-51-g924980c .

kyleam commented 4 years ago

Given the fetch call and refspec shown in the log message, I'd imagine that failure is coming from this line:

https://github.com/ReproNim/reproman/blob/a74de9afeef51cdc3d5a2f7490ddc923ad1ed5a5/reproman/support/jobs/orchestrators.py#L965

But I don't know how "origin" is coming into the picture because, as shown in the log message and the code snippet above, that's calling fetch with remote=resource_name.

> My reproman is v0.2.1-51-g924980c .

I can't find that revision:

$ git fetch origin
$ git fetch yarikoptic
$ git show 924980c
fatal: ambiguous argument '924980c': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

yarikoptic commented 4 years ago

> My reproman is v0.2.1-51-g924980c .
>
> I can't find that revision:

eh, sorry -- it was with a merged master. Pushed now, after merging @chaselgrove 's d5e90283e010a24bb29c10d5cf24a454054f62d2 commit which changed defaults for RM_SUB and RM_RESOURCE to local

kyleam commented 4 years ago

As I mentioned yesterday on the call, I'm running into another issue when I try to run this. It happens after the first two reproman run calls from the script succeed. I'm working on figuring that out (and have put some details below the fold, if anyone is interested). But the main thing I want to point out is that, in the out7-datalad-pair.tgz file @yarikoptic gave me yesterday for the "origin" failure above, the log doesn't have any successful runs in master, and it has one run in a refs/reproman/ ref. So it seems like it fails when trying to bring in the first successful run. Given that I'm seeing two successful runs (both merged into master as expected), it looks like I'm having no luck triggering this issue's error.

publish failure that I'm seeing

With d05430ddc (current tip of yarikoptic/doc-usecases) checked out, I'm running

RM_ORC=datalad-pair RM_SUB=condor RM_RESOURCE=smaug FS_LICENSE=~/.freesurfer-license RUNNER=reproman ~/src/python/reproman/docs/usecases/bids-fmriprep-workflow-NP.sh bids-fmriprep-workflow-NP/out7-datalad-pair

Here's the last bit of the output showing the `run` call and the `datalad publish` failure within `prepare_remote()`:

```
[...]
+ reproman_run --jp container=containers/bids-fmriprep --input data/bids --output data/fmriprep --bp pl=02,13 '{inputs}' '{outputs}' participant --participant_label '{p[pl]}' --fs-license-file=containers/licenses/freesurfer -w work
+ reproman run --follow -r smaug --sub condor --orc datalad-pair-run --jp container=containers/bids-fmriprep --input data/bids --output data/fmriprep --bp pl=02,13 '{inputs}' '{outputs}' participant --participant_label '{p[pl]}' --fs-license-file=containers/licenses/freesurfer -w work
2020-01-10 15:56:19,206 [INFO] No root directory supplied for smaug; using '/home/kyle/.reproman/run-root'
[INFO] Publishing to smaug
[INFO] Publishing to smaug
2020-01-10 15:56:50,716 [ERROR] 'datalad publish' failed. Try running 'datalad update -s smaug --merge --recursive' first [orchestrators.py:prepare_remote:819] (OrchestratorError)
```

At that point locally, my log looks like this

```
git log --oneline -n4
1a5fa74 (HEAD -> master, smaug/master, synced/master) [DATALAD] Recorded changes
8915274 (refs/reproman/20200110-155451-94bf) [DATALAD RUNCMD] containers/scripts/singularity_cmd run c...
6ce7582 (refs/reproman/20200110-154147-8822) [DATALAD RUNCMD] 20200110-154147-8822
02ca55e [ReproMan] Configure jobs directory
```

master's log from out7-datalad-pair.tgz that I mentioned above stopped at what would correspond to the bottom commit (02ca55e). At that point on the resource, the mriqc submodule is modified because the HEAD commit is one commit back of the commit registered in the parent.

```
$ git diff
diff --git a/data/mriqc b/data/mriqc
index 2816095..5fcb066 160000
--- a/data/mriqc
+++ b/data/mriqc
@@ -1 +1 @@
-Subproject commit 281609590bd1874e057a99101b1fa9cae6162841
+Subproject commit 5fcb066953d56cd5d1510d8657d9b7208e405e7d
```

kyleam commented 4 years ago

This made me scratch my head a bit, but I've made some progress. With a smaug ssh resource and master (a74de9afe) checked out, I can trigger the "'fatal: 'origin' does [...]" failure with

cd "$(mktemp -d --tmpdir rman-XXXXXXX)"
datalad create
datalad create -d. subds
reproman run --follow -r smaug --orc datalad-pair sh -c "echo one >subds/one"

The traceback looks like this:

Traceback (most recent call last):
  File "/home/kyle/src/python/reproman/reproman/support/jobs/orchestrators.py", line 966, in fetch
    self.ds.repo.fetch(resource_name, "{0}:{0}".format(ref))
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 412, in wrapped
    return func(repo, *args, **kwargs)
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 2174, in fetch
    **kwargs
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 2200, in _call_gitpy_with_progress
    ret = callable(**git_kwargs)
  File "/home/kyle/src/python/gitpython/git/remote.py", line 792, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/home/kyle/src/python/gitpython/git/remote.py", line 676, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/home/kyle/src/python/gitpython/git/cmd.py", line 417, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
  cmdline: git fetch --progress -v smaug refs/reproman/20200116-132629-0c1e:refs/reproman/20200116-132629-0c1e
  stderr: 'fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.'

Some observations:

All of those point to it being git fetch's recursive fetch of subds that is looking for origin and failing. As expected, we can avoid this issue by calling fetch with --recurse-submodules=no. ~And doing that is fine, because the ref we want to fetch is present only in the top-level dataset.~ But doing that is probably problematic if we need to switch to a different base on the remote end and the update --recursive call doesn't fetch those commits.
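
For illustration, here is a minimal sketch of what a fetch with --recurse-submodules=no could look like when git is invoked directly. This is not the change made in reproman: build_fetch_cmd and fetch_ref are hypothetical helper names, and the sketch goes through subprocess rather than through DataLad's or GitPython's fetch wrappers. The refspec follows the "{0}:{0}".format(ref) pattern visible in the traceback above.

```python
import subprocess


def build_fetch_cmd(remote, ref, recurse_submodules=False):
    """Return the git argv for fetching `ref` from `remote` into the same local ref.

    Hypothetical helper for illustration only; not part of reproman.
    """
    cmd = ["git", "fetch"]
    if not recurse_submodules:
        # Keep git from also fetching in submodules, which is the step that
        # looks for an "origin" remote and fails in this issue.
        cmd.append("--recurse-submodules=no")
    # Same refspec shape as in the traceback: <ref>:<ref>
    cmd.extend([remote, "{0}:{0}".format(ref)])
    return cmd


def fetch_ref(repo_path, remote, ref):
    """Run the fetch in `repo_path`; raises CalledProcessError on failure."""
    subprocess.run(build_fetch_cmd(remote, ref), cwd=repo_path, check=True)
```

With remote "smaug" and the ref from the traceback, the built command is `git fetch --recurse-submodules=no smaug refs/reproman/20200116-132629-0c1e:refs/reproman/20200116-132629-0c1e`. The caveat above still applies: any new commits in subdatasets would have to be fetched some other way.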

This seems like something worth reporting/improving in Git, but I'd need to look into it a bit more. (Edit: git's submodule.c hardcodes "origin" for the fetch, and it's labelled as NEEDSWORK, so it is a known limitation.)