ReproNim / reproman

ReproMan (AKA NICEMAN, AKA ReproNim TRD3)
https://reproman.readthedocs.io

reproman's datalad-pair-run run record should probably store some "reproman run" construct? #461

Open yarikoptic opened 4 years ago

yarikoptic commented 4 years ago

ATM the `datalad run` commit record just stores the job ID in its `cmd` field, e.g. `"cmd": "20190920-124832-7bf3"`.

A full example:

```shell
(git-annex)hopa:…im/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3[master]git-annex
$> reproman run --follow --input 'data/bids/sub-{p[sub]}' -r localshell --sub condor --orc datalad-pair-run --bp "sub=02,13" bash -c 'mkdir -p out; du -scb {inputs} > out/du-sub-{p[sub]}'
2019-09-20 12:48:33,658 [INFO ] No root directory supplied for localshell; using '/home/yoh/.reproman/run-root'
2019-09-20 12:48:34,327 [INFO ] Submitting 20190920-124832-7bf3
2019-09-20 12:48:34,362 [INFO ] Submitting /home/yoh/.reproman/run-root/3d36be08-da23-11e9-85fc-8019340ce7f2/.reproman/jobs/localshell/20190920-124832-7bf3/submit
2019-09-20 12:48:34,417 [INFO ] Job 20190920-124832-7bf3 submitted as condor job 20
2019-09-20 12:48:34,426 [INFO ] Registered job 20190920-124832-7bf3
2019-09-20 12:48:34,453 [INFO ] Waiting on job 20: running
2019-09-20 12:48:44,527 [INFO ] Fetching results for 20190920-124832-7bf3
2019-09-20 12:48:44,622 [INFO ] Creating run commit in /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3
2019-09-20 12:48:46,446 [INFO ] Unregistered job 20190920-124832-7bf3

(git-annex)hopa:…im/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3[master]git-annex
$> git show --stat
commit b91e86dabf9983c6829d4c5fa3ba3b4a126d6148 (HEAD -> master, refs/reproman/20190920-124832-7bf3)
Author: Yaroslav Halchenko
Date:   Fri Sep 20 12:48:46 2019 -0400

    [DATALAD RUNCMD] 20190920-124832-7bf3

    === Do not change lines below ===
    {
     "chain": [],
     "cmd": "20190920-124832-7bf3",
     "dsid": "3d36be08-da23-11e9-85fc-8019340ce7f2",
     "exit": 0,
     "extra_inputs": [],
     "inputs": [
      "data/bids/sub-{p[sub]}"
     ],
     "outputs": [],
     "pwd": ".",
     "reproman_jobid": "20190920-124832-7bf3"
    }
    ^^^ Do not change lines above ^^^

 .reproman/jobs/localshell/20190920-124832-7bf3/command-array | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/idmap | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/pre-finished.0 | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/pre-finished.1 | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/runscript | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 .reproman/jobs/localshell/20190920-124832-7bf3/status.0 | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/status.1 | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/stderr.0 | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/stderr.1 | 1 +
 .reproman/jobs/localshell/20190920-124832-7bf3/stdout.0 | 3 +++
 .reproman/jobs/localshell/20190920-124832-7bf3/stdout.1 | 4 ++++
 .reproman/jobs/localshell/20190920-124832-7bf3/submit | 15 ++++++++++++++
 .reproman/jobs/localshell/20190920-124832-7bf3/togethome | 17 ++++++++++++++++
 out/du-sub-02 | 2 ++
 out/du-sub-13 | 2 ++
```
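The run record is the JSON stored between the `=== Do not change lines below ===` and `^^^ Do not change lines above ^^^` markers in the commit message. If you want to pull out just that record (and its `cmd` field), a small sketch like the following may help. `extract_run_record` is a hypothetical helper, not part of DataLad or ReproMan, and it assumes the markers appear unindented. That is the case in `git log -1 --format=%B` output but not in the indented `git show` output above.

```shell
# Hypothetical helper (not part of DataLad or ReproMan): print the JSON run
# record embedded in a "[DATALAD RUNCMD]" commit message read from stdin.
# Assumes the marker lines are unindented, as in `git log --format=%B`.
extract_run_record() {
    awk '
        # start printing after the opening marker (marker itself skipped)
        /^=== Do not change lines below ===$/ { inrec = 1; next }
        # stop before the closing marker
        /^\^\^\^ Do not change lines above \^\^\^$/ { inrec = 0 }
        # print lines while inside the record
        inrec
    '
}
```

For example, assuming `jq` is installed, `git log -1 --format=%B | extract_run_record | jq -r .cmd` would print the job ID `20190920-124832-7bf3` for the commit above.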

Should it instead store the command to run, i.e. `sh .reproman/jobs/localshell/<JOBID>/command-array`?
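The entries in `command-array` are NUL-separated rather than newline-separated, so a plain `sh .../command-array` would not run them as separate commands. One possible shape for a replayable `cmd` is a small wrapper that runs each entry in turn. This is a minimal sketch under that assumption about the file layout. `replay_command_array` is a hypothetical name, not an existing ReproMan command, and unlike the original job it runs the entries sequentially on the local machine, not as concurrent condor subjobs.

```shell
# Hypothetical helper (not part of ReproMan): run every NUL-separated entry
# of a command-array file, in order, as an `sh -c` command string.
replay_command_array() {
    # -0: split the input on NUL bytes (supported by GNU and BSD xargs);
    # -n 1: pass one entry per invocation, so each entry becomes the
    #       command-string argument of `sh -c`.
    # xargs exits nonzero if any entry fails.
    xargs -0 -n 1 sh -c < "$1"
}
```

Usage would look like `replay_command_array .reproman/jobs/localshell/20190920-124832-7bf3/command-array`. Because splitting happens on NUL, this sidesteps the need to convert the separators to newlines first.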

Additional issue detected: in the case above, the command-array script seems to be missing a newline to separate the entries:

```shell
$> nl .reproman/jobs/localshell/20190920-124832-7bf3/command-array
     1  bash -c 'mkdir -p out; du -scb data/bids/sub-02 > out/du-sub-02'bash -c 'mkdir -p out; du -scb data/bids/sub-13 > out/du-sub-13'
```

Actually, there is a 0x00 (NUL) byte there as a separator, but it should be a newline.
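Because the separator is NUL, `nl` sees the whole file as one line. To inspect the entries one per line without changing the file, one option is to split on NUL first. This is a minimal sketch, and `list_command_array` is a hypothetical helper name.

```shell
# Hypothetical helper: list the entries of a NUL-separated command-array
# file one per line, numbered by nl.
list_command_array() {
    # -0: split on NUL; -n 1: print one entry per printf call.
    # Entries are passed as arguments (not as the format string), so a '%'
    # inside a command is printed literally.
    xargs -0 -n 1 printf '%s\n' < "$1" | nl
}
```

A design note on the separator itself: a NUL byte cannot occur inside a shell command string, whereas a newline can. NUL-separated entries therefore stay unambiguous even for multi-line commands, which is presumably why NUL was chosen. A listing like the one above would print such a command across several lines.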

After adjusting the `cmd` entry and fixing up that command array, I managed to `datalad rerun` it! Whoohoo!

```shell
$> datalad rerun
[INFO   ] Making sure inputs are available (this may take some time)
[WARNING] Input does not exist: /home/yoh/proj/repronim/reproman-master/docs/usecases/bids-fmriprep-workflow-NP/out3/data/bids/sub-{p[sub]}
[INFO   ] == Command start (output follows) =====
[INFO   ] == Command exit (modification check follows) =====
action summary:
  get (notneeded: 1)
  save (notneeded: 5)
  unlock (notneeded: 11)
```

So one point is that plain DataLad, of course, has no clue how to treat the job parameters in the inputs (hence the warning about the literal `sub-{p[sub]}` path), so the record is not entirely rerunnable, and we should think more about how to make it so.
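To make the inputs rerunnable, something would need to expand the `{p[...]}` placeholders using the batch parameters (here `--bp "sub=02,13"`) before DataLad checks the inputs. The sketch below shows that expansion for a single parameter. `expand_inputs` is a hypothetical helper, not a ReproMan or DataLad function. It assumes simple values (no `/`, `&`, or `,`, which would confuse the `sed` substitution or the splitting) and does not attempt to reproduce how ReproMan handles several parameters.

```shell
# Hypothetical helper (not part of ReproMan or DataLad): expand a single
# {p[NAME]} placeholder in an input template for each value of a batch
# parameter given as NAME=V1,V2,...
# The body is a subshell, so the IFS and `set -f` changes stay local.
expand_inputs() (
    template=$1
    name=${2%%=*}     # parameter name, e.g. "sub"
    values=${2#*=}    # comma-separated values, e.g. "02,13"
    IFS=,             # split the values on commas
    set -f            # do not glob-expand the values
    for v in $values; do
        # replace every literal {p[NAME]} with the current value
        printf '%s\n' "$template" | sed "s/{p\[$name]}/$v/g"
    done
)
```

For the job above, `expand_inputs 'data/bids/sub-{p[sub]}' 'sub=02,13'` prints `data/bids/sub-02` and `data/bids/sub-13`, the paths the subjobs actually used.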

kyleam commented 4 years ago

> Additional issue detected: in my case above command-array script seems to be missing a new line to separate separate entries: [...] actually there is a 0x00 there as a separator, but should be a new line.

Yes, the commands are separated by NULs. Why should there be a new line?

> so one point is that pure datalad of cause had no clue on how to treat job parameters in the inputs, so it is not entirely rerunnable and we should think more on how to possibly make it so.

This is an outstanding issue that needs to be dealt with. Quoting from #458:

> reproman run records for concurrent jobs are not compatible with datalad rerun. See the run record bullet point in de60efa (NF: orchestrators: Support concurrent jobs, 2019-05-16) and
>
> https://github.com/ReproNim/reproman/blob/7c8800e3fdedf0471584f1040f2e35025f33fe2d/reproman/support/jobs/orchestrators.py#L978-L989