desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

pipeline overwriting jobs and tasks when jobs are split? #716

Open julienguy opened 5 years ago

julienguy commented 5 years ago

I think the current scheme, in which jobs and task lists are written to directories and files named from the task type and the date (to the second), is problematic when the code writes a series of jobs to fit in the debug queue. I suspect several jobs end up with the same name. Example:

desi_pipe tasks --states ready --tasktype redshift | desi_pipe script --nersc cori-haswell --nersc_maxnodes 64 --nersc_queue debug

/global/cscratch1/sd/jguy/redux/redwood-sp0/run/scripts/redshift_20181106-061024/cori-haswell.slurm,/global/cscratch1/sd/jguy/redux/redwood-sp0/run/scripts/redshift_20181106-061024/cori-haswell.slurm 

(twice the same job)
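
A minimal sketch of the suspected collision, assuming each job directory name is built from the task type plus a timestamp with one-second resolution. The helper `job_dir_name` and the fixed timestamps are hypothetical and are not taken from desispec; they only illustrate why several jobs written within the same second would end up with the same path.

```python
from datetime import datetime


def job_dir_name(tasktype, when):
    """Hypothetical naming helper: task type plus a timestamp truncated to seconds."""
    return f"{tasktype}_{when.strftime('%Y%m%d-%H%M%S')}"


# Two jobs written less than a second apart (illustrative timestamps).
first = datetime(2018, 11, 6, 6, 10, 24, 100000)
second = datetime(2018, 11, 6, 6, 10, 24, 900000)

names = [job_dir_name("redshift", first), job_dir_name("redshift", second)]

# Sub-second differences are dropped, so both jobs map to the same directory.
collision = len(set(names)) < len(names)
print(names, "collision:", collision)
```

If this is what happens, the second job would overwrite the script and task list of the first, which would be consistent with an incomplete task list in the job directory.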

tskisner commented 5 years ago

In the case of a single pipeline step run in one or more scripts, you can see how the output file name is built here:

https://github.com/desihub/desispec/blob/master/py/desispec/pipeline/scriptgen.py#L590

If there is only one script, then no suffix is appended. If there are multiple job scripts, then an underscore and the job index are appended. In the command above, it looks like there was only one job script needed to fit into the queue constraints, so no suffix was appended.

julienguy commented 5 years ago

You can see in the example above that the code lists the same script twice in stdout. Also, the list of tasks in the job directory is actually not the complete list, hence my suspicion.
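
One way to check for this symptom is to split the comma-separated list printed on stdout and look for repeated paths. The sketch below is only an illustration; the function name `duplicated_scripts` is made up here and is not part of `desi_pipe`.

```python
from collections import Counter


def duplicated_scripts(stdout_line):
    """Return the script paths that occur more than once in a comma-separated list."""
    paths = [p.strip() for p in stdout_line.split(",") if p.strip()]
    counts = Counter(paths)
    return [path for path, n in counts.items() if n > 1]


# The two paths printed by the command in the first comment.
script = (
    "/global/cscratch1/sd/jguy/redux/redwood-sp0/run/scripts/"
    "redshift_20181106-061024/cori-haswell.slurm"
)
line = script + "," + script + " "

print(duplicated_scripts(line))
```

An empty result would mean every job script in the list has a distinct path.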

tskisner commented 5 years ago

Ah, sorry, I did not scroll over far enough to see the comma and that there are two files.

Looking at your command line, did you copy / paste that from the terminal? The valid option is --tasktypes:

https://github.com/desihub/desispec/blob/master/py/desispec/scripts/pipe.py#L210

That should have given an error...

julienguy commented 5 years ago

Haven't you noticed that argparse accepts unambiguous abbreviations of long options? That is why --tasktype is read as --tasktypes.
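
For reference, the behavior in question is argparse's default prefix matching of long options (`allow_abbrev=True`). The sketch below uses a stand-in parser with a single `--tasktypes` option; it is not the actual `desi_pipe` parser, whose other options could affect which prefixes are unambiguous.

```python
import argparse

# Stand-in parser (not desi_pipe's): abbreviations are allowed by default.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--tasktypes", default=None)

# "--tasktype" is an unambiguous prefix of "--tasktypes", so it is accepted.
args = parser.parse_args(["--tasktype", "redshift"])
print(args.tasktypes)

# With allow_abbrev=False the same prefix is rejected as an unrecognized argument.
strict = argparse.ArgumentParser(prog="demo", allow_abbrev=False)
strict.add_argument("--tasktypes", default=None)
try:
    strict.parse_args(["--tasktype", "redshift"])
    rejected = False
except SystemExit:
    # argparse reports the error on stderr and exits.
    rejected = True
print("rejected with allow_abbrev=False:", rejected)
```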

tskisner commented 5 years ago

I verified that this is no longer an issue in the branch of #806 (and possibly in current master as well). Tested with:

desi_pipe tasks --states waiting --tasktype redshift | desi_pipe script --nersc cori-haswell --nersc_maxnodes 64 --nersc_queue regular --nersc_maxtime 100
...
/global/cscratch1/sd/kisner/desi/svdc/spectro/redux/v4/run/scripts/redshift_20190814-102507-064450/cori-haswell.slurm,/global/cscratch1/sd/kisner/desi/svdc/spectro/redux/v4/run/scripts/redshift_20190814-102507-064450/cori-haswell_1.slurm

I will close after #806 is merged.