PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

`runModule_start_model_runs()` calls `remote.copy.from()` immediately if job id is `NULL` #2958

Open Aariq opened 2 years ago

Aariq commented 2 years ago

Bug Description

`runModule_start_model_runs()` should, I think, copy the run directory over to the HPC, launch jobs on the HPC, wait for them to finish (with `qsub_run_finished()`), and then copy the out directory back (with `remote.copy.from()`). This isn't working for me because the job ID is `NULL`. Here's a section of the output from `runModule_start_model_runs()`:

2022-07-12 21:42:26 DEBUG  [remote.execute.cmd] : 
   ssh -T -l ericrscott puma 'squeue --job NULL &> /dev/null || echo DONE' 
2022-07-12 21:42:27 DEBUG  [PEcAn.remote::qsub_run_finished] : 
   Job NULL for run NULL finished 
2022-07-12 21:42:27 DEBUG  [PEcAn.remote::remote.copy.from] : 
   rsync '-az' '-q' 
   'ericrscott@puma:/groups/dlebauer/ed2_results/pecan_remote/2022-07-12-21-39-43/out/ENS-00005-678' 
   '/home/ericrscott/Eric-ED2/WLEF/outputs/out' 

I'm not entirely sure what the fix is.

To Reproduce

I think I'd need some guidance on how to reproduce this.

Expected behavior

I'd expect the R session to wait until the HPC runs were finished and results were copied back over. Additionally, it should fail fast if the job ID is not valid (e.g. if it's `NULL`).

I think @dlebauer and @KristinaRiemer are aware of this bug as well, but couldn't find an open issue.

Aariq commented 2 years ago

Ok, so I think the problem on my end is an incorrect <qsub.jobid>, but I still think this should fail fast if the job ID is NULL
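
For anyone hitting the same thing, here is a minimal sketch of how an incorrect pattern can leave the job ID empty. It assumes the scheduler prints a line such as `Submitted batch job 12345` (Slurm's usual `sbatch` message) and that the job ID is taken from the first capture group of a regular expression. `extract_job_id()` is a hypothetical helper for illustration, not a PEcAn function, and the patterns shown are only examples:

```r
# Hypothetical helper (not part of PEcAn): pull a job ID out of scheduler
# output using a regular expression with one capture group.
# Returns NULL when the pattern does not match.
extract_job_id <- function(output, pattern) {
  m <- regmatches(output, regexec(pattern, output))[[1]]
  if (length(m) < 2 || !nzchar(m[2])) {
    return(NULL)
  }
  m[2]
}

# Assumed example of what sbatch prints on submission.
out <- "Submitted batch job 12345"

# A pattern written for this output yields the ID as a string.
extract_job_id(out, "Submitted batch job ([0-9]+)")

# A pattern written for a different scheduler's message matches nothing,
# so the result is NULL.
extract_job_id(out, "Your job ([0-9]+) .*")
```

Returning `NULL` explicitly on a failed match makes the problem easy to detect right after submission, before anything tries to poll the scheduler with it.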

Aariq commented 2 years ago

So to clarify, here's what I think should happen if the job ID is `NULL` for any reason (e.g. an incorrect job ID pattern in the settings): 1) print a message like "Job ID is NULL. Jobs are running but won't be automatically retrieved from host. Check host$qsub.jobid in settings" 2) `remote.copy.from()` does NOT get called and the function exits.
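
Roughly, the behavior I have in mind could look like the sketch below. This is not the actual PEcAn code: `is_valid_job_id()` and `retrieve_if_valid()` are hypothetical names, and `wait_fun` / `copy_fun` are stand-ins for whatever polls the scheduler and copies the results back.

```r
# Hypothetical guard: TRUE only for a single, non-empty job ID string.
# The literal string "NULL" is rejected too, since that is what ended up
# in the squeue command in the debug output.
is_valid_job_id <- function(jobid) {
  is.character(jobid) && length(jobid) == 1 && !is.na(jobid) &&
    nzchar(jobid) && jobid != "NULL"
}

# Hypothetical wrapper around the wait-then-copy step.
# wait_fun(jobid) should block until the job is done; copy_fun() should
# copy the output back. Both are placeholders supplied by the caller.
retrieve_if_valid <- function(jobid, wait_fun, copy_fun) {
  if (!is_valid_job_id(jobid)) {
    warning(
      "Job ID is NULL. Jobs are running but won't be automatically ",
      "retrieved from host. Check host$qsub.jobid in settings"
    )
    return(invisible(FALSE))  # exit without waiting or copying
  }
  wait_fun(jobid)
  copy_fun()
  invisible(TRUE)
}

# Usage with dummy functions: the copy step is skipped for a NULL job ID.
retrieve_if_valid(
  NULL,
  wait_fun = function(id) message("waiting on ", id),
  copy_fun = function() message("copying results")
)
```

I used `warning()` here so the rest of the workflow can continue. `stop()` would be the stricter fail-fast option if an invalid job ID should halt everything.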

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 365 days with no activity.