Mouse-Imaging-Centre / pydpiper

Python code for flexible pipeline control

MBM.py shuts down for no discernible reason #421

Closed: gdevenyi closed this issue 5 years ago

gdevenyi commented 5 years ago

In both SGE and local execution modes, I have a pipeline that is nearly complete (only a few nlin-2 registrations remain). MBM.py starts up, submits jobs, runs them, and then shuts down.

In the SGE case, the Pyro connection fails and the jobs die; in the local case, MBM.py shuts down but the executors keep running.

I can find no sign of errors or failures in the log files. The log simply ends with:

[2019-03-25 13:54:45.253,pydpiper.execution.pipeline,INFO] Starting Stage 10507: ANTS 3 --number-of-affine-iterations 0 -m 'CC[embryo_mia_24122018_processed/img_29nov18.12.sept2014_dist_corr_preproc/resampled/img_29nov18.12.sept2014_dist_corr_preproc_I_lsq6_avg_lsq12-resampled.mnc,embryo_mia_24122018_nlin/embryo_mia_24122018-nlin-2.mnc,1.0,3]' -m 'CC[embryo_mia_24122018_processed/img_29nov18.12.sept2014_dist_corr_preproc/tmp/img_29nov18.12.sept2014_dist_corr_preproc_I_lsq6_avg_lsq12-resampled_fwhm0.027_dxyz.mnc,embryo_mia_24122018_nlin/embryo_mia_24122018-nlin-2/tmp/embryo_mia_24122018-nlin-2_fwhm0.027_dxyz.mnc,1.0,3]' -t SyN[0.2] -r Gauss[2,1] -i 100x100x100x150 -o embryo_mia_24122018_processed/img_29nov18.12.sept2014_dist_corr_preproc/transforms/img_29nov18.12.sept2014_dist_corr_preproc_I_lsq6_avg_lsq12-resampled_ANTS_to_embryo_mia_24122018-nlin-2.xfm(PYRO:obj_bb6fce0fa4024078a776b2d22c4dbab4@172.16.67.225:35240)
[2019-03-25 13:54:49.018,pydpiper.execution.pipeline,INFO] Server loop going to shut down .
bcdarwin commented 5 years ago

Hard to tell what's going on here. (Aside: it looks like I'll shortly have access to an SGE cluster again.) Can you run with PYRO_LOGLEVEL=DEBUG and see if you get more informative messages? Do you see any error messages in the log files for the stages that start to run? Thanks!
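For reference, here is a minimal sketch of turning on Pyro's debug logging for a run. It assumes pydpiper's executors talk to the server through the Pyro4 library (the PYRO:obj_...@host:port URI in the log above suggests this). Pyro4 reads the PYRO_LOGLEVEL and PYRO_LOGFILE environment variables when it is first imported, so they have to be set before the process starts rather than changed from inside it. The file name pyro_debug.py and the helpers pyro_debug_env and run_with_pyro_debug are hypothetical, made up for this illustration.

```python
import os
import subprocess
import sys


def pyro_debug_env(base=None, logfile="pyro.log"):
    """Return a copy of an environment with Pyro debug logging enabled.

    Assumption: Pyro4 reads PYRO_LOGLEVEL and PYRO_LOGFILE at import time,
    so they must already be present in the environment of the process
    that imports it. The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["PYRO_LOGLEVEL"] = "DEBUG"
    env["PYRO_LOGFILE"] = logfile
    return env


def run_with_pyro_debug(cmd, logfile="pyro.log"):
    """Run cmd (a list of arguments) in a child process with Pyro debug logging on."""
    return subprocess.run(cmd, env=pyro_debug_env(logfile=logfile), check=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pyro_debug.py MBM.py <your usual MBM.py flags>
    sys.exit(run_with_pyro_debug(sys.argv[1:]).returncode)
```

From a shell, the equivalent is to prefix the usual command: PYRO_LOGLEVEL=DEBUG PYRO_LOGFILE=pyro.log MBM.py <usual flags>. Whether these variables also reach executors running on SGE nodes depends on how the jobs are submitted. The sketch only covers the process it launches directly.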

bcdarwin commented 5 years ago

Also, can you list the execution-related flags you are using?

gdevenyi commented 5 years ago

I'm going to close this because this problem was in a hacky version I made to try and address https://github.com/Mouse-Imaging-Centre/pydpiper/issues/416

I've changed how I address it: I now hard-code a different qbatch call from the one that would typically be generated.

Thanks for the suggestion re: Pyro. I couldn't figure out how to get the right kind of logging, so I'll try to remember it for next time.