McIntosh-Lab / tvb-ukbb

TVB-UKBB Pipeline: TheVirtualBrain implementation of the UK Biobank pipeline

fsl_sub doesn't run bb_pre_eddy properly #27

Closed shen4brains closed 4 years ago

shen4brains commented 4 years ago

Possible permissions issue with fsl_sub?

When bb_diffusion_pipeline/bb_eddy/bb_pre_eddy is submitted through fsl_sub, it fails to link and copy files into $direc/dMRI/dMRI, so the rest of bb_pre_eddy fails.

Running the bb_pre_eddy shell script directly, or submitting it with qsub, works fine.

The call to fsl_sub is generated in bb_eddy/bb_pipeline_diffusion.py and, according to the logs, looks like this:

${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_pre_eddy_sub-002S0413" -j 1479304,-1 -l /liberatrix/mcintosh_lab/kshen/ukbb/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_pre_eddy sub-002S0413

The -j flag, which holds the submission until job j (the structural pipeline) finishes, does not look right with ,-1 appended to it. I need to trace where that gets appended. But even if I manually correct it to just the structural pipeline job, or remove -j entirely, it still doesn't work.
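One way to guard against a malformed hold list such as `1479304,-1` is to validate the job IDs before building the `-j` argument. The following is a minimal Python sketch, assuming the IDs arrive as strings and that only positive integers are valid. `build_hold_arg` is a hypothetical helper, not a function from the pipeline, and where the `,-1` actually comes from is still unknown.

```python
def build_hold_arg(job_ids):
    """Return a '-j ...' fragment for fsl_sub, or '' if no valid job IDs remain.

    Hypothetical helper (not part of tvb-ukbb): keeps only positive
    integer job IDs, so values such as '-1' or '' are dropped.
    """
    valid = []
    for job_id in job_ids:
        text = str(job_id).strip()
        # str.isdigit() is False for '-1' and for the empty string.
        if text.isdigit() and int(text) > 0:
            valid.append(text)
    # Comma-separated form, as it appears in the logged command.
    return "-j " + ",".join(valid) if valid else ""


print(build_hold_arg(["1479304", "-1"]))  # -j 1479304
```

Returning an empty string when nothing valid remains also avoids emitting a bare `-j` with no job ID after it.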

shen4brains commented 4 years ago

I should add that similar commands to create links and copy files in the structural pipeline's shell scripts run just fine for me with fsl_sub, so I'm really at a loss as to what's going on here.

noahfl commented 4 years ago

Here's the full logfile (bb_pipeline_diff__sub_002S0413_XXXXX.log) I get when running it:

2020-08-28 10:49:48,554 - bb_pipeline_diff - INFO - Starting the subject processing: Fri Aug 28 10:49:48 2020                                                                                                                    
2020-08-28 10:49:48,554 - bb_pipeline_diff - INFO - Subject received as input: sub-002S0413                                                                                                                                      
2020-08-28 10:49:48,555 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q   -N "bb_pre_eddy_sub-002S0413" -j 1479449  -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_pre_eddy sub-002S0413                                                                                                                                                                      
2020-08-28 10:49:48,694 - bb_pipeline_diff - INFO - COMMAND OUTPUT:     1479456                                                                                                                                                  
2020-08-28 10:49:48,695 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q  -N "bb_eddy_sub-002S0413" -j 1479456  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_eddy_wrap /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413                                                                                                                      
2020-08-28 10:49:48,818 - bb_pipeline_diff - ERROR - Exception raised during execution of:      ${FSLDIR}/bin/fsl_sub -q bigmem_16.q  -N "bb_eddy_sub-002S0413" -j 1479456  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_eddy_wrap /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413                                                                                              
2020-08-28 10:49:48,819 - bb_pipeline_diff - ERROR - Exception type:    <class 'subprocess.CalledProcessError'>                                                                                                                  
2020-08-28 10:49:48,819 - bb_pipeline_diff - ERROR - Exception args:    (127, '${FSLDIR}/bin/fsl_sub -q bigmem_16.q  -N "bb_eddy_sub-002S0413" -j 1479456  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_eddy_wrap /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413')                                                                                                             
2020-08-28 10:49:48,819 - bb_pipeline_diff - ERROR - Exception message:         Command '${FSLDIR}/bin/fsl_sub -q bigmem_16.q  -N "bb_eddy_sub-002S0413" -j 1479456  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_eddy_wrap /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413' returned non-zero exit status 127.                                                                 
2020-08-28 10:49:48,819 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q  -N "bb_post_eddy_sub-002S0413" -j   -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_eddy/bb_post_eddy /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413                                                                                                                                       
2020-08-28 10:49:48,954 - bb_pipeline_diff - INFO - COMMAND OUTPUT:     1479457                                                                                                                                                  
2020-08-28 10:49:48,955 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q   -N "bb_dtifit_sub-002S0413" -j 1479457  -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ ${FSLDIR}/bin/dtifit -k /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI/dMRI/data_1_shell -m /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI/dMRI/nodif_brain_mask_ud -r /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI/dMRI/data_1_shell.bvec -b /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI/dMRI/data_1_shell.bval -o /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI/dMRI/dti                                       
2020-08-28 10:49:49,087 - bb_pipeline_diff - INFO - COMMAND OUTPUT:     1479458                                                                                                                                                  
2020-08-28 10:49:49,088 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_tbss_sub-002S0413" -j 1479458  -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_tbss/bb_tbss_general sub-002S0413                                                                                                                                                                           
2020-08-28 10:49:49,231 - bb_pipeline_diff - INFO - COMMAND OUTPUT:     1479459                                                                                                                                                  
2020-08-28 10:49:49,232 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q   -N "bb_pre_bedpostx_gpu_sub-002S0413" -j 1479458  -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_bedpostx/bb_pre_bedpostx_gpu /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI                                                                                                        
2020-08-28 10:49:49,363 - bb_pipeline_diff - INFO - COMMAND OUTPUT:     1479460                                                                                                                                                  
2020-08-28 10:49:49,364 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_bedpostx_gpu_sub-002S0413" -j 1479460  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_bedpostx/bb_bedpostx_gpu /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI                                                                                                   
2020-08-28 10:49:49,483 - bb_pipeline_diff - ERROR - Exception raised during execution of:      ${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_bedpostx_gpu_sub-002S0413" -j 1479460  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_bedpostx/bb_bedpostx_gpu /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI                                                                           
2020-08-28 10:49:49,484 - bb_pipeline_diff - ERROR - Exception type:    <class 'subprocess.CalledProcessError'>                                                                                                                  
2020-08-28 10:49:49,484 - bb_pipeline_diff - ERROR - Exception args:    (127, '${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_bedpostx_gpu_sub-002S0413" -j 1479460  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_bedpostx/bb_bedpostx_gpu /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI')                                                                                          
2020-08-28 10:49:49,484 - bb_pipeline_diff - ERROR - Exception message:         Command '${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_bedpostx_gpu_sub-002S0413" -j 1479460  -q $FSLGECUDAQ -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_bedpostx/bb_bedpostx_gpu /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI' returned non-zero exit status 127.                                              
2020-08-28 10:49:49,484 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_pre_probtrackx_sub-002S0413" -j  -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_probtrackx2/bb_pre_probtrackx2 /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413                                                                                                                       
2020-08-28 10:49:49,603 - bb_pipeline_diff - INFO - COMMAND OUTPUT:     1479461                                                                                                                                                  
2020-08-28 10:49:49,604 - bb_pipeline_diff - INFO - COMMAND TO RUN:     ${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_probtrackx_sub-002S0413" -j 1479461 -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_probtrackx2/bb_probtrackx2 /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI                                                                                                                   
2020-08-28 10:49:49,692 - bb_pipeline_diff - ERROR - Exception raised during execution of:      ${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_probtrackx_sub-002S0413" -j 1479461 -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_probtrackx2/bb_probtrackx2 /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI                                                                                           
2020-08-28 10:49:49,693 - bb_pipeline_diff - ERROR - Exception type:    <class 'subprocess.CalledProcessError'>                                                                                                                  
2020-08-28 10:49:49,694 - bb_pipeline_diff - ERROR - Exception args:    (255, '${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_probtrackx_sub-002S0413" -j 1479461 -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_probtrackx2/bb_probtrackx2 /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI')                                                                                                          
2020-08-28 10:49:49,694 - bb_pipeline_diff - ERROR - Exception message:         Command '${FSLDIR}/bin/fsl_sub -q bigmem_16.q -N "bb_probtrackx_sub-002S0413" -j 1479461 -l /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/logs/ $BB_BIN_DIR/bb_diffusion_pipeline/bb_probtrackx2/bb_probtrackx2 /liberatrix/mcintosh_lab/nfrazier-logue/sub-002S0413/dMRI' returned non-zero exit status 255.     

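It looks like the first line, the one you mentioned, runs successfully. The two submissions that fail with exit status 127 (bb_eddy and bb_bedpostx_gpu) both contain the variable $FSLGECUDAQ. The bb_probtrackx submission also fails, with exit status 255, but does not use that variable. $FSLGECUDAQ is set in init_vars as: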

export FSLGECUDAQ=cuda.q
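To check which failed submissions involve $FSLGECUDAQ, the "Exception args" lines of the log can be scanned programmatically. This is a rough Python sketch that assumes the log format shown above. `summarize_failures` is a hypothetical helper, and the sample lines are shortened versions of the real log entries (timestamps and paths abbreviated).

```python
import re

# Matches the "Exception args" lines: (exit_status, 'command string')
ARGS_RE = re.compile(r"Exception args:\s+\((\d+), '(.*)'\)")


def summarize_failures(log_text):
    """Return (exit_status, job_name, uses_cuda_queue) for each failed command."""
    failures = []
    for line in log_text.splitlines():
        match = ARGS_RE.search(line)
        if not match:
            continue
        status, command = int(match.group(1)), match.group(2)
        # The job name is whatever follows -N in double quotes.
        name = re.search(r'-N "([^"]+)"', command)
        failures.append(
            (status, name.group(1) if name else None, "$FSLGECUDAQ" in command)
        )
    return failures


# Shortened sample lines (paths abbreviated for illustration).
sample = (
    "... ERROR - Exception args:    (127, '${FSLDIR}/bin/fsl_sub -q bigmem_16.q  "
    "-N \"bb_eddy_sub-002S0413\" -j 1479456  -q $FSLGECUDAQ -l logs/ bb_eddy_wrap')\n"
    "... ERROR - Exception args:    (255, '${FSLDIR}/bin/fsl_sub -q bigmem_16.q "
    "-N \"bb_probtrackx_sub-002S0413\" -j 1479461 -l logs/ bb_probtrackx2')\n"
)

for failure in summarize_failures(sample):
    print(failure)
```

On the shortened sample this reports the bb_eddy failure as using $FSLGECUDAQ and the bb_probtrackx failure as not using it, which matches what is visible in the full log.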

shen4brains commented 4 years ago

This $FSLGECUDAQ problem is also flagged in issue #9. Since we already pass bigmem_16.q or all.q with the -q flag, we don't need $FSLGECUDAQ in the fsl_sub command. I will push all relevant changes in my next commit.
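For illustration, the intended end result is a command string with a single -q option. The sketch below assumes the commands are plain strings like those in the log. `drop_cuda_queue` is a hypothetical helper that shows the effect only; the actual fix would be made in the command templates in the pipeline source, and the sample path is abbreviated.

```python
import re

# Matches "-q $FSLGECUDAQ" or "-q ${FSLGECUDAQ}" together with its leading whitespace.
CUDA_QUEUE_RE = re.compile(r"\s+-q\s+\$\{?FSLGECUDAQ\}?(?=\s|$)")


def drop_cuda_queue(command):
    """Remove the redundant CUDA queue option, leaving the first -q untouched."""
    return CUDA_QUEUE_RE.sub("", command)


# Shortened version of the bb_eddy command from the log.
cmd = (
    '${FSLDIR}/bin/fsl_sub -q bigmem_16.q  -N "bb_eddy_sub-002S0413" '
    "-j 1479456  -q $FSLGECUDAQ -l logs/ bb_eddy_wrap"
)
print(drop_cuda_queue(cmd))
```

A regular expression is used rather than splitting and re-joining the command, so that shell variables such as ${FSLDIR} are left exactly as written.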