FCP-INDI / C-PAC

Configurable Pipeline for the Analysis of Connectomes
https://fcp-indi.github.io/
GNU Lesser General Public License v3.0

🐛 ANTs doesn't respect memory constraints #1404

Open shnizzedy opened 3 years ago

shnizzedy commented 3 years ago

Describe the bug

| memory limit specified | memory used by ANTs |
|------------------------|---------------------|
| 16 GB                  | 23 GB               |
| 16 GB                  | > 18 GB             |
| 40 GB                  | 45 GB               |

This is particularly a problem on clusters that have hard memory limits.

Here's an example command.txt that uses too much memory:

antsRegistration --collapse-output-transforms 1 --dimensionality 3 --initial-moving-transform [/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,0] --transform Rigid[0.05] --metric MI[/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,1,32,Regular,0.25] --convergence [100x100,1e-06,20] --smoothing-sigmas 2.0x1.0vox --shrink-factors 2x1 --use-histogram-matching 1 --transform Affine[0.08] --metric MI[/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,1,32,Regular,0.25] --convergence [100x100,1e-06,20] --smoothing-sigmas 1.0x0.0vox --shrink-factors 2x1 --use-histogram-matching 1 --transform SyN[0.1,3.0,0.0] --metric CC[/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,1,4] --convergence [100x70x50x20,1e-06,10] --smoothing-sigmas 3.0x2.0x1.0x0.0vox --shrink-factors 8x4x2x1 --use-histogram-matching 1 --winsorize-image-intensities [0.005,0.995] --interpolation LanczosWindowedSinc --output [transform,transform_Warped.nii.gz]

(same thing, wrapped for ease of reading):

antsRegistration \
  --collapse-output-transforms 1 \
  --dimensionality 3 \
  --initial-moving-transform [/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,0] \
  --transform Rigid[0.05] \
  --metric MI[/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,1,32,Regular,0.25] \
  --convergence [100x100,1e-06,20] \
  --smoothing-sigmas 2.0x1.0vox \
  --shrink-factors 2x1 \
  --use-histogram-matching 1 \
  --transform Affine[0.08] \
  --metric MI[/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,1,32,Regular,0.25] \
  --convergence [100x100,1e-06,20] \
  --smoothing-sigmas 1.0x0.0vox \
  --shrink-factors 2x1 \
  --use-histogram-matching 1 \
  --transform SyN[0.1,3.0,0.0] \
  --metric CC[/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/resampled_template_brain_for_anat/template_brain_for_anat/tpl-MNI152NLin2009cAsym_res-01_desc-brain_T1w_resample.nii.gz,/cbica/projects/RBC/CPACTesting/Pipeline_Timing/Running_problem_testing/Output/CPAC_1.7.1_orig/working/resting_preproc_sub-5347650_ses-1/anat_preproc_niworkflows_ants_0/anat_preproc_niworkflows_ants_0_skullstrip/anat_skullstrip_ants/copy_xform/sub-5347650_acq-VNavNorm_T1w_resample_corrected_masked_xform.nii.gz,1,4] \
  --convergence [100x70x50x20,1e-06,10] \
  --smoothing-sigmas 3.0x2.0x1.0x0.0vox \
  --shrink-factors 8x4x2x1 \
  --use-histogram-matching 1 \
  --winsorize-image-intensities [0.005,0.995] \
  --interpolation LanczosWindowedSinc \
  --output [transform,transform_Warped.nii.gz]

To Reproduce

Steps to reproduce the behavior:

  1. Select a pipeline configuration that uses
    skullstrip_option: [niworkflows-ants]

    and/or

    regOption: [ANTS]

    (e.g., --preconfig fmriprep-options)

  2. Run on Singularity with that pipeline config and some n_cpus and mem_gb
  3. See that mem_gb is exceeded by ANTs

Expected behavior

ANTs uses no more than mem_gb memory at a time.

Versions

Additional context

These memory problems are almost certainly a result of this issue:

Possibly related: https://github.com/FCP-INDI/C-PAC/issues/1054, https://github.com/nipy/nipype/issues/2776


If registration quits suddenly with no error message, memory is often the culprit. Some systems have hard limits and jobs that exceed the limits are killed before being able to throw an exception. If you run with verbose output -v 1, you can see where the error happens. It is often between stages of registration as the images get larger and thus require more memory.

On shared computing platforms, it's often possible to allocate more RAM. Alternatively, configure ANTs to use less memory by using single-precision floats for computation, with --float 1.

(from "My registration fails with an error: Memory errors", ANTs wiki)
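
C-PAC builds this command through Nipype's ANTs interface, so those two flags would map onto interface inputs rather than hand-edited command lines. A minimal sketch (the trait names `float` and `verbose` are assumed to match the installed Nipype version; all stage parameters are elided):

```python
from nipype.interfaces import ants

# Sketch only: the transforms/metrics/iterations the pipeline configures are
# omitted; just the memory-relevant inputs are shown.
reg = ants.Registration()
reg.inputs.float = True    # emits --float 1: single-precision computation,
                           # roughly halving per-image memory
reg.inputs.verbose = True  # emits -v 1: verbose output helps locate the stage
                           # that was running when a job was killed
```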


There seem to be no specific memory-limiting options in the antsRegistration command apart from --float:

$ antsRegistration --help

COMMAND:
     antsRegistration
          This program is a user-level registration application meant to utilize classes
          in ITK v4.0 and later. The user can specify any number of "stages" where a stage
          consists of a transform; an image metric; and iterations, shrink factors, and
          smoothing sigmas for each level. Note that explicitly setting the
          dimensionality, metric, transform, output, convergence, shrink-factors, and
          smoothing-sigmas parameters is mandatory.

OPTIONS:
     --version
          Get Version Information.

     -d, --dimensionality 2/3/4
          This option forces the image to be treated as a specified-dimensional image. If
          not specified, we try to infer the dimensionality from the input image.

     -o, --output outputTransformPrefix
                  [outputTransformPrefix,<outputWarpedImage>,<outputInverseWarpedImage>]
          Specify the output transform prefix (output format is .nii.gz ). Optionally, one
          can choose to warp the moving image to the fixed space and, if the inverse
          transform exists, one can also output the warped fixed image. Note that only the
          images specified in the first metric call are warped. Use antsApplyTransforms to
          warp other images using the resultant transform(s).

     -j, --save-state saveSateAsTransform
          Specify the output file for the current state of the registration. The state
          file is written to an hdf5 composite file. It is specially usefull if we want to
          save the current state of a SyN registration to the disk, so we can load and
          restore that later to continue the next registration process directly started
          from the last saved state. The output file of this flag is the same as the
          write-composite-transform, unless the last transform is a SyN transform. In that
          case, the inverse displacement field of the SyN transform is also added to the
          output composite transform. Again notice that this file cannot be treated as a
          transform, and restore-state option must be used to load the written file by
          this flag.

     -k, --restore-state restoreStateAsATransform
          Specify the initial state of the registration which get immediately used to
          directly initialize the registration process. The flag is mutually exclusive
          with other intialization flags.If this flag is used, none of the
          initial-moving-transform and initial-fixed-transform cannot be used.

     -a, --write-composite-transform 1/(0)
          Boolean specifying whether or not the composite transform (and its inverse, if
          it exists) should be written to an hdf5 composite file. This is false by default
          so that only the transform for each stage is written to file.
          <VALUES>: 0

     -p, --print-similarity-measure-interval <unsignedIntegerValue>
          Prints out the CC similarity metric measure between the full-size input fixed
          and the transformed moving images at each iteration a value of 0 (the default)
          indicates that the full scale computation should not take placeany value greater
          than 0 represents the interval of full scale metric computation.
          <VALUES>: 0

     --write-interval-volumes <unsignedIntegerValue>
          Writes out the output volume at each iteration. It helps to present the
          registration process as a short movie a value of 0 (the default) indicates that
          this option should not take placeany value greater than 0 represents the
          interval between the iterations which outputs are written to the disk.
          <VALUES>: 0

     -z, --collapse-output-transforms (1)/0
          Collapse output transforms. Specifically, enabling this option combines all
          adjacent transforms wherepossible. All adjacent linear transforms are written to
          disk in the forman itk affine transform (called xxxGenericAffine.mat).
          Similarly, all adjacent displacement field transforms are combined when written
          to disk (e.g. xxxWarp.nii.gz and xxxInverseWarp.nii.gz (if available)).Also, an
          output composite transform including the collapsed transforms is written to the
          disk (called outputCollapsed(Inverse)Composite).
          <VALUES>: 1

     -i, --initialize-transforms-per-stage (1)/0
          Initialize linear transforms from the previous stage. By enabling this option,
          the current linear stage transform is directly intialized from the previous
          stage's linear transform; this allows multiple linear stages to be run where
          each stage directly updates the estimated linear transform from the previous
          stage. (e.g. Translation -> Rigid -> Affine).
          <VALUES>: 0

     -n, --interpolation Linear
                         NearestNeighbor
                         MultiLabel[<sigma=imageSpacing>,<alpha=4.0>]
                         Gaussian[<sigma=imageSpacing>,<alpha=1.0>]
                         BSpline[<order=3>]
                         CosineWindowedSinc
                         WelchWindowedSinc
                         HammingWindowedSinc
                         LanczosWindowedSinc
                         GenericLabel[<interpolator=Linear>]
          Several interpolation options are available in ITK. These have all been made
          available. Currently the interpolator choice is only used to warp (and possibly
          inverse warp) the final output image(s).

     -g, --restrict-deformation PxQxR
          This option allows the user to restrict the optimization of the displacement
          field, translation, rigid or affine transform on a per-component basis. For
          example, if one wants to limit the deformation or rotation of 3-D volume to the
          first two dimensions, this is possible by specifying a weight vector of '1x1x0'
          for a deformation field or '1x1x0x1x1x0' for a rigid transformation.
          Low-dimensional restriction only works if there are no preceding
          transformations.All stages up to and including the desired stage must have this
          option specified,even if they should not be restricted (in which case specify
          1x1x1...)

     -q, --initial-fixed-transform initialTransform
                                   [initialTransform,<useInverse>]
                                   [fixedImage,movingImage,initializationFeature]
          Specify the initial fixed transform(s) which get immediately incorporated into
          the composite transform. The order of the transforms is stack-esque in that the
          last transform specified on the command line is the first to be applied. In
          addition to initialization with ITK transforms, the user can perform an initial
          translation alignment by specifying the fixed and moving images and selecting an
          initialization feature. These features include using the geometric center of the
          images (=0), the image intensities (=1), or the origin of the images (=2).

     -r, --initial-moving-transform initialTransform
                                    [initialTransform,<useInverse>]
                                    [fixedImage,movingImage,initializationFeature]
          Specify the initial moving transform(s) which get immediately incorporated into
          the composite transform. The order of the transforms is stack-esque in that the
          last transform specified on the command line is the first to be applied. In
          addition to initialization with ITK transforms, the user can perform an initial
          translation alignment by specifying the fixed and moving images and selecting an
          initialization feature. These features include using the geometric center of the
          images (=0), the image intensities (=1), or the origin of the images (=2).

     -m, --metric CC[fixedImage,movingImage,metricWeight,radius,<samplingStrategy={None,Regular,Random}>,<samplingPercentage=[0,1]>]
                  MI[fixedImage,movingImage,metricWeight,numberOfBins,<samplingStrategy={None,Regular,Random}>,<samplingPercentage=[0,1]>]
                  Mattes[fixedImage,movingImage,metricWeight,numberOfBins,<samplingStrategy={None,Regular,Random}>,<samplingPercentage=[0,1]>]
                  MeanSquares[fixedImage,movingImage,metricWeight,radius=NA,<samplingStrategy={None,Regular,Random}>,<samplingPercentage=[0,1]>]
                  Demons[fixedImage,movingImage,metricWeight,radius=NA,<samplingStrategy={None,Regular,Random}>,<samplingPercentage=[0,1]>]
                  GC[fixedImage,movingImage,metricWeight,radius=NA,<samplingStrategy={None,Regular,Random}>,<samplingPercentage=[0,1]>]
                  ICP[fixedPointSet,movingPointSet,metricWeight,<samplingPercentage=[0,1]>,<boundaryPointsOnly=0>]
                  PSE[fixedPointSet,movingPointSet,metricWeight,<samplingPercentage=[0,1]>,<boundaryPointsOnly=0>,<pointSetSigma=1>,<kNeighborhood=50>]
                  JHCT[fixedPointSet,movingPointSet,metricWeight,<samplingPercentage=[0,1]>,<boundaryPointsOnly=0>,<pointSetSigma=1>,<kNeighborhood=50>,<alpha=1.1>,<useAnisotropicCovariances=1>]
                  IGDM[fixedImage,movingImage,metricWeight,fixedMask,movingMask,<neighborhoodRadius=0x0>,<intensitySigma=0>,<distanceSigma=0>,<kNeighborhood=1>,<gradientSigma=1>]
          These image metrics are available--- CC: ANTS neighborhood cross correlation,
          MI: Mutual information, Demons: (Thirion), MeanSquares, and GC: Global
          Correlation. The "metricWeight" variable is used to modulate the per stage
          weighting of the metrics. The metrics can also employ a sampling strategy
          defined by a sampling percentage. The sampling strategy defaults to 'None' (aka
          a dense sampling of one sample per voxel), otherwise it defines a point set over
          which to optimize the metric. The point set can be on a regular lattice or a
          random lattice of points slightly perturbed to minimize aliasing artifacts.
          samplingPercentage defines the fraction of points to select from the domain. In
          addition, three point set metrics are available: Euclidean (ICP), Point-set
          expectation (PSE), and Jensen-Havrda-Charvet-Tsallis (JHCT).

     -t, --transform Rigid[gradientStep]
                     Affine[gradientStep]
                     CompositeAffine[gradientStep]
                     Similarity[gradientStep]
                     Translation[gradientStep]
                     BSpline[gradientStep,meshSizeAtBaseLevel]
                     GaussianDisplacementField[gradientStep,updateFieldVarianceInVoxelSpace,totalFieldVarianceInVoxelSpace]
                     BSplineDisplacementField[gradientStep,updateFieldMeshSizeAtBaseLevel,<totalFieldMeshSizeAtBaseLevel=0>,<splineOrder=3>]
                     TimeVaryingVelocityField[gradientStep,numberOfTimeIndices,updateFieldVarianceInVoxelSpace,updateFieldTimeVariance,totalFieldVarianceInVoxelSpace,totalFieldTimeVariance]
                     TimeVaryingBSplineVelocityField[gradientStep,velocityFieldMeshSize,<numberOfTimePointSamples=4>,<splineOrder=3>]
                     SyN[gradientStep,<updateFieldVarianceInVoxelSpace=3>,<totalFieldVarianceInVoxelSpace=0>]
                     BSplineSyN[gradientStep,updateFieldMeshSizeAtBaseLevel,<totalFieldMeshSizeAtBaseLevel=0>,<splineOrder=3>]
                     Exponential[gradientStep,updateFieldVarianceInVoxelSpace,velocityFieldVarianceInVoxelSpace,<numberOfIntegrationSteps>]
                     BSplineExponential[gradientStep,updateFieldMeshSizeAtBaseLevel,<velocityFieldMeshSizeAtBaseLevel=0>,<numberOfIntegrationSteps>,<splineOrder=3>]
          Several transform options are available. The gradientStep or learningRate
          characterizes the gradient descent optimization and is scaled appropriately for
          each transform using the shift scales estimator. Subsequent parameters are
          transform-specific and can be determined from the usage. For the B-spline
          transforms one can also specify the smoothing in terms of spline distance (i.e.
          knot spacing).

     -c, --convergence MxNxO
                       [MxNxO,<convergenceThreshold=1e-6>,<convergenceWindowSize=10>]
          Convergence is determined from the number of iterations per level and is
          determined by fitting a line to the normalized energy profile of the last N
          iterations (where N is specified by the window size) and determining the slope
          which is then compared with the convergence threshold.

     -s, --smoothing-sigmas MxNxO...
          Specify the sigma of gaussian smoothing at each level. Units are given in terms
          of voxels ('vox') or physical spacing ('mm'). Example usage is '4x2x1mm' and
          '4x2x1vox' where no units implies voxel spacing.

     -f, --shrink-factors MxNxO...
          Specify the shrink factor for the virtual domain (typically the fixed image) at
          each level.

     -u, --use-histogram-matching
          Histogram match the images before registration.

     -l, --use-estimate-learning-rate-once
          turn on the option that lets you estimate the learning rate step size only at
          the beginning of each level. * useful as a second stage of fine-scale
          registration.

     -w, --winsorize-image-intensities [lowerQuantile,upperQuantile]
          Winsorize data based on specified quantiles.

     -x, --masks [fixedImageMask,movingImageMask]
          Image masks to limit voxels considered by the metric. Two options are allowed
          for mask specification: 1) Either the user specifies a single mask to be used
          for all stages or 2) the user specifies a mask for each stage. With the latter
          one can select to which stages masks are applied by supplying valid file names.
          If the file does not exist, a mask will not be used for that stage. Note that we
          handle the fixed and moving masks separately to enforce this constraint.

     --float
          Use 'float' instead of 'double' for computations.
          <VALUES>: 0

     --minc
          Use MINC file formats for transformations.
          <VALUES>: 0

     --random-seed seedValue
          Use a fixed seed for random number generation. By default, the system clock is
          used to initialize the seeding. The fixed seed can be any nonzero int value.

     -v, --verbose (0)/1
          Verbose output.

     -h
          Print the help menu (short version).

     --help
          Print the help menu. Will also print values used on the current command line
          call.
          <VALUES>: 1

antsJointLabelFusion.sh relies on sbatch to limit memory (with rev and cut to parse the job ID from sbatch's output):

id=`sbatch --job-name=antsJlfReg${i} --export=ANTSPATH=$ANTSPATH $QSUB_OPTS --nodes=1 --cpus-per-task=1 --time=${REGISTRATION_WALLTIME} --mem=${REGISTRATION_MEMORY} $qscript | rev | cut -f1 -d\ | rev`
jobIDs="$jobIDs $id"
sleep 0.5

Other leads:

- maybe use LegacyMultiProc (see the sketch below)
- https://github.com/nipreps/fmriprep/issues/836
- https://github.com/nipreps/fmriprep/pull/839
- https://github.com/nipreps/fmriprep/pull/854
- https://github.com/nipy/nipype/pull/2284 (maxtasksperchild 1? https://nipype.readthedocs.io/en/latest/api/generated/nipype.pipeline.plugins.legacymultiproc.html)
- https://github.com/nipy/nipype/issues/2548
- https://github.com/nipy/nipype/pull/2773
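
For reference, a sketch of what the LegacyMultiProc lead might look like at the workflow-run level; the plugin argument names are taken from the docs linked above and should be verified against the installed Nipype version:

```python
from nipype.pipeline import engine as pe

def run_with_legacy_multiproc(workflow: pe.Workflow) -> None:
    """Run an already-built workflow under the LegacyMultiProc plugin."""
    workflow.run(
        plugin="LegacyMultiProc",
        plugin_args={
            "n_procs": 4,           # cap on concurrent worker processes
            "memory_gb": 16,        # scheduler-level memory budget
            "maxtasksperchild": 1,  # fresh worker per task, so memory is released promptly
        },
    )
```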

shnizzedy commented 3 years ago

I don't know enough to know how much memory is reasonable, but I hear

The run only has one HBN subject (1 anat, 1 func), the anat template is 1 mm, the anat image is 1 mm iso. The func has 750 volumes. But it still makes no sense that it can use 45 GB of memory!
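
For a rough sense of scale, a back-of-envelope sketch; the grid dimensions below are an assumption for a 1 mm MNI2009c-sized volume, not measured from this run:

```python
# Approximate footprint of one whole-brain volume at 1 mm isotropic.
voxels = 193 * 229 * 193          # ~8.5 million voxels (assumed 1 mm MNI2009c grid)
bytes_per_voxel = 8               # double precision (the default without --float)
image_gib = voxels * bytes_per_voxel / 1024**3
print(f"one volume ~ {image_gib:.3f} GiB")        # ~0.06 GiB
# Even if a SyN stage keeps ~20 whole-image buffers (images, pyramids,
# 3-component displacement fields), that is only a couple of GiB:
print(f"20 buffers ~ {20 * image_gib:.2f} GiB")   # ~1.3 GiB, nowhere near 45 GB
```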

shnizzedy commented 3 years ago

There are some clear memory spikes for ANTs when it runs on Brainlife (this is one subject with fmriprep-options):

runtime plot

shnizzedy commented 3 years ago

I'm not sure I quite understand resource_monitor.json (particularly the time and cpus fields), but if the indices of each key correspond to one another, here are the maximum entries for rss_GiB, vms_GiB, and cpus for the run above on Brainlife:

[{
    "name": "resting_preproc_sub-A00013809_ses-DS2.nuisance_regressor_0_0.aCompCor_cosine_filter",
    "time": 1605723308.305644,
    "rss_GiB": 36.45994949316406,
    "cpus": 103.0,
    "vms_GiB": 40.09559631347656,
    "interface": "Function",
    "params": "_scan_rest_acq-645__selector_WM-2mm-M_CSF-2mm-M_tC-5PCT2-PC5_aC-CSF+WM-2mm-PC5_G-M_M-SDB_P-2_BP-B0.01-T0.1",
    "mapnode": 0
}, {
    "name": "resting_preproc_sub-A00013809_ses-DS2.nuisance_regressor_0_0.aCompCor_cosine_filter",
    "time": 1605723285.300852,
    "rss_GiB": 28.016590118164064,
    "cpus": 832.8,
    "vms_GiB": 40.09559631347656,
    "interface": "Function",
    "params": "_scan_rest_acq-645__selector_WM-2mm-M_CSF-2mm-M_tC-5PCT2-PC5_aC-CSF+WM-2mm-PC5_G-M_M-SDB_P-2_BP-B0.01-T0.1",
    "mapnode": 0
}, {
    "name": "resting_preproc_sub-A00013809_ses-DS2.nuisance_regressor_0_0.aCompCor_cosine_filter",
    "time": 1605723284.15847,
    "rss_GiB": 27.693824767578125,
    "cpus": 2742.9,
    "vms_GiB": 31.022430419921875,
    "interface": "Function",
    "params": "_scan_rest_acq-645__selector_WM-2mm-M_CSF-2mm-M_tC-5PCT2-PC5_aC-CSF+WM-2mm-PC5_G-M_M-SDB_P-2_BP-B0.01-T0.1",
    "mapnode": 0
}]
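
For reference, a minimal sketch of pulling those maxima out of resource_monitor.json, under the same assumption that the file holds parallel lists keyed by field name whose indices line up (the real structure may differ):

```python
import json

with open("resource_monitor.json") as f:
    raw = json.load(f)

# Rebuild per-sample records from the parallel lists, then report the
# largest sample for each field of interest.
keys = list(raw)
records = [dict(zip(keys, row)) for row in zip(*(raw[k] for k in keys))]

for field in ("rss_GiB", "vms_GiB", "cpus"):
    top = max(records, key=lambda rec: rec[field])
    print(field, top["name"], top[field])
```
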
shnizzedy commented 3 years ago

Another clue: interacting with the graphs on Brainlife, I see it's actually run.py that's eating all the memory (that is, it's C-PAC and not a child process)!

with `run.py` ![with `run.py`](https://user-images.githubusercontent.com/5974438/101075226-b9d02500-356f-11eb-8be4-222cfc836bcf.png)
without `run.py` ![without `run.py`](https://user-images.githubusercontent.com/5974438/101075250-c48aba00-356f-11eb-8d52-47153daa88dd.png)

hahaai commented 3 years ago

Those brief spikes are huge!!! Do you also mean that it's probably run.py, not ANTs?

shnizzedy commented 3 years ago

Not sure. I think it's probably how C-PAC is allocating memory for ANTs

shnizzedy commented 3 years ago

After @ccraddock explained to me that

I started to set mem_gb where the observed runtime_memory_gb was more than double the estimate (in every case the estimate had been left at the default of 0.2 GB) and saw a marked improvement.

virtual memory usage plot

There are still some spikes, but they're much less dramatic. I'm iterating now: after each fresh run with the new estimates in place, I set more estimates based on the logged memory usage.
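
For context, mem_gb is Nipype's per-node memory estimate, which the MultiProc scheduler weighs against the overall memory budget when deciding how many nodes may run at once. A minimal sketch of setting it where a node is built (the interface and the 6 GB figure are placeholders, not measured values):

```python
from nipype.pipeline import engine as pe
from nipype.interfaces import ants

# Give the registration node a realistic estimate instead of the 0.2 GB default,
# so the scheduler reserves enough of the memory budget before launching it.
anat_reg = pe.Node(ants.Registration(), name="anat_mni_ants_register", mem_gb=6.0)
```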

ccraddock commented 3 years ago

This is looking good. I think that there is an issue with the number of threads being estimated by the callback, or the Gantt chart creation script is pulling in the wrong numbers. Some of the nodes are reporting using 210 threads!

As for your earlier comment on run.py, I think that since this is the parent process, it 'owns' all of the memory used by the child threads. So the amount of memory attributed to it should be the cumulative amount of memory used by all of the nodes that are currently executing.
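
For what it's worth, that cumulative view is what you get by walking the process tree; a small sketch with psutil (which, as far as I know, is also what the Nipype resource monitor builds on):

```python
import psutil

def tree_rss_gib(pid: int) -> float:
    """Resident memory of a process plus all of its descendants, in GiB.

    Roughly the cumulative figure described above; shared pages are counted
    once per process, so this somewhat overestimates.
    """
    parent = psutil.Process(pid)
    procs = [parent] + parent.children(recursive=True)
    return sum(p.memory_info().rss for p in procs) / 1024**3
```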

shnizzedy commented 3 years ago

I thought maybe runtime_threads was counting something different than I expected.

I see the profile uses cpu_percent for runtime_threads, which returns a percentage of a single CPU, so I think something like math.ceil(cpu_percent / 100) would be an estimate of the number of threads, but there's some disconnected code that looks like it collects the actual number of threads used (as opposed to a percentage of one CPU).
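
In other words, something along these lines (a sketch of the estimate, not the actual callback code):

```python
import math

def estimated_threads(cpu_percent: float) -> int:
    """Rough thread count from a per-process CPU percentage.

    psutil reports CPU usage relative to a single core, so a process keeping
    ~27 cores busy shows up as ~2700%; dividing by 100 and rounding up gives
    an estimate of the number of concurrently busy threads.
    """
    return max(1, math.ceil(cpu_percent / 100))

print(estimated_threads(2742.9))  # the sample above -> 28
```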

I'll try updating the callback to get the actual number of threads, run a few C-PAC runs and see how it looks.

shnizzedy commented 3 years ago

After a little tinkering, I think estimating the number of threads (by dividing cpu_percent by 100 and rounding up) is good enough for what I'm trying to do.

callback.log.html screenshot

I'm putting a PR in to Nipype to restore the Gantt chart creation capabilities, and I'll note the thread-logging ambiguity there.

sgiavasis commented 3 years ago

Awesome @shnizzedy !!

shnizzedy commented 3 years ago

This looks like it's working for memory.

memory plot

I'm going through now and adding n_procs similarly to control the number of threads

threads plot

and adding to the developer docs as I go
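
A sketch of what that looks like alongside the memory estimate, so a node counts against both the memory and CPU budgets (values are placeholders, not measured estimates):

```python
from nipype.pipeline import engine as pe
from nipype.interfaces import ants

anat_reg = pe.Node(ants.Registration(), name="anat_mni_ants_register",
                   mem_gb=6.0, n_procs=4)
```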