MM suggested using [1200, 600, 300, 150, 75] TRs × a subject from [HCP, HBN, HNU, NYU test/retest] to generate the initial estimates from which to derive the estimation formula.
The threading issue I'm currently working on involves making memory estimates data-dependent.
Before 1.8, we were mostly relying on the default per-node memory estimate of 0.2 GB; in 1.8 we raised the default to 2.0 GB and set some higher estimates based on observed memory usage. These are still flat estimates, though, and they don't allocate well for varying data shapes.
For example, a BOLD image with dimensions 90 × 104 × 72 × 300 uses about 12.5 GB for apply_warp_warp_ts_to_T1template (to a 256 × 320 × 320 T1 image). One with 90 × 104 × 72 × 1200 uses about 50 GB for the same node. Our current estimate for that node is 10 GB. An overrun of 2.5 GB isn't great, and one of 40 GB is rough, but with a powerful enough system and soft limits, the run will still complete.
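For what it's worth, those two observations already pin down a rough linear scaling for that node. A back-of-the-envelope fit (not C-PAC code, just illustrating how a per-timepoint multiplier falls out of the observed numbers):

```python
# Back-of-the-envelope fit over the two observations above (not C-PAC code):
# ~12.5 GB at 300 timepoints and ~50 GB at 1200 timepoints for the same node.
import numpy as np

timepoints = np.array([300, 1200])
observed_gb = np.array([12.5, 50.0])

slope, intercept = np.polyfit(timepoints, observed_gb, 1)
print(f"estimate ≈ {intercept:.2f} GB + {slope:.4f} GB × timepoints")
# For this node that works out to roughly 0 GB + ~0.042 GB per timepoint.
```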
The issue is really a problem when multiple nodes that don't allocate enough memory try to run at the same time.
The particular pipeline configuration that raised the issue here has a bunch of forks (and the data have several functional images per subject, like
```
├── anat
│   └── sub-105923_ses-retest_T1w.nii.gz
└── func
    ├── sub-105923_ses-retest_task-restLRbeltrvt_run-1_bold.json
    ├── sub-105923_ses-retest_task-restLRbeltrvt_run-1_bold.nii.gz
    ├── sub-105923_ses-retest_task-restLRbeltrvt_run-2_bold.json
    ├── sub-105923_ses-retest_task-restLRbeltrvt_run-2_bold.nii.gz
    ├── sub-105923_ses-retest_task-restLRpredrvt_run-1_bold.json
    ├── sub-105923_ses-retest_task-restLRpredrvt_run-1_bold.nii.gz
    ├── sub-105923_ses-retest_task-restLRpredrvt_run-2_bold.json
    ├── sub-105923_ses-retest_task-restLRpredrvt_run-2_bold.nii.gz
    ├── sub-105923_ses-retest_task-restLR_run-1_bold.json
    ├── sub-105923_ses-retest_task-restLR_run-1_bold.nii.gz
    ├── sub-105923_ses-retest_task-restLR_run-2_bold.json
    └── sub-105923_ses-retest_task-restLR_run-2_bold.nii.gz
```
).
I ran the subject above single-threaded with a single functional image 3 times (in full (1200 timepoints), truncated to 600 timepoints, and truncated to 300 timepoints) to get the `callback.log`s and make the hungriest nodes adjust estimates based on the number of timepoints, to get this particular set of runs going. As a follow-up I plan to
- run the same configuration with images at other spatial resolutions
- run other configurations
to better tune the memory estimation formulas.
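As a rough sketch of how those logs could feed the formulas (illustrative only, not C-PAC code): this assumes nipype's callback log format, where each line is a JSON record that includes 'id' and 'runtime_memory_gb' when resource monitoring is enabled; the per-truncation file names and function names here are made up.

```python
# Hypothetical sketch: collect observed peak memory per node from nipype
# callback.log files and fit a linear model in the number of timepoints.
import json
import numpy as np


def peak_memory_by_node(log_path):
    """Map node id -> peak observed memory (GB) from one callback.log."""
    peaks = {}
    with open(log_path) as log:
        for line in log:
            record = json.loads(line)
            mem = record.get('runtime_memory_gb')
            if mem is not None:
                node = record['id']
                peaks[node] = max(peaks.get(node, 0.0), mem)
    return peaks


def fit_node_formula(node_id, logs_by_timepoints):
    """Fit observed_gb ≈ intercept + slope × timepoints for one node.

    logs_by_timepoints: e.g. {1200: 'callback_1200.log',
                              600: 'callback_600.log',
                              300: 'callback_300.log'} (made-up names).
    """
    timepoints, observed = [], []
    for n_tp, path in logs_by_timepoints.items():
        peak = peak_memory_by_node(path).get(node_id)
        if peak is not None:
            timepoints.append(n_tp)
            observed.append(peak)
    slope, intercept = np.polyfit(timepoints, observed, 1)
    return intercept, slope
```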
In the back of my mind, I have a lingering concern that there's something screwy with the way nipype is allocating/monitoring threads beyond how the log is reporting the number of threads. But I know the memory overrun issue is real, so I'm starting there.
― me, in an email
Per @anibalsolon's suggestion, I'm trying to override `Node`'s `mem_gb` `@property` to dynamically consider input files: https://github.com/FCP-INDI/C-PAC/commit/0626a669c2afb9a116e4777860e22d794c1610cd
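The gist of that override, as a minimal sketch rather than the code in the linked commit (the class name `DataSizedNode` and the `mem_x` keyword are illustrative assumptions):

```python
# Minimal sketch of the idea (names are illustrative, not the actual commit):
# a Node subclass whose mem_gb property grows with the time dimension of one
# of its inputs.
import nibabel as nib
from nipype.pipeline import engine as pe


class DataSizedNode(pe.Node):
    def __init__(self, *args, mem_x=None, **kwargs):
        """mem_x: optional (gb_per_timepoint, getter) tuple, where getter is
        a callable like ``lambda **kwargs: kwargs['in_file']`` that picks the
        input whose length drives the estimate."""
        super().__init__(*args, **kwargs)
        self._mem_x = mem_x

    @property
    def mem_gb(self):
        base = super().mem_gb  # the flat estimate passed at construction
        if self._mem_x:
            multiplier, getter = self._mem_x
            in_file = getter(**self.inputs.get())
            if isinstance(in_file, str):
                shape = nib.load(in_file).shape
                timepoints = shape[3] if len(shape) > 3 else 1
                return base + multiplier * timepoints
        return base
```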
For the first iteration, I'm only considering the time dimension as variable. Once that works, I'll refactor to include the x, y, and z dimensions and see if I can figure out a way to simplify the parameterization. Currently, the parameterization (see the commit above) works out to a memory estimate of 0.4 + 0.0033 × {number of timepoints in 'in_file'}.
It would of course be better to just be able to specify `'in_file'` instead of the whole `lambda **kwargs: kwargs['in_file']` every time. For now, I just plugged in a couple of these and kicked off a couple of runs to test the proof of concept.
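A hypothetical convenience for that shorthand, assuming a `mem_x`-style tuple like the one sketched above (this helper is not in the linked commit):

```python
# Hypothetical convenience: let mem_x accept the bare input name and expand
# it to the keyword-argument lambda internally.
def as_input_getter(spec):
    if callable(spec):
        return spec
    return lambda **kwargs: kwargs[spec]


# These two parameterizations would then be equivalent:
mem_x_short = (0.0033, as_input_getter('in_file'))
mem_x_long = (0.0033, as_input_getter(lambda **kwargs: kwargs['in_file']))
```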
Related problem
Originally posted by @shnizzedy in https://github.com/FCP-INDI/C-PAC/issues/1479#issuecomment-812185689
Related #1166, #1301, #1404, #1453
Proposed feature
Starting with the memory-hungriest nodes (those in the table above), replace the flat memory estimates with data-dependent estimates that scale with the dimensions of each node's input data.
Additional context
Some of these estimates could potentially get complicated as the pipeline progresses through transforms and resamplings.
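For instance, warping a 90 × 104 × 72 BOLD volume onto the 256 × 320 × 320 T1 grid mentioned above grows the spatial voxel count from roughly 0.67 million to roughly 26 million per timepoint (about a 39× increase), so a formula keyed only to the input file's own dimensions could still underestimate nodes whose outputs live on a larger grid.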