FCP-INDI / C-PAC

Configurable Pipeline for the Analysis of Connectomes
https://fcp-indi.github.io/
GNU Lesser General Public License v3.0

✨ Create a feedback-loop for resource estimates #1688

Closed gkiar closed 2 years ago

gkiar commented 2 years ago

Related problem

Related to #1684 by @gkiar

Proposed feature

Since the pipeline is automagically recording memory usage, we have the ability to estimate the usage of pipelines quite precisely once they have been run a single time on similar data.

The proposal is to support the addition of a command-line flag that takes in a callback.log file generated by a previous run of the identical config (and data organization), and uses that to improve resource estimates. There could be a second argument that contains a scaling factor, but naively I would suggest inflating experienced memory usage by 10–15%.

In practice, I would imagine the following workflow:

  1. I take 1 subject of my dataset that is similar to all the others with respect to both the files present and their contents (e.g., representative number of runs, scan durations)
  2. I process that subject using a config of my choosing and providing generous resource totals to ensure that the job can be completed using built-in estimates of consumption.
  3. (optional) I parse the log file to get a sense of the true resource consumption
  4. With future runs, I provide a --runtime_usage argument with a path to my callback.log, which is used to scale each nipype node's RAM limits and improve runtime efficiency.
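To make steps 3 and 4 concrete, here is a minimal sketch of how a callback.log could be turned into inflated per-node memory estimates. It assumes each log line is a JSON object with an `id` field and a numeric `runtime_memory_gb` field; that layout is an assumption about the log, and the function names `observed_peaks` and `scaled_estimates` are hypothetical, not part of C-PAC or nipype.

```python
import json


def observed_peaks(log_lines):
    """Return {node id: peak observed memory in GB} from callback-log lines.

    Assumes one JSON object per line with "id" and "runtime_memory_gb" keys
    (an assumption about the log layout, not a documented guarantee).
    """
    peaks = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict):
            continue
        node_id = record.get("id")
        mem = record.get("runtime_memory_gb")
        # Skip records without a usable numeric measurement (e.g. "N/A").
        if node_id is None or isinstance(mem, bool) or not isinstance(mem, (int, float)):
            continue
        peaks[node_id] = max(mem, peaks.get(node_id, 0.0))
    return peaks


def scaled_estimates(peaks, inflation=0.15):
    """Inflate each observed peak by a safety margin (10-15% as proposed above)."""
    return {node_id: mem * (1.0 + inflation) for node_id, mem in peaks.items()}
```

Usage would look something like `scaled_estimates(observed_peaks(open("callback.log")))`, with the resulting values applied as each node's memory estimate before the next run. Keeping the maximum per node is a conservative choice for nodes that appear more than once in the log.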

Acceptance criteria

Alternatives

Once this infrastructure exists, it will be great to think about how we can integrate these with e-telemetry/tracking, so that we can get detailed and diverse logs of resource usage that can inform better in-place estimates of resource consumption over time.

Additional context

No response

gkiar commented 2 years ago

Here are some example callback.log files and the associated configs that can be used for testing.

callback_minimal.log was generated by:

%YAML 1.1
---
# CPAC Pipeline Configuration YAML file
# Version 1.8.3
#
# http://fcp-indi.github.io for more info.
#
# Pipeline config "Minimal", version GUI-0
# Tue Feb 22 2022 11:52:08 GMT-0500 (Eastern Standard Time)
#
# Tip: This file can be edited manually with a text editor for quick modifications.

FROM: preproc

pipeline_setup:
  pipeline_name: minimal
  system_config:
    raise_insufficient: Off
    random_seed: 77742777

callback_minimal_plus_connectomes.log was generated by:

%YAML 1.1
---
# CPAC Pipeline Configuration YAML file
# Version 1.8.3
#
# http://fcp-indi.github.io for more info.
#
# Pipeline config "Minimal", version GUI-0
# Tue Feb 22 2022 11:52:08 GMT-0500 (Eastern Standard Time)
#
# Tip: This file can be edited manually with a text editor for quick modifications.

FROM: preproc

pipeline_setup:
  pipeline_name: minimal-preproc-plus-networks
  system_config:
    raise_insufficient: Off
    random_seed: 77742777

timeseries_extraction:
  run: true
  tse_roi_paths:
    /ndmg_atlases/label/Human/Yeo-17_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/DKT_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Yeo-7_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Glasser_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Desikan_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Brodmann_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Schaefer400_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Schaefer300_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Schaefer1000_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
    /ndmg_atlases/label/Human/Schaefer200_space-MNI152NLin6_res-1x1x1.nii.gz: Avg
  realignment: ROI_to_func
  connectivity_matrix:
    using:
      - Nilearn
    measure:
      - Pearson
      - Partial

gkiar commented 2 years ago

To reproduce and inspect these runs, the data (input and output) can be found on Bridges-2 at:

/jet/home/gkiar/shared_data/NKI-RS/dataset_4subs/   # Input
/jet/home/gkiar/shared_data/NKI-RS/dataset_4subs/derivatives_minimal_ieee/eval-{0,1}  # Outputs, for the two configs.

shnizzedy commented 2 years ago

Resolved in #1701