bids-apps / MRtrix3_connectome

Generate subject connectomes from raw BIDS data & perform inter-subject connection density normalisation, using the MRtrix3 software package.
http://www.mrtrix.org/
Apache License 2.0

Upper limit on memory usage? #102

Closed · bwinsto2 closed this 2 years ago

bwinsto2 commented 2 years ago

Dear Robert, I'm inquiring whether there is an option to limit memory usage for a run of this software, similar to the "mem-mb" argument in fmriprep/qsiprep. I am attempting to run subjects in parallel on a cluster, and it will not work unless I limit the resources to below 24G of RAM per job/subject. If you haven't had the time to include this option, are you aware of any workarounds, or would this be easy to implement?

Cheers, Brian

Lestropie commented 2 years ago

Hi Brian,

It's not a trivially extensible thing, unfortunately. It's actually quite rare for a processing task to allow you to directly modulate how much memory it uses; more commonly, a task simply allocates whatever amount of memory is necessary for its operation and fails if it can't do so. I'm not sure what mechanism those packages use to restrict memory usage; I could imagine reducing the number of threads invoked in order to limit total memory usage, but my suspicion is that it's all application-specific.

In the context of this particular tool, the peak memory usage should come from the SIFT2 step. For that step there is already code in place to detect when a failure occurs due to insufficient RAM and to re-attempt the step with a reduced number of streamlines. So to me the logical first step in providing a memory usage limitation capability here would be: if the user pre-specifies a maximal amount of RAM to use, predict the maximal number of streamlines to generate such that SIFT2 would use no more than that amount of memory. This would only be an estimate (the actual amount of RAM necessary isn't known until an attempt is made to allocate it), but it might serve the purpose you're looking for?
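A minimal sketch of what such a prediction could look like, assuming a purely hypothetical fixed overhead and per-streamline memory cost (the real figures would depend on the data and would need to be measured rather than hard-coded):

```python
# Sketch only: predict a cap on the number of streamlines to generate from a
# user-specified memory budget. The fixed overhead and per-streamline cost are
# hypothetical placeholders, not measured values for SIFT2.

def max_streamlines_for_budget(mem_limit_bytes: int,
                               fixed_overhead_bytes: int = 2 * 1024**3,
                               bytes_per_streamline: float = 4096.0) -> int:
    """Estimate how many streamlines would fit within the memory budget."""
    available = mem_limit_bytes - fixed_overhead_bytes
    if available <= 0:
        raise ValueError("memory budget smaller than the assumed fixed overhead")
    return int(available // bytes_per_streamline)

# e.g. a 24 GiB job allocation
print(max_streamlines_for_budget(24 * 1024**3))
```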

Rob

bwinsto2 commented 2 years ago

Thanks very much, Rob. I am running a bunch of subjects in parallel and have provided each with 48G of RAM and 8 cores. I haven't seen any issues yet, although all the subjects have been sitting at dwifslpreproc since Sept. 22nd (but I know TOPUP and eddy_openmp take a long time). I am only running the preproc arm of the pipeline, so I shouldn't run into SIFT2, I imagine. Will reopen if anything comes up. Thanks again.

Lestropie commented 2 years ago

topup can indeed take a long time if there are a lot of volumes or a very high spatial resolution.

Are you running within or outside of a container environment? FSL 6.0.5 has been released, and based on the changelog and the current code here, it's conceivable that the CPU implementation of slice-to-volume correction is being engaged, which can take a very long time to run; but that could only happen if you're running outside of a container and have yourself activated an FSL 6.0.5 module.

48GB should be heaps for preproc. Nevertheless if you're using SLURM or similar you should be able to query peak memory usage of the job based on its ID.
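For example, with SLURM accounting enabled, something along these lines would report the per-step peak resident memory (the job ID below is just a placeholder):

```python
# Sketch: report per-step peak resident memory (MaxRSS) for a SLURM job.
# Assumes SLURM accounting is enabled and sacct is on PATH; "123456" is a
# placeholder job ID.
import subprocess

def report_peak_memory(job_id: str) -> None:
    result = subprocess.run(
        ["sacct", "-j", job_id, "--format=JobID,JobName,MaxRSS,Elapsed"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # MaxRSS is reported per job step

report_peak_memory("123456")
```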

bwinsto2 commented 2 years ago

It is HCP data, so I expect it will take a while. Another handful of days and I might start to get worried. I am running through a Singularity container.

Cheers, Brian

Lestropie commented 2 years ago

I'd be investing the effort in getting the Singularity container working on a GPU-enabled node and utilising the slice-to-volume motion correction if I were you. At that spatial resolution the within-volume motion will be non-negligible.

I had someone trying to redo the pre-processing of HCP raw data and we ended up giving up at the time because we couldn't get close enough to reproducing the provided minimally-processed data. Alternatively, if you're providing the minimally-preprocessed HCP data to the preproc analysis level of this tool, there's a more fundamental misunderstanding.

bwinsto2 commented 2 years ago

Thanks for the thoughts. Unfortunately our cluster has hardly any GPU node availability, so I think we will have to make do. We ran our analysis on one subject and compared the results of using HCP preprocessed data versus HCP raw data taken through mrtrix3_connectome and a couple other BIDS apps, and the results were pretty similar. In the future we will definitely try to make use of eddy_cuda, however.

bwinsto2 commented 2 years ago

Hi Rob, since topup still hasn't finished on one HCP unprocessed subject (started on Sept. 22nd), I'm trying to troubleshoot. Maybe it has to do with the fact that there are 6 DWIs that it is concatenating? This is the BIDS dwi directory:

[Screenshot: listing of the BIDS dwi directory]

Do you think maybe I should ask the FSL folks? Not sure why it's taking so long.

I'm allocating 8 CPUs and 48 GB of memory.

Thanks.

Lestropie commented 2 years ago

It's not the concatenation of DWIs specifically; it's the combination of the number of b=0 volumes (18) and the high spatial resolution. topup is single-threaded, so it can't make use of the cores you are offering it.

It would not surprise me if HCP used only a subset of b=0 volumes for inhomogeneity field estimation. Indeed I observed during my own recent processing that, when using a monopolar diffusion encoding, there are non-negligible residual eddy currents in the interspersed b=0 volumes, despite those volumes not themselves containing any diffusion sensitisation gradients. topup assumes that the only differences between volumes are EPI distortions and rigid-body motion, which is violated if there are differential eddy currents. For this reason, when the input data are detected as coming from a monopolar encoding, the tool only uses those b=0 volumes that appear prior to the first b>0 volume. I would suggest including that field in your DWI JSONs. It won't reduce your execution time by an order of magnitude, but it will reduce it somewhat.
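If you want to see which b=0 volumes such a rule would retain for your data, a rough sketch along these lines may help (it reads an FSL-style .bval file; it is not the code the app itself uses, and the b=0 threshold and file name are placeholders):

```python
# Sketch: indices of b=0 volumes that precede the first b>0 volume in an
# FSL-style .bval file. Mirrors the selection rule described above, but is not
# the app's own implementation; the threshold and file name are placeholders.

def leading_b0_indices(bval_path: str, b0_threshold: float = 50.0) -> list[int]:
    with open(bval_path) as f:
        bvals = [float(v) for v in f.read().split()]
    indices = []
    for index, b in enumerate(bvals):
        if b > b0_threshold:
            break
        indices.append(index)
    return indices

print(leading_b0_indices("sub-01_dir-AP_dwi.bval"))
```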

The long execution times of topup are well-documented, so I would suggest finding and reading what you can. You could also try to find out whether HCP specifically did anything different in how topup was used to produce the minimally-processed data.

bwinsto2 commented 2 years ago

Thanks so much, Rob. Just wanted to let you know that this is working now; topup takes about 5 days to run. I didn't change the diffusion encoding header, but the info you provided is interesting. The problem (I think) was that hcp2bids had set the TotalReadoutTime field to 0.6 for some reason. I recalculated the value (it should be something like 0.11), and when I reran after making that change, topup did indeed finish.
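For anyone hitting the same problem, the value can be sanity-checked against the other sidecar fields; a rough sketch, assuming the common EffectiveEchoSpacing * (ReconMatrixPE - 1) definition and that both fields are present in the JSON:

```python
# Sketch: recompute TotalReadoutTime from other BIDS sidecar fields, assuming
# the common definition EffectiveEchoSpacing * (ReconMatrixPE - 1). The file
# name is a placeholder; check the result against your sequence documentation.
import json

def total_readout_time(json_path: str) -> float:
    with open(json_path) as f:
        meta = json.load(f)
    return meta["EffectiveEchoSpacing"] * (meta["ReconMatrixPE"] - 1)

print(total_readout_time("sub-01_dir-AP_dwi.json"))
```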

Thanks again for the support.

Lestropie commented 2 years ago

Okay, that's interesting: normally topup crashes out if any total readout time is 0.2 or greater, but that should propagate up to the whole app crashing out, not freezing.

I'd still suggest looking at the raw input b=0 volumes and deciding whether or not the interleaved ones are too heavily corrupted by eddy currents to be used for inhomogeneity field estimation. Given the magnitude of the issues that this was causing for me, it wouldn't surprise me if the HCP unprocessed data have the same problem.