Mouse-Imaging-Centre / RMINC

Statistics for MINC volumes: A library to integrate voxel-based statistics for MINC volumes into the R environment. Supports getting and writing of MINC volumes, running voxel-wise linear models, correlations, etc.; correcting for multiple comparisons using the False Discovery Rate, and more. With contributions from Jason Lerch, Chris Hammill, Jim Nikelski and Matthijs van Eede. Some additional information can be found here:
https://mouse-imaging-centre.github.io/RMINC

Possible memory leak in mincWriteVolume #302

Open gdevenyi opened 2 years ago

gdevenyi commented 2 years ago

We're running the following code on Niagara:

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
# Load packages (not all are necessary)
library(lme4)
library(lmerTest)
library(RMINC)
library(tidyverse)

# set working directory 
setwd("/scratch/m/mchakrav/paulbest/genfi_df5/dbm/dbm_4/linear_model")

#load data 

data <- read_csv("pls_nona_nia.csv")

model1<-mincLmer(jacobians ~  time_month + (time_month|Id),
data = data,
mask="secondlevel_otsumask.mnc",
summary_type = "ranef",
parallel=c("local", 20),
control=lmerControl(optimizer ="Nelder_Mead"))

save(model1, file = "output_lmer/model1.RData")
save.image(file = "complete_model.RData")

unique_id=unique(data['Id'])
for(i in 1:nrow(unique_id))
{         
  id=unique_id[i,1] 
  column=paste0("beta-time_month-Id", id)
  dir.create(file.path(paste0("output_lmer/",id,"/")), showWarnings = FALSE)
  output_minc_file<-paste0("output_lmer/",id,"/",id,"_time_month_Id_beta.mnc")
  mincWriteVolume(model1,output.filename=output_minc_file,like.filename="secondlevel_template0.mnc",column = column)
  }          

And seeing the following memory performance:

[Screenshot: memory usage over time, taken 2022-01-19 at 6:18:06 PM]

During this time, we loop through and get ~29 files saved, before the system runs out of memory and kills R.

As far as I understand, we're not creating any new memory-holding objects inside the loop, but R's memory consumption keeps rising and eventually fills the node.

bcdarwin commented 2 years ago

Is VOLUME_CACHE_THRESHOLD set?

gdevenyi commented 2 years ago

The minc-toolkit default environment sets it as:

VOLUME_CACHE_THRESHOLD=-1

bcdarwin commented 2 years ago

Hmm, quite possible it is a memory leak in the C bindings or similar.

Is this memory measurement the whole machine or your R process? If the latter, do you know if the dip at 3:30pm corresponds to the beginning of writing files?

Do I have access to your modules/files on Niagara? If so, I could try to debug by running under Valgrind, but I'm pretty unfamiliar with RMINC internals, so it might be challenging.
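One way to separate whole-machine memory from the R process itself is to log the process's resident set size next to R's own heap usage on each loop iteration. The sketch below is only an illustration: `process_rss_kb`, `r_heap_mb` and `log_memory` are hypothetical helper names, and it assumes a Linux node where `/proc/self/status` is readable.

```r
# Resident set size of the current R process in kB (Linux only; reads /proc).
process_rss_kb <- function() {
  status <- readLines("/proc/self/status")
  line <- grep("^VmRSS:", status, value = TRUE)
  as.numeric(gsub("[^0-9]", "", line))
}

# Memory used by R's managed heap in MB, taken from the "(Mb)" column
# next to "used" in the matrix returned by gc().
r_heap_mb <- function() {
  sum(gc()[, 2])
}

# Print both numbers with a label and return them invisibly.
log_memory <- function(label) {
  m <- c(rss_kb = process_rss_kb(), heap_mb = r_heap_mb())
  message(sprintf("%s: RSS = %.0f kB, R heap = %.1f MB",
                  label, m[["rss_kb"]], m[["heap_mb"]]))
  invisible(m)
}
```

Calling `log_memory(paste("after file", i))` at the end of each iteration of the write loop would show whether the R process grows per file. If RSS climbs while the R heap stays flat, the growth is outside R's garbage-collected memory, which would be consistent with a leak in native code.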

gdevenyi commented 2 years ago

Is this memory measurement the whole machine or your R process?

This is the Niagara readout from the Slurm whole-machine statistics.

If the latter, do you know if the dip at 3:30pm corresponds to the beginning of writing files?

That's a really good question. That dip is pretty big, and all the allocations should be done at that point. I'm not sure.

Do I have access to your modules/files on Niagara?

Yes

export QUARANTINE_PATH=/project/m/mchakrav/quarantine
module use ${QUARANTINE_PATH}/modules
module load cobralab

For now, we're addressing this by randomizing the list of files to write out and repeating the job so we'll eventually get them all.

bcdarwin commented 2 years ago

Just a quick note: as a better workaround than randomizing, you could probably run mincWriteVolume from short-lived subprocesses, e.g. using batchMap from batchtools with a local multiprocessing backend.
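A minimal sketch of the short-lived-subprocess idea, using forked children from base R's `parallel` package rather than batchtools. The helper name `run_each_in_fork` is hypothetical, forking is only available on Unix-like systems, and whether this actually sidesteps the memory growth is an assumption: it relies on the leaked memory belonging to the child process, so that it is released when the child exits.

```r
library(parallel)

# Run write_one(item) for each item in its own forked child process,
# one at a time, and return the list of results.
run_each_in_fork <- function(items, write_one) {
  lapply(items, function(item) {
    # Fork a child that evaluates the call; wrapping the value in a list
    # keeps the collected result non-NULL even if write_one returns NULL.
    job <- mcparallel(list(value = write_one(item)))
    # Wait for the child to finish and exit, then unwrap its result.
    mccollect(job)[[1]]$value
  })
}

# Hypothetical usage with the names from the script in this thread:
# ids <- unique(data$Id)
# run_each_in_fork(ids, function(id) {
#   out_dir <- file.path("output_lmer", id)
#   dir.create(out_dir, showWarnings = FALSE)
#   mincWriteVolume(model1,
#                   output.filename = file.path(out_dir, paste0(id, "_time_month_Id_beta.mnc")),
#                   like.filename = "secondlevel_template0.mnc",
#                   column = paste0("beta-time_month-Id", id))
# })
```

Each child shares the parent's memory copy-on-write, so the fitted model does not have to be reloaded per file. If `write_one` fails in a child, the corresponding result is an error object rather than a value, so it is worth checking the returned list.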

bcdarwin commented 2 years ago

Thanks. Is there any chance you could give me read access (via extended ACLs, say) to the data directory as well?

gdevenyi commented 2 years ago

I have a special share for that:

/scratch/m/mchakrav/share

You have read-write access there. The data is still copying, so I suggest waiting ~30 minutes.

bcdarwin commented 2 years ago

Are the jacobians being copied as well?

gdevenyi commented 2 years ago

Are the jacobians being copied as well?

Yes. Warning: the file paths are all absolute paths. Student will be talked to.