ANTsX / ANTsR

R interface to the ANTs biomedical image processing library
https://antsx.github.io/ANTsR
Apache License 2.0
127 stars 35 forks source link

antsRegistration crashing with "memory not mapped" error #355

Closed muratmaga closed 3 years ago

muratmaga commented 3 years ago

We are trying to process some data from Mouse Phenotyping project. It involves transfering a segmented atlas to new samples. Unprocessed images do not have the correct voxel spacing, so we use an initial lm transform to scale them correctly using five points and then proceed other registration.

For some of the dataset this pipeline works fine, and for some it results at the syn2 with this error. Verbose mode doesn't provide anymore information than this. I know that I do did not resample the syn1 transforms to the higher resolution, but this clearly doesn't cause an issue for some of the samples and for this specific sample I did resample the warp field to high resolution, and still it crashed.

At this point I don't really know how can I debug this further. I appreciate any suggestions, I am providing a link to the zip archive for the sample data that reproduces the problem.

caught segfault address 0x2aac22bbc970, cause 'memory not mapped'

Traceback: 1: antsRegistration(fixed = ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose = FALSE)

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

if (!require(SlicerMorphR)) devtools::install_github("SlicerMorph/SlicerMorphR") 
library(patchMatchR)
library(ANTsR)

#path to zip archive
setwd("~/Documents/memory_map/Data")

set.seed(5)
ref.lm = read.markups.json("Atlas_lms.mrk.json")
ref.img = antsImageRead("final.nrrd")
ref.img = iMath(ref.img, "Normalize")
ref.img.low = resampleImage(ref.img, c(0.054, 0.054, 0.054))
ref.label = antsImageRead("Embryo_Atlas-labels.nrrd")

tmp.img = antsImageRead('Female_ABCM_K1403-70-e15.5_baseline.nrrd')
tmp.img = iMath(tmp.img, "Normalize")
tmp.img.low = resampleImage(tmp.img, antsGetSpacing(tmp.img)*2)
tmp.lm = read.markups.json('Female_ABCM_K1403-70-e15.5_baseline_lms.mrk.json')

lm.tx = fitTransformToPairedPoints(fixedPoints = ref.lm, 
                                     movingPoints = tmp.lm,
                                     transformType = "Similarity", 
                                     domainImage = ref.img)

  affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform)
  syn1 = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "SyN", initialTransform = affine$fwdtransforms)

  #at this point resultant registration seems reasonable:

  new.labels = antsApplyTransforms(fixed=tmp.img, moving = ref.label, 
                            transformlist = syn1$invtransforms, 
                            interpolator = "genericLabel")

  plot(tmp.img, new.labels, nslices=20, alpha=0.8)

  #crashes here                          
  syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose=TRUE)

edit. This is the sessionInfo(), if it is useful:

sessionInfo() R version 3.6.0 (2019-04-26) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] jsonlite_1.7.2 patchMatchR_1.0 ANTsR_0.5.7.5 ANTsRCore_0.7.5 SlicerMorphR_0.0.0.9000

loaded via a namespace (and not attached): [1] Rcpp_1.0.3 whisker_0.4 magrittr_1.5 docopt_0.6.1 lattice_0.20-38 R6_2.4.0 tools_3.6.0
[8] grid_3.6.0 keras_2.2.5.0 tfruns_1.4 abind_1.4-5 RcppEigen_0.3.3.5.0 tensorflow_2.0.0 Matrix_1.2-17
[15] base64enc_0.1-3 ANTsRNet_1.1 zeallot_0.1.0 slam_0.1-46 qlcMatrix_0.9.7 compiler_3.6.0 ITKR_0.5.3.3
[22] generics_0.0.2 sparsesvd_0.2 reticulate_1.13 mvtnorm_1.0-11

dorianps commented 3 years ago

Looks to me like you are out of RAM (or whatever RAM WSL thinks has available). How much memory do you have on the machine? If you are using WSL (or WSL2), can you check how much RAM and swap disk space you have? I used to increase those, and even increase the Windows virtual memory to make things work.

On Fri, Oct 1, 2021 at 12:38 PM muratmaga @.***> wrote:

We are trying to process some data from Mouse Phenotyping project. It involves transfering a segmented atlas to new samples. Unprocessed images do not have the correct voxel spacing, so we use an initial lm transform to scale them correctly using five points and then proceed other registration.

For some of the dataset this pipeline works fine, and for some it results at the syn2 with this error. Verbose mode doesn't provide anymore information than this. I know that I do did not resample the syn1 transforms to the higher resolution, but this clearly doesn't cause an issue for some of the samples and for this specific sample I did resample the warp field to high resolution, and still it crashed.

At this point I don't really know how can I debug this further. I appreciate any suggestions, I am providing a link to the zip archive for the sample data that reproduces the problem. https://app.box.com/s/grtsdi0w2edbn0ptw834l8vovd2v8n9t

caught segfault address 0x2aac22bbc970, cause 'memory not mapped'

Traceback: 1: antsRegistration(fixed = ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose = FALSE)

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

if (!require(SlicerMorphR)) devtools::install_github("SlicerMorph/SlicerMorphR") library(patchMatchR) library(ANTsR)

path to zip archive

setwd("~/Documents/memory_map/Data")

set.seed(5) ref.lm = read.markups.json("Atlas_lms.mrk.json") ref.img = antsImageRead("final.nrrd") ref.img = iMath(ref.img, "Normalize") ref.img.low = resampleImage(ref.img, c(0.054, 0.054, 0.054)) ref.label = antsImageRead("Embryo_Atlas-labels.nrrd")

tmp.img = antsImageRead('Female_ABCM_K1403-70-e15.5_baseline.nrrd') tmp.img = iMath(tmp.img, "Normalize") tmp.img.low = resampleImage(tmp.img, antsGetSpacing(tmp.img)*2) tmp.lm = read.markups.json('Female_ABCM_K1403-70-e15.5_baseline_lms.mrk.json')

lm.tx = fitTransformToPairedPoints(fixedPoints = ref.lm, movingPoints = tmp.lm, transformType = "Similarity", domainImage = ref.img)

affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform) syn1 = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "SyN", initialTransform = affine$fwdtransforms)

at this point resultant registration seems reasonable:

new.labels = antsApplyTransforms(fixed=tmp.img, moving = ref.label, transformlist = syn1$invtransforms, interpolator = "genericLabel")

plot(tmp.img, new.labels, nslices=20, alpha=0.8)

crashes here

syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose=TRUE)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ANTsX/ANTsR/issues/355, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFJU7PELKVSO6HCHN23EULUEXPXDANCNFSM5FFCITSA .

muratmaga commented 3 years ago

This is on a physical Centos7 machine. RAM does not come close to running out (there are 256GB available on these nodes). When completes, R uses about 15-18GB tops for these registration operations.

dorianps commented 3 years ago

Ehm, not sure what is going on. Let's see what others say. I have a system very similar to yours and could try your commands later if needed.

On Fri, Oct 1, 2021, 1:09 PM muratmaga @.***> wrote:

This is on a physical Centos7 machine. RAM does not come close to running out (there are 256GB available on these nodes). R, when completes, uses about 15-18GB tops.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ANTsX/ANTsR/issues/355#issuecomment-932403347, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFJU7PYSECM5J7CBYUZKSLUEXTMTANCNFSM5FFCITSA .

muratmaga commented 3 years ago

Ehm, not sure what is going on. Let's see what others say. I have a system very similar to yours and could try your commands later if needed.

When you have the chance, that will be great. It seems to replicate for me on different systems.

muratmaga commented 3 years ago

I found a system with R 4.0.5 with similar versions, and can't seem to replicate. So I will use the newer R.

muratmaga commented 3 years ago

Spoke too soon. Same error on 4.0.5 Open to any other suggestions.

stnava commented 3 years ago

have you run this same thing single threaded?

muratmaga commented 3 years ago

I can try but will take very long. It is through Sys.setenv("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1") right?

stnava commented 3 years ago

yes

stnava commented 3 years ago

at the very beginning of the script - or in the envt

muratmaga commented 3 years ago

Single threaded crashed on R 3.6.0 with the same error, now trying 4.0.5.

muratmaga commented 3 years ago

This is still running (it has been about 12hs). Normally at the syn2, if it crashes, it does it so early on. With single threaded, it is hard to tell where we are. Assuming this will work, what else can be done? Single-threaded is too slow the samples we have, will probably a day (or possibly longer) to process a single sample.

SlicerMorph commented 3 years ago

Confirmed that single threaded version finished successfully after 4 days.

cookpa commented 3 years ago

In the multi-threaded context, how many threads is it trying to use? In other words, what happens if you print

Sys.getenv("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS")

SlicerMorph commented 3 years ago

These are private compute nodes, so it is using maximum number of threads available on the node. Depending on the node it is running, this varies from 24 to 48.

cookpa commented 3 years ago

OK, that is a lot of threads, it might be hitting a system memory limit either on the machine itself (ulimit -v) or by a job scheduler.

I would try running with 8 threads. If that works consistently you could try increasing to 16. I often don't see much improvement beyond 8, but that might be data dependent.

SlicerMorph commented 3 years ago

We are not using any scheduler itself right now. We manually (and directly) run the scripts on the compute node(s). SO we can rule out the job scheduler.

I can try with 8 threads. But for clarification, we have been doing this (running maximum number of threads in ANTsR) for a while for different datasets. This is the first time we encountered this memory not mapped issue.

dorianps commented 3 years ago

I agree that knowing the number of cpu cores and containing the threads used by ANTs is the best approach. Running the maximum threads available can be confusing; the number of threads depend if hyperthreading is enabled (48 cpu cores can become 96 available threads). And if you go on a node outside the job scheduler, maybe the scheduler keeps sending jobs of other people to that node which can conflict.

On my 32 core system (with hyperthreading enabled) and 250Gb memory, I would run 30 registration jobs, each with 2 threads, to exploit the resources in full. You can also check how many threads and memory a single registration job using top or htop.

On Wed, Oct 6, 2021 at 2:13 PM SlicerMorph Project @.***> wrote:

We are not using any scheduler itself right now. We manually (and directly) run the scripts on the compute node(s). SO we can rule out the job scheduler.

I can try with 8 threads. But for clarification, we have been doing this (running maximum number of threads in ANTsR) for a while for different datasets. This is the first time we encountered this memory not mapped issue.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ANTsX/ANTsR/issues/355#issuecomment-936827018, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFJU7NUOXPSDJGT2MSFE33UFSGUPANCNFSM5FFCITSA .

SlicerMorph commented 3 years ago

Thanks for the feedback. We already know the number of threads and cores on the compute nodes, which is set by the environmental variable for each compute node through:

export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=cat /proc/cpuinfo | grep processor | wc -l

We have three types of compute nodes with 12, 20 and 24 physical cores and hyperthreading is enabled on all of them (so the reported threads are twice as many of the physical cores). Hence the reason for setting the variable like this.

We don't use scheduler (for this particular case), because it is a private system. Nobody else executes anything on them (apart from baselien OS processes). We have used antsMultivariateTemplateConstruction2.sh script previously on this exact system (which runs sge), with similar type of datasets, while utilizing all cores/threads on the system for a single registration. The reason we do that is because our microCT datasets are one or two order of magnitude larger than typical medical volumes. Thus, if you run it on 1-2 cores, it takes days to finish a single dataset (example above took 4 days, or 92h to be precise). Plus due to large memory consumption of these datasets at later stages of registration, if you go down few cores, but multiple simultaneous tasks per node, you are bound to run of out memory on the node, crashing all your tasks.

So far, one registration task per node that utilizes all threads paradigm worked well for us, until this particular dataset.

SlicerMorph commented 3 years ago

One more reflection:

For the jobs that do indeed complete using the all threads, this registration, takes about 4h (volume sizes are quite similar). Compared to the single threaded version, speed up is about 23 X, which is almost the number of physical cores. I will try either disabling the hyperthreading or just running as many threads as there are cores.

SlicerMorph commented 3 years ago

I wasn't able to debug any further, but the two level registration is the cause of the problem. This memory not mapped does not happen at all, regardless of the number of threads of resource. So, instead of doing:

 affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform)
  syn1 = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "SyN", initialTransform = affine$fwdtransforms)

  #at this point resultant registration seems reasonable:

  new.labels = antsApplyTransforms(fixed=tmp.img, moving = ref.label, 
                            transformlist = syn1$invtransforms, 
                            interpolator = "genericLabel")

  plot(tmp.img, new.labels, nslices=20, alpha=0.8)

  #crashes here                          
  syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose=TRUE)

I go straight to syn2 from affine, crash doesn't seem to happen.

 affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform)
  syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = affine$fwdtransforms, verbose=TRUE)

We can live with this, but our test data showed that we got better results in terms of label overlap, and landmark localization with the two step approach. So, if anyone else have other ideas, would be happy to hear