ANTsX / ANTsPy

A fast medical imaging analysis library in Python with algorithms for registration, segmentation, and more.
https://antspyx.readthedocs.io
Apache License 2.0
586 stars 161 forks source link

ANTsPy crashes Python when registering two images #678

Open JescoS opened 1 week ago

JescoS commented 1 week ago

Describe the bug

Python crashes without any error when calling ants.registration to rigidly register one MRI image to another. This only occurs with two specific images in my dataset.

To reproduce

import ants

fixed = ants.image_read(".../t1.nii.gz")
moving = ants.image_read(".../t1_gd.nii.gz")

# It crashes here when registering 't1_gd' (moving image) to 't1' (fixed image)
reg = ants.registration(
    fixed,
    moving,
    "Rigid",
    verbose=True,
)

Expected behavior

The registration to happen successfully, or at least that a helpful error message is provided.

Screenshots

Output from the code provided under To reproduce ``` antsRegistration -d 3 -r [0x55be9ed2c2d0,0x55be9ed36050,1] -m mattes[0x55be9ed2c2d0,0x55be9ed36050,1,32,regular,0.2] -t Rigid[0.25] -c 2100x1200x1200x10 -s 3x2x1x0 -f 6x4x2x1 -u 1 -z 1 -o [/tmp/tmpevgyaz8y,0x55be9ecbb2c0,0x55be9f006d20] -x [NA,NA] --float 1 --write-composite-transform 0 -v 1 All_Command_lines_OK Using single precision for computations. ============================================================================= The composite transform comprises the following transforms (in order): 1. Center of mass alignment using fixed image: 0x55be9ed2c2d0 and moving image: 0x55be9ed36050 (type = Euler3DTransform) ============================================================================= Reading mask(s). Registration stage 0 No fixed mask No moving mask number of levels = 4 fixed image: 0x55be9ed2c2d0 moving image: 0x55be9ed36050 Dimension = 3 Number of stages = 1 Use histogram matching = true Winsorize image intensities = false Lower quantile = 0 Upper quantile = 1 Stage 1 State Image metric = MattesMI Fixed image = Image (0x55be9f0b6ff0) RTTI typeinfo: itk::Image Reference Count: 2 Modified Time: 917 Debug: Off Object Name: Observers: none Source: (none) Source output name: (none) Release Data: Off Data Released: False Global Release Data: Off PipelineMTime: 0 UpdateMTime: 889 RealTimeStamp: 0 seconds LargestPossibleRegion: Dimension: 3 Index: [0, 0, 0] Size: [1024, 1024, 175] BufferedRegion: Dimension: 3 Index: [0, 0, 0] Size: [1024, 1024, 175] RequestedRegion: Dimension: 3 Index: [0, 0, 0] Size: [1024, 1024, 175] Spacing: [0.234375, 0.234375, 1] Origin: [117.71, 77.0839, 24.583] Direction: -0.999791 -0.0203991 -0.000987429 0.0193326 -0.929714 -0.367774 0.00658424 -0.367716 0.929915 IndexToPointMatrix: -0.234326 -0.00478105 -0.000987429 0.00453107 -0.217902 -0.367774 0.00154318 -0.0861835 0.929915 PointToIndexMatrix: -4.26578 0.0824856 0.0280928 -0.0870363 -3.96678 -1.56892 -0.000987429 -0.367774 0.929915 Inverse Direction: -0.999791 0.0193326 0.00658424 -0.0203991 -0.929714 -0.367716 -0.000987429 -0.367774 0.929915 PixelContainer: ImportImageContainer (0x55be88baaf30) RTTI typeinfo: itk::ImportImageContainer Reference Count: 1 Modified Time: 887 Debug: Off Object Name: Observers: none Pointer: 0x7f3e5a0ac010 Container manages memory: true Size: 183500800 Capacity: 183500800 Moving image = Image (0x55be9f0b9590) RTTI typeinfo: itk::Image Reference Count: 2 Modified Time: 918 Debug: Off Object Name: Observers: none Source: (none) Source output name: (none) Release Data: Off Data Released: False Global Release Data: Off PipelineMTime: 0 UpdateMTime: 915 RealTimeStamp: 0 seconds LargestPossibleRegion: Dimension: 3 Index: [0, 0, 0] Size: [1024, 1024, 55] BufferedRegion: Dimension: 3 Index: [0, 0, 0] Size: [1024, 1024, 55] RequestedRegion: Dimension: 3 Index: [0, 0, 0] Size: [1024, 1024, 55] Spacing: [0.234375, 0.234375, 3] Origin: [119.193, 75.9313, 31.5048] Direction: -0.999759 -0.0203991 -0.00813477 0.0219612 -0.929714 -0.367626 -6.3751e-05 -0.367716 0.929938 IndexToPointMatrix: -0.234318 -0.00478105 -0.0244043 0.00514716 -0.217902 -1.10288 -1.49416e-05 -0.0861835 2.78981 PointToIndexMatrix: -4.26564 0.0937012 -0.000272009 -0.0870363 -3.96678 -1.56892 -0.00271159 -0.122542 0.309979 Inverse Direction: -0.999759 0.0219612 -6.37521e-05 -0.0203991 -0.929714 -0.367716 -0.00813477 -0.367626 0.929938 PixelContainer: ImportImageContainer (0x55be9f0b9820) RTTI typeinfo: itk::ImportImageContainer Reference Count: 1 Modified Time: 913 Debug: Off Object Name: Observers: none Pointer: 0x7f3e0e3ff010 Container manages memory: true Size: 57671680 Capacity: 57671680 Weighting = 1 Sampling strategy = regular Number of bins = 32 Radius = 4 Sampling percentage = 0.2 Transform = Rigid Gradient step = 0.25 Update field sigma (voxel space) = 0 Total field sigma (voxel space) = 0 Update field time sigma = 0 Total field time sigma = 0 Number of time indices = 0 Number of time point samples = 0 Registration using 1 total stages. Stage 0 iterations = 2100x1200x1200x10 convergence threshold = 1e-06 convergence window size = 10 number of levels = 4 using the Mattes MI metric (number of bins = 32, weight = 1, use gradient filter = 0) preprocessing: histogram matching the images Shrink factors (level 1 out of 4): [6, 6, 1] Shrink factors (level 2 out of 4): [4, 4, 1] Shrink factors (level 3 out of 4): [2, 2, 1] Shrink factors (level 4 out of 4): [1, 1, 1] smoothing sigmas per level: [3, 2, 1, 0] regular sampling (percentage = 0.2) *** Running Euler3DTransform registration *** DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST 2DIAGNOSTIC, 1, -7.784408330917e-01, inf, 3.3159e+00, 3.3159e+00, 2DIAGNOSTIC, 2, -7.880971431732e-01, inf, 3.6818e+00, 3.6594e-01, 2DIAGNOSTIC, 3, -7.961868643761e-01, inf, 4.0615e+00, 3.7965e-01, 2DIAGNOSTIC, 4, -7.995039820671e-01, inf, 4.4379e+00, 3.7644e-01, 2DIAGNOSTIC, 5, -8.014418482780e-01, inf, 4.8091e+00, 3.7121e-01, 2DIAGNOSTIC, 6, -8.084483146667e-01, inf, 5.2293e+00, 4.2015e-01, 2DIAGNOSTIC, 7, -8.104047179222e-01, inf, 5.7435e+00, 5.1423e-01, 2DIAGNOSTIC, 8, -8.110386133194e-01, inf, 6.1126e+00, 3.6908e-01, 2DIAGNOSTIC, 9, -8.157773613930e-01, inf, 6.5345e+00, 4.2192e-01, 2DIAGNOSTIC, 10, -8.202537894249e-01, 2.929246518761e-03, 6.9102e+00, 3.7568e-01, 2DIAGNOSTIC, 11, -8.206747770309e-01, 2.245417330414e-03, 7.3237e+00, 4.1351e-01, 2DIAGNOSTIC, 12, -8.211225867271e-01, 1.770743401721e-03, 7.8365e+00, 5.1281e-01, 2DIAGNOSTIC, 13, -8.212584853172e-01, 1.435689860955e-03, 8.2116e+00, 3.7508e-01, 2DIAGNOSTIC, 14, -8.218446373940e-01, 1.116505824029e-03, 8.6343e+00, 4.2277e-01, 2DIAGNOSTIC, 15, -8.223311305046e-01, 7.893703295849e-04, 9.1028e+00, 4.6844e-01, 2DIAGNOSTIC, 16, -8.223287463188e-01, 5.852135363966e-04, 9.5163e+00, 4.1355e-01, 2DIAGNOSTIC, 17, -8.223301768303e-01, 3.945839998778e-04, 9.8815e+00, 3.6512e-01, 2DIAGNOSTIC, 18, -8.223319649696e-01, 2.001180691877e-04, 1.0310e+01, 4.2856e-01, 2DIAGNOSTIC, 19, -8.223304152489e-01, 9.293391485699e-05, 1.0684e+01, 3.7397e-01, 2DIAGNOSTIC, 20, -8.223320841789e-01, 6.619643681915e-05, 1.1105e+01, 4.2054e-01, 2DIAGNOSTIC, 21, -8.223316073418e-01, 4.296490442357e-05, 1.1524e+01, 4.1926e-01, 2DIAGNOSTIC, 22, -8.223342299461e-01, 2.533297993068e-05, 1.1953e+01, 4.2919e-01, 2DIAGNOSTIC, 23, -8.223354816437e-01, 8.932824130170e-06, 1.2375e+01, 4.2202e-01, 2DIAGNOSTIC, 24, -8.223351836205e-01, 1.526392907181e-06, 1.2798e+01, 4.2326e-01, 2DIAGNOSTIC, 25, -8.223311305046e-01, 1.437342575628e-06, 1.3218e+01, 4.1943e-01, 2DIAGNOSTIC, 26, -8.223313093185e-01, 1.312344238613e-06, 1.3843e+01, 6.2565e-01, 2DIAGNOSTIC, 27, -8.223315477371e-01, 1.209031893268e-06, 1.4270e+01, 4.2643e-01, 2DIAGNOSTIC, 28, -8.223308920860e-01, 1.124806317421e-06, 1.4794e+01, 5.2415e-01, 2DIAGNOSTIC, 29, -8.223308324814e-01, 1.029444206324e-06, 1.5218e+01, 4.2359e-01, DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST 2DIAGNOSTIC, 1, -7.919459939003e-01, inf, 2.1032e+01, 5.8147e+00, 2DIAGNOSTIC, 2, -7.922274470329e-01, inf, 2.1988e+01, 9.5568e-01, 2DIAGNOSTIC, 3, -7.923026680946e-01, inf, 2.3133e+01, 1.1451e+00, 2DIAGNOSTIC, 4, -7.924258708954e-01, inf, 2.4152e+01, 1.0188e+00, 2DIAGNOSTIC, 5, -7.924597263336e-01, inf, 2.5251e+01, 1.0996e+00, 2DIAGNOSTIC, 6, -7.925294637680e-01, inf, 2.6502e+01, 1.2502e+00, 2DIAGNOSTIC, 7, -7.925676107407e-01, inf, 2.7294e+01, 7.9280e-01, 2DIAGNOSTIC, 8, -7.925767302513e-01, inf, 2.8252e+01, 9.5760e-01, 2DIAGNOSTIC, 9, -7.925794720650e-01, inf, 2.9275e+01, 1.0227e+00, 2DIAGNOSTIC, 10, -7.925798296928e-01, 4.663267100113e-05, 3.0165e+01, 8.9031e-01, 2DIAGNOSTIC, 11, -7.925813794136e-01, 2.782916271826e-05, 3.1149e+01, 9.8371e-01, 2DIAGNOSTIC, 12, -7.925839424133e-01, 1.838119169406e-05, 3.2076e+01, 9.2773e-01, 2DIAGNOSTIC, 13, -7.925873398781e-01, 1.107098978537e-05, 3.3009e+01, 9.3283e-01, 2DIAGNOSTIC, 14, -7.925903797150e-01, 7.265426120284e-06, 3.3815e+01, 8.0576e-01, 2DIAGNOSTIC, 15, -7.925956845284e-01, 4.413903752720e-06, 3.4765e+01, 9.5040e-01, 2DIAGNOSTIC, 16, -7.925977110863e-01, 3.293936742921e-06, 3.5674e+01, 9.0855e-01, 2DIAGNOSTIC, 17, -7.925973534584e-01, 2.977928033943e-06, 3.6440e+01, 7.6642e-01, 2DIAGNOSTIC, 18, -7.925977706909e-01, 2.784697244351e-06, 3.7330e+01, 8.8950e-01, 2DIAGNOSTIC, 19, -7.925984859467e-01, 2.582474280644e-06, 3.8237e+01, 9.0673e-01, 2DIAGNOSTIC, 20, -7.926011681557e-01, 2.369023832216e-06, 3.9432e+01, 1.1949e+00, 2DIAGNOSTIC, 21, -7.926015257835e-01, 2.134311671398e-06, 4.0366e+01, 9.3464e-01, 2DIAGNOSTIC, 22, -7.926001548767e-01, 1.883459503915e-06, 4.1423e+01, 1.0564e+00, 2DIAGNOSTIC, 23, -7.925997972488e-01, 1.662279260017e-06, 4.2299e+01, 8.7658e-01, 2DIAGNOSTIC, 24, -7.925995588303e-01, 1.472873009334e-06, 4.3217e+01, 9.1793e-01, 2DIAGNOSTIC, 25, -7.926007509232e-01, 1.376191448799e-06, 4.4349e+01, 1.1316e+00, 2DIAGNOSTIC, 26, -7.926012277603e-01, 1.311355731559e-06, 4.5566e+01, 1.2175e+00, 2DIAGNOSTIC, 27, -7.926009297371e-01, 1.235148147316e-06, 4.7030e+01, 1.4635e+00, 2DIAGNOSTIC, 28, -7.926003932953e-01, 1.158303689408e-06, 4.8101e+01, 1.0711e+00, 2DIAGNOSTIC, 29, -7.926004528999e-01, 1.093328023671e-06, 4.9094e+01, 9.9350e-01, 2DIAGNOSTIC, 30, -7.926002740860e-01, 1.062158162313e-06, 5.0157e+01, 1.0629e+00, 2DIAGNOSTIC, 31, -7.926017642021e-01, 1.054402559930e-06, 5.1191e+01, 1.0335e+00, 2DIAGNOSTIC, 32, -7.926021218300e-01, 1.032973159454e-06, 5.2104e+01, 9.1302e-01, 2DIAGNOSTIC, 33, -7.926020622253e-01, 1.005100102702e-06, 5.3521e+01, 1.4168e+00, Killed ```

ANTsPy installation (please complete the following information):

Additional context The registration only seems to fail for a particular combination of images in my dataset, so I cannot reproduce the error with any publicly available data, unfortunately. It also seems to fail after a different number of iterations every time I run the provided code.

cookpa commented 1 week ago

Out of memory? Try increasing RAM available to docker?

cookpa commented 1 week ago

Are the other images that work successfully as large as these? 1024x1024x175 is quite large.

It also seems to fail after a different number of iterations every time I run the provided code.

I suspect it is converging at different numbers of iterations, then running out of memory. Before each stage, it will print

DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST

If this is consistently printed twice, it's probably that it's converging differently due to random sampling and then trying and failing to start the next level.

Jose-Verdu-Diaz commented 6 days ago

I'm having a similar memory issue when registering multiple batches of images. Memory usage seems to increase over time, suggesting that memory is not cleared properly.

cookpa commented 1 day ago

I can see a small increase in the number of objects and the RSS memory over time, when running registration in a loop. Just registering a normal T1w image to MNI, repeatedly

import ants
import gc
import psutil
import sys
import time

t1w = ants.image_read('t1w.nii.gz')
template = ants.image_read('tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz')

process = psutil.Process()

# Function to convert bytes to MB
def bytes_to_mb(bytes):
    return round(bytes / 1024 / 1024, 3)

def get_object_description(obj):
    """Generate a description of the object."""
    if isinstance(obj, (list, set, tuple)):
        return f"{type(obj).__name__} of size {len(obj)}: {repr(obj)[:200]}"
    elif isinstance(obj, dict):
        return f"{type(obj).__name__} with {len(obj)} keys: {repr(obj)[:200]}"
    elif isinstance(obj, str):
        return f"String of length {len(obj)}: {repr(obj)[:200]}"
    else:
        return repr(obj)[:200]

def show_largest_objects(limit=10):
    """Print the largest objects in memory, in MB."""
    gc.collect()
    objects = gc.get_objects()
    print(f"Total objects in memory: {len(objects)}")
    sorted_objects = sorted(objects, key=lambda x: sys.getsizeof(x), reverse=True)[:limit]
    for obj in sorted_objects:
        size_in_mb = bytes_to_mb(sys.getsizeof(obj))
        description = get_object_description(obj)
        print(f"{description} - {size_in_mb} MB")

total_memory = list()
total_objects = list()

reg = None

try:
    for i in range(100):
        gc.collect()
        memory_info = process.memory_info()
        total_memory.append(bytes_to_mb(memory_info.rss))
        total_objects.append(len(gc.get_objects()))
        print(f"RSS Memory: {bytes_to_mb(memory_info.rss):.2f} MB")
        print(f"VMS Memory: {bytes_to_mb(memory_info.vms):.2f} MB")
        # show_largest_objects(5)
        reg = ants.registration(template, t1w, aff_iterations=(25, 0, 0, 0), reg_iterations=(1, 0, 0, 0) )
        print('Done iteration ' + str(i))
except KeyboardInterrupt:
    print("Monitoring stopped.")

print("Final stats")
gc.collect()
memory_info = process.memory_info()
print(f"RSS Memory: {bytes_to_mb(memory_info.rss):.2f} MB")
show_largest_objects(10)

I ran this for 56 iterations before interrupting

image
> memory_values
 [1]  309.633 1901.070 1990.355 1991.445 1993.828 2002.277 2009.391 2009.805 2010.035 2010.977 2011.438 2017.438 2021.508 2022.109 2022.117 2022.137 2022.352 2022.359 2024.039 2024.055 2024.062
[22] 2028.086 2028.105 2028.117 2028.297 2028.297 2030.715 2030.734 2030.738 2030.766 2030.766 2030.773 2030.777 2030.777 2030.777 2030.785 2030.785 2030.797 2030.801 2030.820 2030.836 2030.871
[43] 2030.871 2030.871 2030.875 2030.875 2030.992 2037.000 2037.023 2037.023 2043.039 2043.039 2043.082 2043.082 2043.082 2043.113 2043.121 2043.121 2043.121 2043.152 2043.328
> diff(memory_values)
 [1] 1591.437   89.285    1.090    2.383    8.449    7.114    0.414    0.230    0.942    0.461    6.000    4.070    0.601    0.008    0.020    0.215    0.007    1.680    0.016    0.007    4.024
[22]    0.019    0.012    0.180    0.000    2.418    0.019    0.004    0.028    0.000    0.007    0.004    0.000    0.000    0.008    0.000    0.012    0.004    0.019    0.016    0.035    0.000
[43]    0.000    0.004    0.000    0.117    6.008    0.023    0.000    6.016    0.000    0.043    0.000    0.000    0.031    0.008    0.000    0.000    0.031    0.176
> 

Because the difference is small (after the first time) and variable, I'm not sure this is actually a leak on the ANTsPy / ITK side.