SlicerDMRI / whitematteranalysis

White matter tractography clustering and more...
https://dmri.slicer.org/whitematteranalysis/

wm_cluster_from_atlas.py crashes on a 128 GB RAM, Python 3.12 environment #240

Open tashrifbillah opened 2 weeks ago

tashrifbillah commented 2 weeks ago

wm_cluster_from_atlas.py crashes at the following line, and the process is terminated with the message "Killed":

https://github.com/SlicerDMRI/whitematteranalysis/blob/f9c0ef672608095ef3ff36707053376892aeaeb7/whitematteranalysis/similarity.py#L17

We installed the package with pip on a machine with 128 GB RAM, running Python 3.12 on Red Hat 9. The same setup does not crash on a machine with 512 GB RAM (also Python 3.12, Red Hat 9). Feel free to share your thoughts.


Edit: we collected a few statistics at the point where the crash happens:

-> similarities = np.exp(-distance / (sigmasq))
(Pdb) distance.max()
24388.2132106186
(Pdb) distance.min()
0.0
(Pdb) distance.shape
(2500, 1860721)
(Pdb) np.exp(-distance / (sigmasq))
Killed
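As a rough check on why a 128 GB machine runs out of memory, the size of the array can be computed from the shape shown above. This sketch assumes distance is float64 (NumPy's default floating-point dtype; the session above does not print the dtype). It also assumes that evaluating np.exp(-distance / (sigmasq)) keeps at most three arrays of this size alive at once: distance itself, the result of the division, and the output of np.exp.

```python
import numpy as np

# Shape reported by the debugger session above.
shape = (2500, 1860721)
n_elements = shape[0] * shape[1]

# Assumption: distance is float64 (8 bytes per element).
bytes_f64 = n_elements * np.dtype(np.float64).itemsize
# For comparison: the same array stored as float32 (4 bytes per element).
bytes_f32 = n_elements * np.dtype(np.float32).itemsize

print(f"elements:       {n_elements:,}")
print(f"float64 array:  {bytes_f64 / 1e9:.1f} GB")
print(f"float32 array:  {bytes_f32 / 1e9:.1f} GB")

# Assumed peak: distance, the -distance / sigmasq result, and the np.exp
# output, all the same size and all float64.
print(f"three float64 arrays: {3 * bytes_f64 / 1e9:.1f} GB")
```

Under the float64 assumption, this gives about 37.2 GB per array and about 111.6 GB for three of them. That total comes before counting anything else the script holds in memory.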

On the 128 GB machine, memory usage climbs to the full 128 GB just before the crash. The same crash also occurs on a machine with 256 GB RAM.

tashrifbillah commented 2 weeks ago

Command used:

wm_cluster_from_atlas.py \
sub-4003_ses-2_dir-416_desc-XcUnEdEp_reg.vtk \
/software/rocky9/ORG-Atlases-1.2/ORG-800FC-100HCP \
wma/sub-4003_ses-2_dir-416_desc-XcUnEdEp/FiberClustering/InitialClusters \
-l 40 -j 1

tashrifbillah commented 2 weeks ago

On further investigation, I found that this expression alone also fails:

-distance / (sigmasq)
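This is consistent with how NumPy evaluates the expression: each whole-array operation returns a new array rather than modifying distance. The sketch below uses a small random array as a hypothetical stand-in for distance, with a made-up sigmasq value. It shows that the expression's result does not share memory with distance. In contrast, ufuncs called with out= write back into an existing buffer. np.negative is used here in place of multiplying by -1.

```python
import numpy as np

# Small hypothetical stand-in for the real (2500, 1860721) distance matrix.
rng = np.random.default_rng(0)
distance = rng.random((4, 5)) * 100.0
sigmasq = 100.0  # made-up value; the real one comes from the caller

# The expression from the issue returns a brand-new array.
expr = -distance / sigmasq
print(np.shares_memory(expr, distance))  # False

# In-place variant: both ufuncs write into the same buffer (work), so no
# additional full-size array is allocated for the result.
work = distance.copy()
result = np.negative(work, out=work)
result = np.divide(result, sigmasq, out=work)
print(result is work)                    # True
print(np.allclose(work, expr))           # True
```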

tashrifbillah commented 1 week ago

I replaced that line with:

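    # Negate and scale distance in place: out= avoids allocating full-size
    # temporary arrays, and dtype=np.float32 runs the computation in float32.
    np.multiply(distance, -1, out=distance, dtype=np.float32)
    np.divide(distance, sigmasq, out=distance, dtype=np.float32)

    # Allocate the result once, as float32 (half the memory of float64),
    # and let np.exp write straight into it.
    M, N = distance.shape
    similarities = np.empty((M, N), dtype=np.float32)
    np.exp(distance, out=similarities, dtype=np.float32)

    # Drop this function's reference to the input; the memory is freed only
    # if the caller holds no other reference to the same array.
    del distance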

This fixes only the distance_to_similarity() function. The script then fails further downstream, again because it runs out of memory.
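A quick way to check that a float32, in-place version of distance_to_similarity() still matches the original expression is to run it on a small array. The sketch below wraps the replacement code in a standalone function. The name distance_to_similarity_f32 and its (distance, sigmasq) signature are hypothetical, and the test data and sigmasq value are made up. The function overwrites its input, so a copy is passed in.

```python
import numpy as np

def distance_to_similarity_f32(distance, sigmasq):
    """Hypothetical standalone version of the replacement code.

    Computes exp(-distance / sigmasq) in float32. Overwrites `distance`
    in place, so the caller must not reuse it afterwards.
    """
    np.multiply(distance, -1, out=distance, dtype=np.float32)
    np.divide(distance, sigmasq, out=distance, dtype=np.float32)
    # np.empty skips zero-filling; np.exp overwrites every element anyway.
    similarities = np.empty(distance.shape, dtype=np.float32)
    np.exp(distance, out=similarities, dtype=np.float32)
    return similarities

rng = np.random.default_rng(0)
distance = rng.random((3, 4)) * 500.0   # made-up test distances
sigmasq = 100.0                         # made-up value

expected = np.exp(-distance / sigmasq)  # original float64 expression
sim = distance_to_similarity_f32(distance.copy(), sigmasq)

print(sim.dtype)                              # float32
print(np.allclose(sim, expected, rtol=1e-5))  # True
```

The float32 result agrees with the float64 expression to about single-precision accuracy, which is the trade-off for halving the memory of the output.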


The general idea is to pass dtype=np.float32 to every NumPy operation downstream. Wherever possible, also pass an out= argument, so that results are written into an existing array instead of a newly allocated one.
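One reason dtype=np.float32 has to be passed explicitly is NumPy's type promotion. When a float32 array is combined with a float64 array, the result is float64, which doubles its size. float64 is the default for constructors such as np.ones and np.zeros. The sketch below uses small hypothetical arrays to show this. It also shows how dtype= together with out= keeps the result in float32 and writes it into an existing buffer.

```python
import numpy as np

a32 = np.ones((3, 4), dtype=np.float32)  # e.g. a float32 similarity matrix
b64 = np.ones((3, 4))                    # float64, NumPy's default dtype

# Mixing the two silently promotes the result to float64 (8 bytes/element).
mixed = a32 * b64
print(mixed.dtype)            # float64

# Forcing a float32 computation and writing back into a32 keeps the
# result at 4 bytes/element and allocates no new output array.
res = np.multiply(a32, b64, out=a32, dtype=np.float32)
print(res.dtype, res is a32)  # float32 True
```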
