CARTAvis / fits2idia

C++ implementation of FITS to IDIA-HDF5 converter, optimised using OpenMP
GNU General Public License v3.0

Under-estimated memory prediction #42

Open Jordatious opened 3 years ago

Jordatious commented 3 years ago

When I ran the converter in memory-prediction mode (-m) on a spectral-line cube of shape (1, 1010, 4096, 4096), it predicted a requirement of ~136 GB (see below). I allocated 150 GB (see below) and the job crashed with an OOM error. I then reran it with 232 GB, and SLURM reported a MaxRSS of 190.27 GB, ~40% higher than predicted.

jcollier@slurm-login:~$ /carta_share/hdf_convert/run_hdf_converter -m blah.fits
APPROXIMATE MEMORY REQUIREMENTS:
Z stats:    0.536871 GB
XYZ stats:  0.0331285 GB
Rotation:   67.78 GB
XY stats:   0.033128 GB
Main dataset:   67.78 GB
Mipmaps:    67.7138 GB
TOTAL:  136.163GB (Rotated dataset and mipmaps are not allocated at the same time.)
srun --mem=150GB --time=30 --cpus-per-task=30 /carta_share/hdf_convert/run_hdf_converter -p -o /carta_share/current/users/jcollier/blah.hdf5 blah.fits
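
As a sanity check on the numbers above, here is a minimal sketch that reproduces the reported TOTAL from its components and quantifies the overshoot. It assumes, based only on the printed output and its parenthetical note, that the total is the sum of the stats buffers, the main dataset, and the larger of the rotation and mipmap buffers; it also assumes 4-byte (float32) values and decimal gigabytes. None of this is taken from the converter's source, and the function name is hypothetical.

```python
# Sketch only: the formula is inferred from the -m output above,
# not taken from the fits2idia source code.

def predicted_total_gb(z_stats, xyz_stats, xy_stats, main_dataset, rotation, mipmaps):
    """Sum the components, counting only the larger of rotation and mipmaps,
    since the output says they are not allocated at the same time."""
    return z_stats + xyz_stats + xy_stats + main_dataset + max(rotation, mipmaps)

# Component values copied from the -m output above (GB).
total_gb = predicted_total_gb(
    z_stats=0.536871,
    xyz_stats=0.0331285,
    xy_stats=0.033128,
    main_dataset=67.78,
    rotation=67.78,
    mipmaps=67.7138,
)

# Size of the cube itself, assuming float32 values and 1 GB = 1e9 bytes.
shape = (1, 1010, 4096, 4096)
n_values = 1
for axis_length in shape:
    n_values *= axis_length
main_dataset_gb = n_values * 4 / 1e9

# MaxRSS reported by SLURM for the successful 232 GB run.
max_rss_gb = 190.27
overshoot_percent = (max_rss_gb / total_gb - 1.0) * 100.0

print(f"reconstructed total: {total_gb:.3f} GB")
print(f"main dataset from shape: {main_dataset_gb:.2f} GB")
print(f"MaxRSS exceeds prediction by {overshoot_percent:.1f}%")
```

Under these assumptions the reconstructed total agrees with the printed 136.163 GB and the cube size agrees with the 67.78 GB main dataset, so the discrepancy appears to lie between the prediction and the actual peak usage rather than in the arithmetic of the report.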

veggiesaurus commented 3 years ago

I wonder if there's a memory leak in there. @confluence perhaps we should check with valgrind?

confluence commented 3 years ago

@Jordatious have you run into similar issues in the past? I recently updated the converter executable on Ilifu, and I wonder if the previous executable behaves in the same way.

confluence commented 3 years ago

In semi-related news, I have an idea for building the converter executable in a way which eliminates the need for the wrapper, which would make it easier to answer questions like that without having to copy and edit the wrapper file.

Jordatious commented 3 years ago

> @Jordatious have you run into similar issues in the past? I recently updated the converter executable on Ilifu, and I wonder if the previous executable behaves in the same way.

No, I haven't, but I also haven't done many runs where I first predict the memory footprint and then allocate only a little above it. I have done so a few times, mostly for cubes, and have not seen this. But I don't know how much we can rely on those tests.

Jordatious commented 6 months ago

Hi @confluence, just a quick note that this is still an issue for the latest version. See example below (554 GB used compared to 398 GB predicted):

jcollier@setonix-01:~/scratch/SB60320/POSSUM> sacct -j 11607819 -o JobName,ReqMem,MaxRSS --unit=GB
   JobName     ReqMem     MaxRSS
---------- ---------- ----------
interacti+       987G
interacti+               553.95G
    extern                     0
jcollier@setonix-01:~/scratch/SB60320/POSSUM> fits2idia -m ~/scratch/SB60320/POSSUM/image.restored.i.EMU_1127+00A.SB51572.contcube.conv.transposed.fits
APPROXIMATE MEMORY REQUIREMENTS:
Z stats:    5.44433 GB
XYZ stats:  0.0301554 GB
Rotation:   195.996 GB
XY stats:   0.0300603 GB
Mipmaps:    196.011 GB
Main dataset:   195.996 GB
TOTAL:  397.511GB (Rotated dataset and mipmaps are not allocated at the same time.)
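
For comparison, a small sketch that puts the two reports in this thread side by side. It uses only the predicted totals and MaxRSS values quoted above. The 1.4 padding factor and the helper name `padded_request_gb` are hypothetical, drawn from these two data points alone, and are not a recommendation from the converter itself.

```python
import math

# Predicted TOTAL and observed MaxRSS (GB), copied from the two reports in this thread.
runs = {
    "first report": {"predicted": 136.163, "max_rss": 190.27},
    "latest report": {"predicted": 397.511, "max_rss": 553.95},
}

# Ratio of observed peak memory to the predicted total for each run.
ratios = {name: run["max_rss"] / run["predicted"] for name, run in runs.items()}

def padded_request_gb(predicted_gb, margin=1.4):
    """Hypothetical helper: pad a predicted total by an empirical margin
    and round up to a whole number of GB for a --mem request."""
    return math.ceil(predicted_gb * margin)

for name, run in runs.items():
    request = padded_request_gb(run["predicted"])
    print(f"{name}: observed/predicted = {ratios[name]:.2f}, "
          f"padded request = {request} GB, MaxRSS = {run['max_rss']} GB")
```

In both reports the observed peak is roughly 1.4 times the prediction, which suggests a systematic under-estimate rather than a one-off. Two data points are not enough to treat 1.4 as a reliable margin, so it is best read as a stopgap until the prediction is fixed.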