araichoor opened 3 months ago
From quickly looking at the code: there is a tilepix-{tileid}.json file in the preproc/ folder; this file maps which pixels should be considered for each petal:
cat /pscratch/sd/r/raichoor/daily/preproc/20240309/00229541/tilepix-83444.json
{"83444": {"0": [27239, 27245, 27256, 27258], "1": [27244, 27245, 27246, 27247, 27256, 27258], "2": [27246, 27247, 27258, 27333], "3": [27247, 27258, 27333, 27344], "4": [27258, 27259, 27344, 27345], "5": [27258, 27259, 27345, 27348], "6": [27258, 27259, 27262, 27348], "7": [27256, 27257, 27258, 27259, 27260, 27262], "8": [27250, 27251, 27256, 27257, 27258], "9": [27239, 27250, 27256, 27258]}}
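As a small sketch of how this file can be consumed, here is a pure-Python parse of the tilepix-83444.json content quoted above (the structure is {tileid: {petal: [healpix pixels]}}); this is an illustration, not desispec code:

```python
import json

# tilepix-83444.json content quoted above: tileid -> petal -> healpix pixels
raw = '''{"83444": {"0": [27239, 27245, 27256, 27258],
"1": [27244, 27245, 27246, 27247, 27256, 27258],
"2": [27246, 27247, 27258, 27333],
"3": [27247, 27258, 27333, 27344],
"4": [27258, 27259, 27344, 27345],
"5": [27258, 27259, 27345, 27348],
"6": [27258, 27259, 27262, 27348],
"7": [27256, 27257, 27258, 27259, 27260, 27262],
"8": [27250, 27251, 27256, 27257, 27258],
"9": [27239, 27250, 27256, 27258]}}'''
tilepix = json.loads(raw)

# distinct nside=64 pixels covered by the tile across all ten petals
all_pixels = sorted({pix for pixels in tilepix["83444"].values() for pix in pixels})
print(len(all_pixels), all_pixels)  # 17 distinct pixels for this tile
```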
Could we add to the desispec.pixgroup.get_exp2healpix_map() function an optional nside=64 argument, with the requirement that nside >= 64? That way I could just call that function with nside=128, and hopefully it should work (as the smaller pixels will be enclosed in the nside=64 ones, so the pixel<->petal mapping will still be valid).

For info, as I'll be working on this, I've moved:
/pscratch/sd/r/raichoor/daily/healpix/tertiary38-thru20240313
to:
/pscratch/sd/r/raichoor/daily/healpix/tertiary38-thru20240313-v0
I've edited the example call to rrdesi_mpi in the original message.
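The "smaller pixels enclosed in the nside=64 ones" claim above holds in the HEALPix NESTED ordering, where each pixel splits into exactly four children when nside doubles. A minimal sketch, assuming NESTED indices (parent_pixel is a hypothetical helper for illustration, not desispec code):

```python
def parent_pixel(pix_child, nside_child=128, nside_parent=64):
    """Map a NESTED-ordering healpix index to its enclosing lower-nside pixel.

    Hypothetical helper: in the NESTED scheme each doubling of nside splits
    a pixel into 4 children, so the parent index is an integer division.
    """
    ratio = (nside_child // nside_parent) ** 2  # 4 for 128 -> 64
    return pix_child // ratio

# all four nside=128 children of nside=64 pixel 27258 map back to it
assert all(parent_pixel(27258 * 4 + i) == 27258 for i in range(4))
```

So, as long as the tilepix mapping is defined at nside=64, any NESTED pixel with nside >= 64 can be assigned to a petal through its nside=64 parent.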
@araichoor have you tried running it single-threaded, with no parallelism? It'll need more time, but in that case you'll have access to all the memory on the (CPU or GPU) node and won't hit the 2GB pickling limit.
"have you tried running it single-threaded, with no parallelism?" => no, I didn't.
I'm happy to test, but could you provide the correct srun invocation?
For what it's worth: last night I ran using nside=128 (instead of nside=64), and it worked; the largest number of spectra I then have per pixel is 5901.
I propose we close this ticket as resolved. The 2GB pickling limit is baked into both multiprocessing and MPI, so reducing the memory footprint of the input data (as @araichoor has done) is the way to go.
@araichoor to run "single-threaded" just don't use any parallelism, e.g.,
srun -N 1 -n 1 -c 2 -t 01:00:00 rrdesi_mpi -i ...
but that will be a lot slower, of course (and the node may still run out of memory depending on how many input spectra there are). Using larger-nside healpixels seems like a more reasonable choice, I think, for this specific problem.
thanks.
For what I was working on, yes, using a larger-nside healpixel worked (nside=128 instead of the "default" nside=64).
However, for e.g. jura, I guess we will want to keep the same healpixel size for all cross-tile coadds. So, for what it's worth, I gave it a try in an interactive session (4h) with nside=64:
srun -N 1 -n 1 -c 2 -t 04:00:00 rrdesi_mpi -i ...
It looks to me like it would take ~9h to run the pixel with the most rows: is that problematic for a production? (maybe not if GPUs are used; sorry, I'm ignorant here).
Here are the number of rows per file (nside=64):
# HPXPIXEL NROW
27238 16
27246 876
27244 1945
27239 4037
27251 3178
27250 8349
27245 13016
27247 12501
27256 12034
27260 1669
27257 12157
27258 11457
27262 3772
27333 3205
27259 12983
27348 337
27345 8049
27344 10410
I get the following timings for the four pixels that ran in my 4h session:
# HPXPIXEL NROW SECONDS
27238 16 46.4
27246 876 2160.3
27244 1945 4764.6
27239 4037 9942.7
It roughly scales as SECONDS = 2.5 * NROW.
This scaling gives ~9h for HPXPIXEL=27245, which has 13k rows.
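As a quick sanity check of that extrapolation, pure arithmetic on the numbers above:

```python
# measured scaling from the four completed pixels: SECONDS ~ 2.5 * NROW,
# applied to the largest nside=64 pixel in the table (HPXPIXEL=27245)
seconds_per_row = 2.5
nrow = 13016
hours = seconds_per_row * nrow / 3600
print(f"~{hours:.1f} h")  # ~9.0 h
```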
I've been running the cross-tile redux for the elg tertiary38 tiles, and I get an error in redrock for half (9/18) of the coadds.
One "specificity" of that rosette is that we have 25 tiles (as for other tertiary programs), but we requested only one observation per target, so we have many more individual targetids than usual; I'd say roughly a twice-denser sample than in sv3 rosettes, which typically had ~13 tiles.
the error likely comes from the fact that those 9 coadds have too many rows:
comment from @sbailey: "mpi4py has a limit of 2 GB for the largest thing it can pass between processes due to a limitation of the underlying python pickle module. I don't know what the exact threshold is in terms of number of targets, but basically your input files are too big, and need to move to smaller healpix with fewer targets, or otherwise to more nodes with more MPI ranks so that there is less data per rank."
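For intuition about where that limit bites, here is a purely illustrative back-of-the-envelope estimate; the per-spectrum layout below (3 cameras x 3000 wavelength bins of flux/ivar/mask) is an assumption made up for the sketch, not desispec's actual data model:

```python
# made-up per-spectrum layout: 3 cameras x 3000 bins, each bin carrying
# float64 flux (8 B), float64 ivar (8 B) and int32 mask (4 B)
bytes_per_spectrum = 3 * 3000 * (8 + 8 + 4)

# the ~2 GiB ceiling mentioned in the comment above (mpi4py / pickle)
pickle_limit = 2 * 1024**3

max_rows = pickle_limit // bytes_per_spectrum
print(max_rows)  # ~12k rows on these made-up numbers
```

On these assumed numbers a single rank tops out around 12k rows, the same order of magnitude as the failing coadds; the real threshold depends on the actual per-target payload.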
Example of one of my calls from an interactive node (note: command edited from the original message with a -v0):

and the reported error: