damonge / CoLoRe

CoLoRe - Cosmological Lofty Realizations
GNU General Public License v3.0

Add option write_skewers_Npix to write one skewers file per HEALPix pixel #24

Open andreufont opened 6 years ago

andreufont commented 6 years ago

Right now, each node writes a fraction of the skewers to a file called something like: skewers_srcs_s0_0.fits where _0 specifies the node.

The current setting might end up producing really large files, especially when running on only a few nodes.

Moreover, most codes that will use these files work with files organized by HEALPix pixel, so it would be great if CoLoRe could directly write one file per HEALPix pixel. Note that this does not need to be the same HEALPix grid used internally in CoLoRe, but it could use that as the default.

The new files could be named something like: skewers_srcs_s0_pix37.fits where _pix37 now specifies the HEALPix pixel contained in the file. Or something like this. Of course, write_skewers_Npix would be added as metadata to one of the output files so that we can map a given pixel back to a sky position.

Important question: In the current code, is a HEALPix pixel always contained in a single node-based output? If not, would it be difficult to assign these output HEALPix pixels to nodes, and let each node write the files corresponding to its pixels? The alternative would be having two or more nodes writing to the same pixel file, which I guess is not ideal.

A reasonable default value for write_skewers_Npix could be either 8 or 16. With write_skewers_Npix=8, each file would have ~5000 quasars, resulting in file sizes of about 400 MB (?).
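For reference, the number of output files follows from the standard HEALPix relation Npix = 12 Nside^2. The snippet below is only a minimal sketch: it assumes that write_skewers_Npix would be interpreted as an Nside value, and the helper names npix_from_nside and skewers_filename are hypothetical, not part of CoLoRe. The file-name pattern is the one suggested above.

```python
def npix_from_nside(nside: int) -> int:
    """Total number of HEALPix pixels on the sphere: 12 * Nside^2."""
    if nside < 1:
        raise ValueError("Nside must be a positive integer")
    return 12 * nside * nside


def skewers_filename(pix: int, prefix: str = "skewers_srcs_s0") -> str:
    """Hypothetical per-pixel file name, following the pattern proposed in this issue."""
    return f"{prefix}_pix{pix}.fits"


# Number of files that would be written for the two suggested defaults
# (assuming the option value is an Nside).
for nside in (8, 16):
    print(f"Nside={nside}: {npix_from_nside(nside)} files")

print(skewers_filename(37))
```

So the two suggested defaults would correspond to 768 and 3072 files respectively, if every pixel contains at least one skewer.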

damonge commented 6 years ago

This can be done. Note that right now each node will contain all skewers in big healpix pixels (N_side=1, 2 or higher, depending on the number of nodes you run with), with no overlap between nodes. So one could make the code spit out sub-pixels of those big pixels individually.

andreufont commented 6 years ago

That's great. I just noticed that when using 32 nodes, I get 16 files of size 9 GB and 16 files of size 4.5 GB, which probably means that the output files are using Nside=2, i.e., 48 pixels, and half of the nodes are writing 2 pixels to their file. It does look like an easy change to implement.
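The arithmetic behind that guess can be checked with a short sketch. It assumes the 48 pixels of Nside=2 are shared among nodes as evenly as possible; this is only an illustration of the counting, not necessarily how CoLoRe actually assigns pixels to nodes, and pixels_per_node is a hypothetical helper.

```python
from typing import List


def pixels_per_node(npix: int, n_nodes: int) -> List[int]:
    """Split npix pixels over n_nodes as evenly as possible (assumed scheme)."""
    base, extra = divmod(npix, n_nodes)
    # The first `extra` nodes take one pixel more than the others.
    return [base + 1 if node < extra else base for node in range(n_nodes)]


npix = 12 * 2**2  # Nside=2 gives 48 pixels
counts = pixels_per_node(npix, 32)
print(counts.count(2), "nodes with 2 pixels,", counts.count(1), "nodes with 1 pixel")
```

With 32 nodes this gives 16 nodes holding 2 pixels and 16 holding 1, consistent with seeing 16 files of one size and 16 files of half that size.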

andreufont commented 6 years ago

Actually, we have a very simple post-process script that allows us to split the files into smaller files with a different value of Nside, so this is not a priority at all.

andreufont commented 6 years ago

As pointed out in issue #27, the current files might not correspond to HEALPix pixels, so it would be good to implement this.

damonge commented 6 years ago

@andreufont: see the new branch output_perpix. The catalogs are now split into individual healpix pixels (although the user has no control over the Nside). The pixel indices are in NEST order (I should document this).
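Since the indices are in NEST order, regrouping pixels to a coarser Nside is just integer division: in the NESTED scheme, a pixel p at a given Nside contains pixels 4p to 4p+3 at 2*Nside. The sketch below illustrates this under the assumption that both Nside values are powers of 2; the function names are hypothetical and are not taken from the output_perpix branch.

```python
def nest_parent(pix: int, nside: int, nside_coarse: int) -> int:
    """NEST index of the coarse pixel that contains fine NEST pixel `pix`."""
    if nside % nside_coarse != 0:
        raise ValueError("nside must be a multiple of nside_coarse")
    ratio = nside // nside_coarse
    # Each coarse pixel holds ratio^2 consecutive fine pixels in NEST order.
    return pix // (ratio * ratio)


def nest_children(pix_coarse: int, nside_coarse: int, nside: int) -> range:
    """NEST indices of all fine pixels inside coarse NEST pixel `pix_coarse`."""
    if nside % nside_coarse != 0:
        raise ValueError("nside must be a multiple of nside_coarse")
    ratio2 = (nside // nside_coarse) ** 2
    return range(pix_coarse * ratio2, (pix_coarse + 1) * ratio2)


# Example: which Nside=2 pixel contains pixel 37 at Nside=8?
parent = nest_parent(37, nside=8, nside_coarse=2)
print(parent, list(nest_children(parent, nside_coarse=2, nside=8)))
```

This property does not hold for RING-ordered indices, which would first need to be converted to NEST.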

andreufont commented 6 years ago

This is great! @jfarr03, it would be great if you could give it a try!