As a design principle, Arrakis produces many files. This is actually somewhat desirable, since it allows us to process the data in parallel. However, for full runs that process the entire sky, the file count itself becomes a problem: many HPC filesystems cope poorly with millions of files, and enforce inode limits on users.
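To get a feel for how close a run is to an inode limit, something like the following rough sketch could be pointed at an output directory. The function name `count_inodes` is hypothetical (it is not part of Arrakis), and it assumes every regular file and every directory under the root costs one inode, which is the usual case but ignores hard links and filesystem-specific accounting.

```python
import os


def count_inodes(root: str) -> int:
    """Roughly estimate the inodes used under `root`.

    Hypothetical helper, not part of Arrakis. Assumes one inode per
    file and per directory; hard links would be over-counted.
    """
    total = 1  # the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not follow directory symlinks by default,
        # so each entry is counted once.
        total += len(dirnames) + len(filenames)
    return total


if __name__ == "__main__":
    import sys

    # Usage: python count_inodes.py /path/to/output
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: ~{count_inodes(target)} inodes")
```

Running this on the output of a single field and scaling by the number of fields should give a ballpark figure to compare against the quota on a given system.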
There are some options for us to consider, and we'll need to do some testing and brainstorming. One painful constraint is that many of the tools used in the pipeline (e.g. linmos) are built to handle FITS files and FITS files only. Further, converting to and from other file formats will add considerable processing time.
To be explicit, we need to consider some kind of combined file format for the cubelets.
Some options off the top of my head: