SKIRT / SKIRT9

SKIRT version 9 -- advanced radiative transfer in dusty systems
http://www.skirt.ugent.be
GNU Affero General Public License v3.0

Support binary stored columns format as a faster alternative for column text files #101

Closed petercamps closed 3 years ago

petercamps commented 3 years ago

Motivation

With the ever-increasing resolution of hydro simulations, column text import files representing a galaxy snapshot may easily contain 100 million lines. Reading these large column text files can be very slow (on the order of one minute per ten million lines) and cannot be accelerated through parallelization. Although the time spent reading import files is usually still small compared to the total simulation time, these long setup times are especially annoying when interactively testing or fine-tuning a SKIRT configuration.

It has been suggested, among others by @cbehren and @bwvdnbro, to support a binary format such as HDF5 for SKIRT import. However, this would bring extra complexities. Firstly, we would need to introduce a dependency on the HDF5 library (which, admittedly, could be made optional). Secondly, HDF5 is a very rich database-like format. We would need to define exactly how the columns to be imported and the corresponding metadata have to be stored in the file, creating a kind of SKIRT-specific HDF5 format. This is feasible but not ideal.

Lastly, there is no hope of importing directly from hydro snapshots stored in HDF5, because every type of simulation requires its own post-processing recipe, usually implemented in Python. At the end of the data extraction and processing performed in that step, one can just as well save the SKIRT import files in a SKIRT-specific format, as long as this is easy to do in Python.

Description

This pull request introduces the option, in many common use cases, to provide import files in the binary SKIRT stored columns file format. This format is similar to the SKIRT stored table file format used for SKIRT built-in resources, but is intended to represent a set of data columns just like a column text file.

An accompanying PTS pull request adds a function to save stored columns files, and a command to convert an existing column text file to stored columns format.

To provide a stored columns file to SKIRT, simply specify a filename with the mandatory .scol filename extension anywhere SKIRT expects a column text file (within the limitations described below). Where applicable, the useColumns attribute may need to be updated (because column names may need to differ, see below).

Limitations

The stored columns file format has the following limitations:

Tests

Functional tests have been added. The new format has also been tested with some large import files. Reading the stored columns format is about 30 times faster than reading the regular text-based format.
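
As a rough illustration of why a binary column file avoids the text-parsing cost, here is a minimal Python sketch that stores equal-length columns of floats as raw 64-bit values and reads them back without any string-to-number conversion. Note that this is not the actual SKIRT stored columns layout and not the PTS function mentioned above: the two-integer header, the column-major ordering, and the function names `write_binary_columns` and `read_binary_columns` are assumptions made only for this example.

```python
import struct
import sys
from array import array


def write_binary_columns(path, columns):
    """Write equal-length float columns to a binary file (hypothetical layout)."""
    nrows = len(columns[0]) if columns else 0
    if any(len(col) != nrows for col in columns):
        raise ValueError("all columns must have the same length")
    with open(path, "wb") as f:
        # Assumed header: column count and row count as little-endian uint64.
        f.write(struct.pack("<QQ", len(columns), nrows))
        for col in columns:
            data = array("d", col)  # 64-bit floats in native byte order
            if sys.byteorder == "big":
                data.byteswap()  # always store little-endian on disk
            f.write(data.tobytes())


def read_binary_columns(path):
    """Read the columns back as lists of floats, one list per column."""
    with open(path, "rb") as f:
        ncols, nrows = struct.unpack("<QQ", f.read(16))
        columns = []
        for _ in range(ncols):
            data = array("d")
            data.frombytes(f.read(8 * nrows))  # bulk copy, no text parsing
            if sys.byteorder == "big":
                data.byteswap()
            columns.append(data.tolist())
    return columns
```

Each column is copied from disk in one bulk operation, whereas a column text reader has to tokenize and convert every number on every line. The real `.scol` files should be produced with the PTS functionality from the accompanying pull request rather than with a hand-rolled writer like this one.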