LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

add healpix option in write_gcr_to_parquet.py (for converting cosmoDC2 to parquet) #405

Closed yymao closed 3 years ago

yymao commented 3 years ago

This PR adds a --healpix option to scripts/write_gcr_to_parquet.py, making it more convenient to convert cosmoDC2 to parquet with this script. The option can be used as follows:

python ~/desc/DC2-production/scripts/write_gcr_to_parquet.py cosmoDC2_v1.1.4_image --healpix=8786

As a test, I am generating a few healpix pixels in /global/cscratch1/sd/yymao/desc/cosmodc2-parquet on NERSC if anyone wants to take a look. (cc @JoanneBogart @plaszczy @cwwalter)
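For readers who want a picture of how such an option could be wired up, here is a minimal sketch using argparse. It is not the code from this PR: the helper names `build_parser` and `native_filters_for` are hypothetical, and the assumption that the pixel is passed on as a `healpix_pixel == N` native filter string follows the pattern used in GCRCatalogs cosmoDC2 examples, not anything stated in this thread.

```python
import argparse


def build_parser():
    """Build a command line resembling the one shown above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a GCR-to-parquet command line"
    )
    # Positional catalog name, e.g. cosmoDC2_v1.1.4_image
    parser.add_argument("catalog", help="name of the catalog to convert")
    # Optional healpix pixel; None means "no pixel restriction"
    parser.add_argument(
        "--healpix",
        type=int,
        default=None,
        help="convert only this healpix pixel",
    )
    return parser


def native_filters_for(healpix):
    """Translate the optional pixel number into a list of filter strings.

    Assumption: the catalog reader accepts filters of the form
    'healpix_pixel == N'. Returns None when no pixel was requested.
    """
    if healpix is None:
        return None
    return [f"healpix_pixel == {healpix}"]


# Parse the same arguments as in the command shown above.
args = build_parser().parse_args(["cosmoDC2_v1.1.4_image", "--healpix=8786"])
filters = native_filters_for(args.healpix)
```

Declaring the option with `type=int` means a non-numeric pixel value is rejected by argparse before any catalog is opened.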

yymao commented 3 years ago

On a Cori login node with the SCRATCH file system, converting one healpix pixel of cosmoDC2 takes about 11 minutes and produces a 25 GB parquet file (per healpix pixel, not including native quantities).

yymao commented 3 years ago

Note to self: need to prevent duplicated columns when all column names are in lower case.
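One way to read this note is that two column names differing only in letter case collapse into the same name once lowercased. The sketch below shows how such collisions could be detected and skipped. It is an illustration under that assumption, not the fix applied in the script; the function names and the example column names are hypothetical.

```python
from collections import defaultdict


def find_lowercase_collisions(column_names):
    """Group names by their lowercased form and return only the groups
    that contain more than one original name."""
    groups = defaultdict(list)
    for name in column_names:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


def dedupe_lowercase(column_names):
    """Keep the first name for each lowercased form, preserving order."""
    seen = set()
    kept = []
    for name in column_names:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


# Hypothetical column names, chosen only to show a case collision.
columns = ["Mag_true_g", "mag_true_g", "ra", "dec"]
collisions = find_lowercase_collisions(columns)
unique_columns = dedupe_lowercase(columns)
```

Keeping the first occurrence is an arbitrary choice here; a real converter might instead rename one of the colliding columns so that no data is dropped.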

yymao commented 3 years ago

Thanks @JoanneBogart (and thanks also to @plaszczy and @cwwalter for checking the output files outside of GitHub).