LSSTDESC / DC2-production

Configuration, production, validation specifications and tools for the DC2 Data Set.
BSD 3-Clause "New" or "Revised" License

Improve write_gcr_to_parquet.py #396

Closed yymao closed 3 years ago

yymao commented 4 years ago

(Updated 6/25/2020): This PR adds several improvements to write_gcr_to_parquet.py.

yymao commented 4 years ago

I have tested this version of write_gcr_to_parquet.py on a NERSC Cori login node. It converted dc2_object_run2.2i_dr3, with all tracts at once, into a single parquet file in about 30 minutes, with peak memory usage of around 30 GB.