Closed yymao closed 3 years ago
I have tested this version of `write_gcr_to_parquet.py` on a NERSC Cori login node. I was able to convert `dc2_object_run2.2i_dr3` to a single Parquet file, with all tracts at once, in about 30 min, with a peak memory usage of around 30 GB.
(Updated 6/25/2020): This PR adds several improvements to `write_gcr_to_parquet.py`:

- Frees resources by calling `close_all_file_handles` (if available in the reader) and deleting unused references.
- Handles `tract` and `patch`, if available, for "tract catalogs".
- Adds a `partition` option to save each partition as a single file.
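
The cleanup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR. Only the method name `close_all_file_handles` comes from the description. The helper names `release_reader` and `iter_partitions`, and the reader's `load` method, are hypothetical stand-ins. The actual Parquet writing is left out.

```python
import gc


def release_reader(reader):
    """Close the reader's open file handles if it offers a way to do so.

    Returns True if close_all_file_handles was found and called.
    """
    # The reader may or may not define close_all_file_handles, so look it
    # up defensively instead of calling it directly.
    close = getattr(reader, "close_all_file_handles", None)
    if callable(close):
        close()
        return True
    return False


def iter_partitions(reader, partition_keys):
    """Yield (key, data) one partition at a time, cleaning up in between.

    Assumption: reader.load(key) is a hypothetical method that returns the
    data for one partition; a real reader will have a different interface.
    """
    for key in partition_keys:
        data = reader.load(key)
        yield key, data
        # Once the caller asks for the next partition, drop our reference
        # to the previous one and release any file handles before loading more.
        del data
        release_reader(reader)
        gc.collect()
```

A caller would loop over `iter_partitions(reader, keys)` and write each `data` to its own output file before advancing, so that in this sketch only one partition needs to be held in memory at a time.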