This can be run either by matching to the DPDD Object Table (if a `--reader` is passed) or by individually reading and matching against the coadd merged-detection reference catalogs via the butler (if no `--reader` is passed). I have verified that the resulting Object IDs are the same in both modes.
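For illustration, a minimal sketch of the two access paths being compared is below. This is not the script itself; the catalog name, repo path, tract/patch values, and the `deepCoadd_ref` dataset type are assumptions for the example.

```python
import numpy as np
import GCRCatalogs
from lsst.daf.persistence import Butler

# Path 1: DPDD Object Table via a GCR reader (the --reader case)
object_cat = GCRCatalogs.load_catalog('dc2_object_run1.2p')  # illustrative catalog name
dpdd_ids = object_cat.get_quantities(['objectId'])['objectId']

# Path 2: coadd merged-detection reference catalogs via the butler (no --reader)
butler = Butler('/path/to/coadd/repo')  # illustrative repo path
ref_cat = butler.get('deepCoadd_ref', tract=4849, patch='1,1')
butler_ids = np.asarray(ref_cat['id'])

# Object IDs from the two paths should agree for the same tract/patch selection.
print(len(np.intersect1d(dpdd_ids, butler_ids)))
```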
@jiwoncpark Thanks for the review! Minor comments resolved. Added a small function to ensure the visit list is unique.
Could you be kind enough to take another look?
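For reference, the uniqueness check could look something like the following (a hypothetical sketch; the actual helper in this PR may differ):

```python
def unique_visits(visits):
    """Return the visit list with duplicates removed, in sorted order."""
    return sorted(set(visits))
```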
@yymao Can you formally approve this (in your role as blessed approver) so that I can merge it? @jiwoncpark is happy with it.
Source Catalog files are now generated and installed at NERSC using the new `merge_source_cat.py` script and an associated SLURM job, for a total of 129,512,365 Source IDs. The script produces one Parquet file per visit. Each file is 10-20 MB, and the entire set of 1,995 visits totals 25 GB. Reading them all in is slow, taking 5-15 minutes; the variance is likely due to load or memory pressure on the JupyterLab node.

Files were processed with an 8-node TaskFarmer SLURM job. The job took 4 hours to run (preceded by 36 hours waiting in the queue).
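To give a sense of the read pattern, a minimal sketch of reading the per-visit Parquet files with pandas is below; the directory and filename pattern are assumptions, not the installed paths.

```python
import glob
import pandas as pd

# Filename pattern and directory are illustrative assumptions.
files = sorted(glob.glob('/path/to/source_catalogs/source_visit_*.parquet'))

one_visit = pd.read_parquet(files[0])      # a single 10-20 MB visit file reads quickly
all_visits = pd.concat(
    (pd.read_parquet(f) for f in files),   # ~1,995 files; this is the 5-15 minute step
    ignore_index=True,
)
```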
* There is an updated `scripts/README.md` that details what was done to produce these files.
* There is a `Notebook/verify_source_table.ipynb` to test simple properties.
* There is a reader in the `issues/274` branch of `gcr-catalogs`. The above notebook shows how to use it if you've checked out a local copy of `gcr-catalogs` (see the sketch below).

Future work should focus on performance, optimized for specific use cases; the current performance will not scale to Run 2.1.
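The following sketch shows one way to use the new reader from a local checkout of the `issues/274` branch. The catalog name, quantity names, and visit number are illustrative assumptions; see `Notebook/verify_source_table.ipynb` for the actual usage.

```python
import sys
sys.path.insert(0, '/path/to/gcr-catalogs')  # local clone with the issues/274 branch checked out

import GCRCatalogs

# Catalog and column names below are assumptions for this example.
source_cat = GCRCatalogs.load_catalog('dc2_source_run1.2p')
data = source_cat.get_quantities(['ra', 'dec', 'objectId'],
                                 native_filters=['visit == 219976'])
```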