Closed yymao closed 5 years ago
OK, this is now working and ready for review! Here's an example of how to run several patches:
python generate_object_catalog.py <REPO> 4850 --patches=00,01,02,03 --output_dir=<OUTPUT_DIR>
Here's a real case on NERSC:
python generate_object_catalog.py /global/cscratch1/sd/desc/DC2/data/Run2.1i/rerun/coadd-v1 4850 --patches=00,01 --output_dir=$SCRATCH --verbose
cc @johannct
I'd like to call this script make_object_catalog.py? Shorter with same sense.
Sorry @yymao to have been unclear when we chatted. The standard patch syntax is X,Y, and if there are several patches, then using the stack nomenclature may be in order: in your example above, 0,0^0,1^0,2^0,3 rather than 00,01,02,03.
If this goes all the way to parquet files, is it ok to run it several times with the same tract but disjoint sets of patches in order to build the full catalogue?
[tanugi@cca001 scripts]$ git diff make_object_catalog.py
diff --git a/scripts/make_object_catalog.py b/scripts/make_object_catalog.py
index 6cb5c1c..edb02f3 100644
--- a/scripts/make_object_catalog.py
+++ b/scripts/make_object_catalog.py
@@ -57,13 +57,12 @@ def generate_object_catalog(output_dir, butler, tract, patches=None,
         patches = ['%d,%d' % patch.getIndex() for patch in skymap[tract]]
     else:
         try:
-            patches = patches.split(',')
+            patches = patches.split('^')
         except AttributeError:
             pass
         else:
-            if not all(len(p) == 2 for p in patches):
+            if not all(len(p) == 3 for p in patches):
                 raise ValueError('patches should be a list or a string in "11,22,33" format')
-            patches = ['{},{}'.format(*p) for p in patches]
 
     for patch in patches:
         if verbose:
Thanks @johannct, I've updated the format for specifying patches as suggested.
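For reference, the `^`-separated patch syntax discussed above can be parsed roughly as follows. This is only a sketch: the name `parse_patches` is hypothetical and is not the function in the script. Unlike the `len(p) == 3` check in the diff, which only accepts single-digit indices, this version checks that each patch is a pair of integers separated by a comma.

```python
def parse_patches(spec):
    """Split a stack-style patch spec such as '0,0^0,1' into ['0,0', '0,1'].

    Hypothetical helper for illustration; not part of make_object_catalog.py.
    """
    patches = spec.split("^")
    for patch in patches:
        x, sep, y = patch.partition(",")
        # Each patch must look like "X,Y" with non-negative integer indices.
        if not (sep and x.isdigit() and y.isdigit()):
            raise ValueError(
                'patches should be given in "0,0^0,1^0,2" format, got %r' % spec
            )
    return patches
```

With this, `parse_patches('0,0^0,1')` gives a list of two patch names, and a spec with no comma, such as `'00'`, raises `ValueError`.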
You asked:
"If this goes all the way to parquet files, is it ok to run it several times with the same tract but disjoint sets of patches in order to build the full catalogue?"
Yes, each patch will have its own output, so you can run disjoint sets of patches in parallel. Once everything is done, we need to run another script to join the patches in each tract.
ok then for now I can also run on tracts only, that will do away with the second script. Is there another step? I tested the script successfully by the way.
Right, so there will be two steps:
First, generate per-patch files:
REPO=/path/to/butler/repo
TRACT=4850
OUTPUTDIR=/path/to/output_dir
python make_object_catalog.py $REPO $TRACT --patches='0,0^0,1^0,2' --output-dir=$OUTPUTDIR
python make_object_catalog.py $REPO $TRACT --patches='1,0^1,1^1,2' --output-dir=$OUTPUTDIR
And after all patches in this tract are done, run:
python merge_parquet_files.py $OUTPUTDIR/object_${TRACT}_*.parquet -o=$OUTPUTDIR/object_tract_$TRACT.parquet --sort-input-files
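The first step can also be scripted. The sketch below splits a tract's patches into disjoint groups and builds one `make_object_catalog.py` command line per group, using only the options shown in this thread (`--patches`, `--output-dir`). The helper names `patch_groups` and `catalog_commands` and the group size are assumptions made for illustration; the commands are only built and printed, not run.

```python
import shlex


def patch_groups(patches, size):
    """Split a list of 'X,Y' patch names into disjoint groups of at most `size`."""
    return [patches[i:i + size] for i in range(0, len(patches), size)]


def catalog_commands(repo, tract, patches, output_dir, size=3):
    """Build one make_object_catalog.py command (as an argv list) per patch group."""
    commands = []
    for group in patch_groups(patches, size):
        commands.append([
            "python", "make_object_catalog.py", repo, str(tract),
            "--patches=" + "^".join(group),  # stack-style '^' separator
            "--output-dir=" + output_dir,
        ])
    return commands


if __name__ == "__main__":
    # Illustrative patch list only; a real tract has its own set of patches.
    patches = ["%d,%d" % (x, y) for x in range(2) for y in range(3)]
    for cmd in catalog_commands("/path/to/butler/repo", 4850, patches,
                                "/path/to/output_dir"):
        print(shlex.join(cmd))  # shell-quoted, ready to paste or submit
```

Since each patch writes its own output file, the generated commands can be run in parallel, and the merge step is run once they have all finished.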
??? Why do you have to generate per-patch object catalogs? Michael's scripts were already automatically finding and extracting the object catalogs from the patches using butler magic.
@EiffL because we may want to parallelize the patches
Keep homogeneity with DM: --patch instead of --patches.
@johannct Because of Python's unique-prefix rules for processing options, you can still use --patch instead of --patches.
We use --patches in merge_dia_object.py. We can change both to --patch in some future PR if this ends up being confusing.
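The unique-prefix behaviour mentioned above can be checked with a small standalone parser. This sketch assumes the script parses its options with `argparse` (not confirmed in this thread) and declares only the two options shown here; `argparse` accepts any unambiguous prefix of a long option by default (`allow_abbrev=True`).

```python
import argparse

# Stand-in parser for illustration; the real script may define more options.
parser = argparse.ArgumentParser()
parser.add_argument("--patches")
parser.add_argument("--output-dir")

# "--patch" is an unambiguous prefix of "--patches", so argparse accepts it.
args = parser.parse_args(["--patch=0,0^0,1", "--output-dir=/tmp/out"])
print(args.patches)
```

The abbreviation stops working if another option starting with `--patch` is added later, because the prefix is then ambiguous.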
This PR adds a script to generate the object catalog in parquet format. The new script is called generate_object_catalog.py and replaces the functionality of merge_tract_cat.py.
This is now a draft PR. The function merge_coadd_forced_src has been tested, but the CLI has not. Also, the magnitude extraction part still needs to be fixed.
This PR will fix #342.