Perhaps separate astrometry and photometry? They're fairly different steps, and will probably use different catalogs. (I believe that Gaia is "state of the art" for astrometry, but you really don't want to try to use that for photometric calibration.) Astrometry can sometimes be a bit slow, and photometry is something we're likely to be fiddling with during development, so there may be some value in committing the astrometry and then redoing the photometry.
So astrometry, and in a separate phase photometric calibration and ZP?
Yeah. (ZP and photometric calibration are the same thing.)
Addressed by PR #19.
Closed by #39
Here's a rundown of all the moving parts (per the discussion between Rob & Guy):
The conductor sees new files appear and adds them as Exposure rows in the DB, flagged as not yet processed. Once an exposure is processed, its row's flag is updated so we don't re-do it. The web server will be aware of any created objects and display them, both for debugging and for figuring out whether the R/B score is working properly. The conductor and web server may be merged into one thing at some point.
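To make the bookkeeping concrete, here is a minimal sketch of the conductor's flagging, assuming SQLAlchemy as the ORM; the table and column names below are illustrative, not the actual schema.

```python
# A sketch of the conductor's bookkeeping, assuming SQLAlchemy as the ORM;
# the table and column names are illustrative, not the actual schema.
from sqlalchemy import Boolean, Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Exposure(Base):
    __tablename__ = 'exposures'
    id = Column(Integer, primary_key=True)
    filepath = Column(String, unique=True)         # where the new file appeared
    is_processed = Column(Boolean, default=False)  # flipped once the pipeline is done

engine = create_engine('sqlite://')  # in-memory DB just for the example
Base.metadata.create_all(engine)

# The conductor's loop: register new files, then hand unprocessed rows to the pipeline.
with Session(engine) as session:
    session.add(Exposure(filepath='/data/raw/exp_0001.fits'))
    session.commit()
    todo = session.query(Exposure).filter_by(is_processed=False).all()
```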
The pipeline itself needs to run separately on each image, or on each CCD in that image (see issue #14 about multiprocessing). The pipeline will be broken up into the following phases (a sketch of the flow follows the list):

1. Preprocessing / image reduction (we need to decide on a name!): this includes dark/flat and sky subtraction.
2. Calibration: extracting sources, astrometry (finding the WCS), then re-extracting sources and running photometric calibration to get the ZP.
3. Subtraction: find existing reference images (creating them if they don't exist yet) and use them to run subtraction on the new images.
4. Photometry: extract sources from the difference images, run analytical cuts and R/B to remove most of them, save cutouts and photometry (flux, centroids, etc.), and attach the results to existing Objects in the DB (or create a new Object).
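Here is a minimal sketch of how the phases might chain together, assuming each phase is a class with a `run()` method that consumes the previous phase's product; every name here is a placeholder for discussion, not a settled API.

```python
# A sketch of the four phases chained by a top-level Pipeline; all class and
# method names are placeholders for discussion, not a settled API.
class Preprocessor:           # phase 1: dark/flat and sky subtraction
    def run(self, exposure):
        return exposure       # would return the reduced image

class Calibrator:             # phase 2: source extraction, WCS, ZP
    def run(self, image):
        return image

class Subtractor:             # phase 3: reference lookup/creation + subtraction
    def run(self, image):
        return image          # would return the difference image

class Photometer:             # phase 4: cutouts, cuts, R/B, attach to Objects
    def run(self, diff_image):
        return diff_image

class Pipeline:
    PHASES = (Preprocessor, Calibrator, Subtractor, Photometer)

    def run(self, exposure):
        # Runs on one image (or one CCD of it); each phase consumes the
        # previous phase's product and must succeed before the next starts.
        product = exposure
        for phase_cls in self.PHASES:
            product = phase_cls().run(product)
        return product
```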
Each of these steps/phases has to complete successfully before any data is saved to the DB, to local disk, and to NERSC permanent storage. When these products are saved, they are stamped with a Provenance that records the parameters and code version used to produce them.
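For example, a Provenance could be a deterministic stamp computed from the process name, code version, and parameters. This is only a sketch under that assumption; the field names and hashing choice are illustrative.

```python
# A sketch of a Provenance stamp, assuming it is keyed by a hash of the
# process name, code version, and parameters; field names are illustrative.
import hashlib
import json
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Provenance:
    process: str                              # e.g. 'subtraction'
    code_version: str                         # e.g. a git tag or commit hash
    parameters: dict = field(default_factory=dict)

    @property
    def unique_id(self) -> str:
        # Sorting keys makes the ID independent of parameter ordering, so the
        # same code + parameters always map to the same provenance.
        payload = json.dumps(
            {'process': self.process,
             'code_version': self.code_version,
             'parameters': self.parameters},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

prov = Provenance('subtraction', 'v0.1.0', {'method': 'zogy', 'kernel_size': 5})
print(prov.unique_id)  # stamped onto every saved data product
```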
Parameter control: the most reasonable way to do this seems to be to create a class for each of these phases. The top-level Pipeline object will initialize the corresponding objects, feeding each one its section of the config file (e.g., it will find a dictionary in the config under "subtraction" and pass that as arguments to the Subtraction constructor). If you want to initialize a phase in a Jupyter notebook, you can also do that by choosing the parameters manually; see the sketch below.
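A minimal sketch of that config-feeding pattern, assuming each phase class takes its parameters as keyword arguments; the config keys and the `Subtraction` signature here are made up for illustration.

```python
# A sketch of config-driven construction, assuming each phase class takes its
# parameters as keyword arguments; the keys and signature here are made up.
class Subtraction:
    def __init__(self, method='zogy', reference_depth=3):
        self.method = method
        self.reference_depth = reference_depth

class Pipeline:
    def __init__(self, config):
        # Each phase object is fed only its own sub-dictionary of the config.
        self.subtraction = Subtraction(**config.get('subtraction', {}))
        # ...same pattern for the other phases...

# A dictionary as it might be parsed out of the config file:
config = {'subtraction': {'method': 'zogy', 'reference_depth': 5}}
pipe = Pipeline(config)

# In a Jupyter notebook you can skip the config and choose parameters manually:
sub = Subtraction(method='hotpants')
```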
Each processing object will query the database for the required data and proceed only if the data is found. Thus, it should be simple to initialize, e.g., a Subtraction object from an interactive session and have it start running subtraction on an image/CCD.
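For example, the interactive workflow could look like the sketch below; `fetch_image()` and the `run()` signature are hypothetical placeholders for whatever the real classes define.

```python
# A sketch of the "query, then proceed" pattern; fetch_image() and run() are
# hypothetical placeholders for whatever the real Subtraction class defines.
class Subtraction:
    def __init__(self, method='zogy'):
        self.method = method

    def fetch_image(self, exposure_id, ccd):
        # Stand-in for a real DB query; would return None if the data is missing.
        return {'exposure_id': exposure_id, 'ccd': ccd}

    def run(self, exposure_id, ccd):
        image = self.fetch_image(exposure_id, ccd)
        if image is None:
            return None  # nothing to do until the required inputs exist
        return f'difference image for exposure {exposure_id}, CCD {ccd}'

# From an interactive session:
sub = Subtraction(method='zogy')
diff = sub.run(exposure_id=42, ccd=7)
```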
Additional phases of the pipeline can be added, such as Stacks or StreakDetection. We can think about those later, but the idea is the same.
Please add thoughts and ideas to this discussion...