Best practice for book keeping?

Once again let me say that i am blown away by the power of this wonderful tool. So much fun.

I am currently working to build an image calibration pipeline, centred on astrophot modelling/photometry for stars. There are a few steps to the process: first i fit many stars separately and independent to assess their suitability for PSF modelling; then i fit an ensemble of 'good' stars with a single model to determine the PSF and to get the positions and fluxes to be used for astrometric and photometric calibration. Like i say: fun! : )

The hardest part of this with astrophot is actually tracking and extracting the results of the fits — which says a lot about how good astrophot is at what it is supposed to do! But still, it would be good to have some documentation or guidance and possibly even some extra functionality to help with record keeping.

I'm not sure that what i'm doing is particularly good or clever, but just to show the approach that i have gravitated towards:

In the first phase when i fit many stars separately and independently, here is what i do:

list_of_row_data = []
for star in stars :
    # relevant information for this source
    objid, xcen, ycen = star['objid'], star['xcen'], star['ycen']
    # create the model for this source
    star_model = astrophot.models.AstroPhot_Model(
        name=str(star['pts_key']), target=target_image,  pixelscale=1,
        model_type='point model', parameters={"center":[xcen, ycen], "flux":1.7},        
        psf=psf_model, psf_mode='full' )
    # do the fit
    star_result = astrophot.fit.LM( star_model ).fit()
    star_result.update_uncertainty()
    # now start extracting values
    star_values = star_model.parameters.vector_values().detach().cpu().numpy()
    star_covar = star_result.covariance_matrix.detach().cpu().numpy()
    star_chi2min = star_result.chi2min() 
    # assemble these into what will become a table row  - this is brittle
    row_data = np.hstack(( star['pts_key'], # identifier 
                           star_values,  # fit parameters (n, Rd, x, y, flux)
                           np.sqrt(np.diag(star_covar)), # fit uncertainties 
                           star_chi2min # min chi2
                           ))
    # append this row array to a list
    list_of_row_data.append( row_data )  
# define the column names corresponding to the row data - this is brittle
colnames = ( 'pts_key star_shape star_size star_xcenter star_ycenter star_logflux' 
             ' star_shape_unc star_size_unc star_xcenter_unc star_ycenter_unc star_logflux_unc' 
             ' star_chi2_min'
             ).split()
# then construct the final output as an astropy Table
data = Table( { col : vals for (col, vals) 
                in zip(colnames, np.vstack(list_of_row_data).T) } )

In the later phase, when i have the positions and fluxes of many stars fit with the same PSF, i have a different but equally awkward pattern:

model = astrophot.models.AstroPhot_Model(
    name='psf star models',  target=target_image,
    model_type='group model', psf_mode='full',
    models=good_star_models, # this is a list of point source models
    )
model.initialize()
result = astrophot.fit.LM( model, verbose=1 ).fit()
result.update_uncertainty()
# now begins the process of extracting results ... much fiddlier!
# start by defining the column names 
colnames = 'pts_key psf_xcenter psf_ycenter psf_xcenter_unc psf_ycenter_unc psf_logflux psf_logflux_unc'.split()
# create an empty dictionary to hold column data as a list of values
data_dict = { col : [] for col in colnames }
# step through each model in the group of models
for node in model.parameters.nodes.values() :
   # a sky component behaves differently and shouldn't be reported
    if 'sky' in node.name :
        continue
    # now extract and store each quantity carefully.
    data_dict[ 'objid' ].append( int(node.name) )
    xcen, ycen = node['center'].value.detach().cpu().numpy()
    dx, dy = node['center'].uncertainty.detach().cpu().numpy()
    data_dict[ 'psf_xcenter' ].append( xcen )
    data_dict[ 'psf_ycenter' ].append( ycen )
    data_dict[ 'psf_xcenter_unc' ].append( dx )
    data_dict[ 'psf_ycenter_unc' ].append( dy )
    flux = node['flux'].value.detach().cpu().numpy()
    dflux = node['flux'].uncertainty.detach().cpu().numpy()
    data_dict[ 'psf_logflux' ].append( flux )
    data_dict[ 'psf_logflux_unc' ].append( dflux )
# finally use this dictionary of lists to construct an astropy table of results
psf_data = Table( data_dict )

I recognise that the pattern one should use is specific to use case, and that the point of astrophot is that it is structured to be applicable to many and varied situations. So actually conceiving of a unified scheme for 'just works' dumping of data into a table is probably a fool's errand. But it might make sense to have some facility to do some common/likely use cases like the ones above?

But regardless, it would be very helpful to have a tutorial notebook that includes some examples and/or guidance for best practice ways of constructing catalogues/tables of results from astrophot.

Hi @entaylor , indeed there is a lot of bookkeeping to do! It's so cool to see built out pipelines with the code :) I think there are a bunch of tricks that I could write out in a tutorial for sure. For example for any model you can do model.parameter_order to grab a tuple with all the parameter names. Or even better, you can do model.parameters.vector_names() to get a list of strings with the same length as model.parameters.vector_values() which has all of the fitted values.

I had attempted to make a human readable automatic output at one point, this is the model.save() yaml file. Ultimately, because there can be a lot of complexity in how models share parameters, and how one can make arbitrarily nested group model, I couldn't get a structure that was a really clean table like what I think you want.

Perhaps we can use this issue to discuss a format that AstroPhot could automatically output which would be useful for pipelines like what you are constructing. I have a few ideas for what could be done, which have advantages and drawbacks.

Users could just do model.parameters.vector_names() and model.parameters.vector_values() to get a matched list of parameter names and values. But if you are fitting multiple of the same model, then the parameter names will be duplicated. So maybe I could add a function like model.parameters.vector_model_names() so you can tell which model(s) that parameter points to.
I could make something like model.save_csv() which dumps the data into a csv where each row is a model, and the columns are the model name, parameter value, parameter uncertainty. A challenge here is for parameters like center which have multiple values, but I could just do like I do for vector_names and convert the array into center:0 and center:1. Then it would be a trivial matter for a user to rename the columns x and y or RA and DEC or anything else depending on their circumstance. Another challenge here is for different model types, if you have a bunch of stars and a sky model, then how does the csv handle that? I suspect I would just group models of the same type together and then have to make a new header row when I get to a different model type.
I could make a tutorial which builds a csv using for loops etc. like the example you provided. Then users can grab that code and modify it how they like.

I'm sure there are other options too. Let me know what you think!

Autostronomy / AstroPhot

Best practice for book keeping? #196