cta-observatory / protopipe

Prototype data analysis pipeline for the Cherenkov Telescope Array Observatory
https://protopipe.readthedocs.io/en/latest/

Prototype of a performance tool with statistical errors via bootstrapping #82

Open HealthyPear opened 3 years ago

HealthyPear commented 3 years ago

Context

protopipe.perf is expected to contain at least 1 script to produce DL3 data (see issue #73).

Adding statistical uncertainties to protopipe DL3 output has already been tested using the bootstrap method.

The base script(s) should perform only the basic operation (producing DL3 from DL2). This issue proposes a second script that wraps the first one and performs the bootstrapping.

Requirements

Proposal

name: make_DL3_bootstrap.py

expected input: configuration file and/or CLI options

structure proposal:

# IMPORTS

# ...
from protopipe.perf import make_DL3_EventDisplay
from protopipe.perf import make_DL3_CTAMARS
# + any other pyirf-based approach for DL2->DL3 production
# ...

from protopipe.perf.utils import getDL2

def DL2_resampling(DL2): # where DL2 is a table

  # ....

  return DL2_table_from_resampled_data

def main():

    # READ CONFIG

    conf = load_config(yaml)       # updated version of the current YAML configuration file
    DL2_protopipe = conf.indir     # path to DL2 data as produced by protopipe.scripts.write_dl2
    N = conf.N                     # bootstrap iterations
    approach = conf.approach       # name of the DL3-generating script to use

    quantities = [...]  # list of DL3 output quantities for which we want to save expectation values + uncertainties
    DL2_pyirf = getDL2(DL2_protopipe)     # translation of protopipe/ctapipe DL2 data into pyirf internal format

    # BOOTSTRAP CYCLE

    expectation_values = {IRF_1: [], ...}  # etc., one entry per quantity
    uncertainty_values = {IRF_1: [], ...}  # etc., one entry per quantity

    DL3_iteration_output = {}

    for iteration in range(N):

        # resample DL2 data in pyirf-ready format
        DL2_pyirf_resampled = DL2_resampling(DL2_pyirf)

        # output could be the FITS file or a list of HDUs
        # look up the DL3-generating callable by the name stored in `approach`
        DL3_iteration_output[iteration] = globals()[approach](DL2_pyirf_resampled, iteration)

        if conf.save_all:  # save the DL3 data from the single iteration in a separate folder

            # save DL3_iteration_output[iteration] to f"protopipe_{approach}_bootstrap#{iteration}.fits.gz"
            pass

    for quantity in quantities:  # from DL3_iteration_output[iteration]
        # Get expectation values (e.g. medians) in expectation_values
        # Get uncertainties (e.g. stds) in uncertainty_values
        pass

    # write DL3 FITS file using expectation_values in the place defined by the standard format and uncertainty_values as an additional column

if __name__ == "__main__":
    main()
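
The resampling and aggregation steps are left open in the structure above. As a rough illustration only, here is a minimal sketch assuming a plain nonparametric bootstrap (rows drawn with replacement, same size as the input) and NumPy. All names in it (`resample_with_replacement`, `toy_quantity`, `summarize`, the `energy` column) are hypothetical and not part of protopipe or pyirf. A structured array stands in for the DL2 table, and per-bin event counts stand in for a real DL3 quantity.

```python
import numpy as np


def resample_with_replacement(table, rng):
    """Return a bootstrap replica: len(table) rows drawn with replacement."""
    indices = rng.integers(0, len(table), size=len(table))
    return table[indices]


def toy_quantity(table, bin_edges):
    """Hypothetical stand-in for a DL3 quantity: event counts per energy bin."""
    counts, _ = np.histogram(table["energy"], bins=bin_edges)
    return counts


def summarize(replicas):
    """Median and sample standard deviation across replicas, per bin."""
    stacked = np.stack(replicas)  # shape: (n_iterations, n_bins)
    return np.median(stacked, axis=0), np.std(stacked, axis=0, ddof=1)


rng = np.random.default_rng(seed=0)

# Toy "DL2 table": a structured array with a single, made-up column.
events = np.zeros(1000, dtype=[("energy", "f8")])
events["energy"] = rng.uniform(0.0, 10.0, size=1000)
bin_edges = np.linspace(0.0, 10.0, 6)  # 5 bins

# Bootstrap cycle: resample, recompute the quantity, collect the results.
n_iterations = 200
replicas = [
    toy_quantity(resample_with_replacement(events, rng), bin_edges)
    for _ in range(n_iterations)
]

# One expectation value and one uncertainty per bin.
expectation, uncertainty = summarize(replicas)
```

Passing a seeded `rng` explicitly keeps every iteration reproducible, which may help when comparing the per-iteration files written with `save_all`. The choice of median and standard deviation follows the examples given in the comments of the proposal; other estimators (for instance percentile intervals) would slot into `summarize` the same way.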