Develop routines to compare results from two similar recipes

awhoward commented 3 years ago

During development of most modules, we will assess whether a particular code change improves some metric of data quality (e.g., the RMS of the RVs). Over and over again we will want to compare results from RecipeA and RecipeB (and maybe RecipeC, etc.) that differ in a small way and are run on the same data. This seems like something worth generalizing.

In addition, automated testing will compare results of the same recipe run on the same data with different versions of the code.

To facilitate this, it would be helpful to have some code that runs CodeA/CodeB on RecipeA/RecipeB with Data and returns ResultsA/ResultsB.
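
One way to picture such a runner is the minimal sketch below. It assumes a recipe can be modeled as a plain Python callable that takes the data and returns a result; `run_recipes`, `recipe_a`, and `recipe_b` are hypothetical names for illustration, not part of the KPF pipeline.

```python
from typing import Any, Callable, Dict

# Assumption: a "recipe" is any callable that maps input data to a result.
Recipe = Callable[[Any], Any]


def run_recipes(recipes: Dict[str, Recipe], data: Any) -> Dict[str, Any]:
    """Run every recipe on the same input data and collect results by label."""
    return {label: recipe(data) for label, recipe in recipes.items()}


# Toy stand-ins for RecipeA and RecipeB: two estimators of a central value.
def recipe_a(data):
    return sum(data) / len(data)  # arithmetic mean


def recipe_b(data):
    ordered = sorted(data)
    return ordered[len(ordered) // 2]  # upper median


data = [1.0, 2.0, 3.0, 10.0]
results = run_recipes({"A": recipe_a, "B": recipe_b}, data)
print(results)
```

Keying the results by label keeps the comparison step independent of how many recipes are run, so adding a RecipeC only means adding one more dictionary entry.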

petigura commented 3 years ago

This is a good idea, Andrew. Here's a sketch for a possible implementation, which follows a pandas pattern.

Each pandas Series has a describe method, which computes useful summary statistics for that data container:

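import numpy as np
import pandas as pd

# Two samples of 100 normal draws, each shifted to a mean of about 10
s1 = pd.Series(np.random.randn(100)) + 10
s2 = pd.Series(np.random.randn(100)) + 10

print(s1.describe())
print(s2.describe())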

which returns

count    100.000000
mean       9.900462
std        1.016894
min        6.007688
25%        9.313314
50%        9.922723
75%       10.481145
max       12.212839
dtype: float64
count    100.000000
mean       9.874505
std        1.107978
min        6.968830
25%        9.113510
50%        9.953661
75%       10.684066
max       12.193387
dtype: float64

By analogy, each KPF data container could have the same required method:

kpftwo = KPF2(*args)
kpftwo.describe()

which would return useful info

rv = 16.02134124
rv_err = 0.124145
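
As a rough illustration of what such a method might look like, here is a sketch with a toy container. `ToyRVResult` is a hypothetical stand-in, not the real KPF2 class, and the choice to return a pandas Series is an assumption made so that the output can be differenced directly, as with `Series.describe()`.

```python
import pandas as pd


class ToyRVResult:
    """Hypothetical stand-in for a KPF data container (not the real KPF2 class)."""

    def __init__(self, rv, rv_err):
        self.rv = rv
        self.rv_err = rv_err

    def describe(self):
        """Return the summary statistics as a labeled pandas Series."""
        return pd.Series({"rv": self.rv, "rv_err": self.rv_err})


result = ToyRVResult(rv=16.02134124, rv_err=0.124145)
print(result.describe())
```

Returning a labeled Series rather than printing means two summaries can be subtracted label by label without any parsing.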

One could then write a simple function that compares these summary statistics. For pandas, this would be:

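def compare(s1, s2):
    """Tabulate the summary statistics of two Series side by side."""
    d1 = s1.describe()
    d2 = s2.describe()
    comp = pd.DataFrame(
        {'A': d1,
         'B': d2,
         'A - B': d1 - d2,             # absolute difference
         '(A - B)/A': (d1 - d2) / d1}  # difference relative to A
    )
    return comp

print(compare(s1, s2))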

which returns

                A           B     A - B  (A - B)/A
count  100.000000  100.000000  0.000000   0.000000
mean     9.900462    9.874505  0.025957   0.002622
std      1.016894    1.107978 -0.091084  -0.089571
min      6.007688    6.968830 -0.961143  -0.159985
25%      9.313314    9.113510  0.199804   0.021454
50%      9.922723    9.953661 -0.030938  -0.003118
75%     10.481145   10.684066 -0.202921  -0.019361
max     12.212839   12.193387  0.019452   0.001593

or by analogy

compare(kpftwo1,kpftwo2)

       A          B         B - A 
rv     16.02134   16.02131  -0.00003
rv_err 0.124145   0.104145  -0.02000
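
A sketch of how that comparison could work for any object exposing `describe()` is below. It assumes `describe()` returns a pandas Series; `ToyRVResult` and `compare_described` are hypothetical names, and the inputs are the illustrative values from the table above.

```python
import pandas as pd


class ToyRVResult:
    """Hypothetical stand-in for a KPF data container (not the real KPF2 class)."""

    def __init__(self, rv, rv_err):
        self.rv = rv
        self.rv_err = rv_err

    def describe(self):
        return pd.Series({"rv": self.rv, "rv_err": self.rv_err})


def compare_described(a, b):
    """Tabulate describe() output of two containers and their difference."""
    da, db = a.describe(), b.describe()
    return pd.DataFrame({"A": da, "B": db, "B - A": db - da})


kpftwo1 = ToyRVResult(rv=16.02134, rv_err=0.124145)
kpftwo2 = ToyRVResult(rv=16.02131, rv_err=0.104145)
table = compare_described(kpftwo1, kpftwo2)
print(table)
```

Because the function only relies on `describe()`, the same code would work for a pandas Series and for any container that adopts the method.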

bjfultn commented 1 year ago

Will implement this as an automated performance test.
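
One possible shape for such a test is sketched below, assuming the summary statistics are available as pandas Series. The function name `find_regressions` and the default tolerance are hypothetical choices for illustration, not the pipeline's actual test, and the values are the illustrative ones from the comparison above.

```python
import pandas as pd


def find_regressions(reference, candidate, rtol=1e-3):
    """Return labels of statistics whose relative change exceeds rtol.

    Assumes no reference value is zero, since the change is
    measured relative to the reference.
    """
    rel_change = ((candidate - reference) / reference).abs()
    return list(rel_change[rel_change > rtol].index)


reference = pd.Series({"rv": 16.02134, "rv_err": 0.124145})
candidate = pd.Series({"rv": 16.02131, "rv_err": 0.104145})

changed = find_regressions(reference, candidate)
print(changed)
```

An automated test could then assert that the returned list is empty, or that only the statistics expected to change appear in it.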

Keck-DataReductionPipelines / KPF-Pipeline, issue #192