Add a table of aggregated outputs

chapuisk commented 3 years ago

A synthetic output overview of a simulation over replications would be a valuable addition. This report would give clear and straightforward numerical outputs, in addition to the plots, whose purpose is to show the dynamics and magnitudes.

This output should be a table-styled report (e.g. a CSV file) with the usual data for assessing the COVID-19 epidemiological state, including deaths, cases, recovered and hospitalized (in/out of ICU). Such numbers could come as contingencies, relative to 100k population (as is the case in WHO situation reports) and/or for age and gender groups.

ndgnuh commented 3 years ago

Hi Kevin, is this the kind of output you are looking for?

chapuisk commented 3 years ago

Hey Hung, thanks for giving this issue a look! For the general idea, yes, this is what I had in mind. However, the content remains unclear, e.g. why is the total lower than susceptible? And what does a "total" of 2 mean? While it is meaningful to have the total number of Hosp, ICU, Recovered and deaths, I am pretty sure the other agent states should be reworked a little, showing the total number (with statistical moments, of course) of infected and the proportion of asymptomatic (for example). An important aspect to consider is displaying those outputs relative to 100k population, which makes it possible to compare zones with contrasting demography. Another aspect is the statistical moment of choice: deviation across replications is essential information, using either SD or quartiles. Looking forward to discussing this, Kevin

ndgnuh commented 3 years ago

Hi, I'm currently implementing the standard deviation part, does this look good?

I took the deviation of each number at each time step (or each table cell, if that's easier to understand), across replications. The implementation is as simple as follows:

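import functools as fp
import numpy as np

def stdAcrossReplications(dfs, trim=False):
    # dfs: one DataFrame per replication; trim is not used yet
    n = len(dfs)
    def safeSum(x, y):
        # NaN (from replications of different lengths) is replaced with 0
        return (x + y).replace(np.nan, 0)
    # cell-wise mean across replications
    meanDfs = fp.reduce(safeSum, dfs) / n
    def safeSquareDist(df):
        return (df - meanDfs).replace(np.nan, 0).pow(2)
    # population standard deviation of each cell
    return (sum(map(safeSquareDist, dfs)) / n).pow(1/2)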

The values at the end of the table are especially large because some replications are shorter than others, resulting in NaN in the calculation, which is replaced with 0. That is not very good; I'll have to think of something else.
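One possible way around the NaN issue, as a minimal sketch: stack the replications and let pandas skip the missing cells, so that each time step is only compared across the replications that actually reached it. This assumes every replication is a pandas DataFrame indexed by time step with the same columns; the function name `nan_aware_std` and the `infected` column are hypothetical.

```python
import pandas as pd

def nan_aware_std(dfs):
    """Per-cell population SD across replications of unequal length."""
    # keys adds an outer index level holding the replication number;
    # the inner level (level=1) is the original time-step index.
    stacked = pd.concat(dfs, keys=range(len(dfs)))
    # Group by time step; replications that ended earlier do not contribute.
    return stacked.groupby(level=1).std(ddof=0)

# Two toy replications; the second one stops one step earlier.
rep_a = pd.DataFrame({"infected": [1, 2, 3]})
rep_b = pd.DataFrame({"infected": [3, 4]})
result = nan_aware_std([rep_a, rep_b])
```

With this approach the last step, reached by a single replication, gets a deviation of 0 instead of an inflated value.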

chapuisk commented 3 years ago

Hi Hung, I think you are overcomplicating the outputs... SD should not be at the time-step level, but rather at the global simulation level. Let me rewind a little: for now, we have been doing step-by-step analysis, which is good but far too detailed compared to what we know of our own model. The idea behind this request was to be able to assess quickly the state of a simulation run without looking at a matrix of complex plots that depend on simulated time, just by looking at summary statistics about the state of the model: overall number of infected, number of people recovered, number of deaths, max bed occupation for a day, etc. Those are single numbers that describe the simulation over several replications, so SD should be computed AFTER the aggregation of simulation values, to assess the overall deviation from one simulation to another (not dependent on time). Best, Kevin
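To make the order of operations concrete, here is a rough sketch: each replication is first collapsed into a few scalars, and the SD is taken between replications only afterwards. The column names (`recovered` and `dead` as cumulative counts, `hospitalized` as daily bed occupation) and both function names are assumptions for illustration, not the actual comokit4py output format.

```python
import pandas as pd

def summarize_run(df):
    """Collapse one replication (rows = days) into a dict of scalars."""
    return {
        "total_recovered": df["recovered"].iloc[-1],   # last value of a cumulative column
        "total_deaths": df["dead"].iloc[-1],           # last value of a cumulative column
        "max_hospitalized": df["hospitalized"].max(),  # peak bed occupation for one day
    }

def sd_after_aggregation(dfs):
    # One row per replication, one column per summary variable.
    per_run = pd.DataFrame([summarize_run(df) for df in dfs])
    # Population SD between replications; no time dimension is left.
    return per_run.std(ddof=0)

# Two toy replications of different lengths.
rep_a = pd.DataFrame({"recovered": [0, 5, 9], "dead": [0, 1, 1], "hospitalized": [2, 6, 3]})
rep_b = pd.DataFrame({"recovered": [0, 7], "dead": [0, 3], "hospitalized": [4, 2]})
sd = sd_after_aggregation([rep_a, rep_b])
```

Because every run is reduced to scalars first, replications of different lengths no longer produce NaN cells.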

ndgnuh commented 3 years ago

Hi, I spent some more time working on the numbers. The proportions don't look very good. I took them from the mean row, which doesn't look very good either. Do you have any suggestions?

chapuisk commented 3 years ago

Seems like you are heading in a strange direction... The idea is rather to have the statistical moments over replications as rows (i.e. mean, min, max, sd, plus the same values relative to 100k agents; that requires the overall number of agents, which should come from the first-step value of each epidemiological state) and each column being one variable of interest [PS: not dependent on time, but summarizing one run]: overall number of infected, overall number of recovered, total number of deaths, max number of hospitalizations for one day, max number in ICU for one day, day of peak infection, day of first death, etc. Overall, there are 3 kinds of variables we are looking for ("total number of...", "maximum number of ... for one day" and "day of...") and 2 types of statistical moments (raw moments, i.e. mean and the like, and moments relative to 100k). If you have further questions, we can switch to PM. Best, Kevin
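The layout described above could look roughly like the sketch below. It assumes a table with one row per replication and one column per summary variable already exists, and that the population size is known and identical across replications; `moments_table`, its parameters and the column names are hypothetical. Note that pandas' `std` aggregation is the sample SD (ddof=1), and that "day of ..." variables are deliberately left empty in the per-100k rows, since rescaling a day makes no sense.

```python
import pandas as pd

def moments_table(per_run, population, count_columns):
    """Rows = moments over replications, columns = summary variables."""
    raw = per_run.agg(["mean", "min", "max", "std"])
    # Rescale only the count variables; the other columns become NaN.
    per_100k = (raw[count_columns] * 100_000 / population).reindex(columns=raw.columns)
    per_100k.index = [f"{name}_per_100k" for name in raw.index]
    return pd.concat([raw, per_100k])

# Toy summary of two replications.
per_run = pd.DataFrame({
    "total_infected": [400, 600],
    "total_deaths": [10, 30],
    "day_of_peak_infection": [20, 24],
})
table = moments_table(per_run, population=50_000,
                      count_columns=["total_infected", "total_deaths"])
```

The resulting table can then be written out with `table.to_csv(...)` to get the CSV-style report.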

ndgnuh commented 3 years ago

Can we move the discussion to the Comokit's Slack?

RoiArthurB commented 3 years ago

Closing, as the related PR has been merged.

COMOKIT / comokit4py

Add a table of aggregated outputs #5