ScottishCovidResponse / SCRCIssueTracking

Central issue tracking repository for all repos in the consortium
6 stars 0 forks source link

Code Review: select_obs function #250

Closed peter-t-fox closed 4 years ago

peter-t-fox commented 4 years ago

Function: select_obs Location: Model.cpp Purpose: Computation of corrected incidences for cases and deaths time series

Signature:

void select_obs(int& Npop, int& t_index, int& duration, int& day_intro, int& day_shut, 
    std::vector<int>& obsHosp_tmp, std::vector<int>& obsDeaths_tmp, 
    std::vector<std::vector<int> > data_tmp, std::vector<std::vector<int> > death_tmp, int herd_id, 
    int time_back)
  1. The seqTime variable is not used and should be removed
  2. The function exclusively acts on the cases and deaths time series for a particular Health Board, so there is no reason to pass the entire cases/deaths data sets into the function.
  3. The code illustrates that it would be useful to define a data structure representing a time series of cases/deaths. This would simplify the code a bit.
  4. The first loop is not very readable:
    //create the vector of cases (cummulative) and define time of first detection (index)
    for (unsigned int iv = 1; iv < data_tmp[0].size(); ++iv) {
        if(data_tmp[herd_id][iv]>=0){
            seqTime.push_back(data_tmp[0][iv]);
            maxTime = std::max(maxTime,static_cast<int>(data_tmp[0][iv]));
            obsHosp_tmp.push_back(data_tmp[herd_id][iv]);
            obsDeaths_tmp.push_back(death_tmp[herd_id][iv]);
            if(data_tmp[herd_id][iv]>0 && t_index<0){
                t_index = (int)data_tmp[0][iv];//identify when the first case is detected/hospitalised 
            }       
        }
    }
    • It removes any points from the cases/deaths timeseries which are negative. Can these actually occur?? If so, it should be handled by input consistency checks in the IO code, not here.
    • static_cast is unnecessary: the parameter is already an integer.
    • The loop also computes the largest time stamp at which the number of cases was non-negative. Again, this would not be necessary if the number of cases were guaranteed to be non-zero.
    • It also computes the first time stamp at which a case occurred: this code is dangerous as it assumes a value for a parameter that was passed in (t_index)
    • t_index does not need to be passed in at all, as it is not used after the select_obs function returns.
  5. data_tmp and death_tmp are not altered by the function: they should be marked as const.
  6. Second loop seems to pad the observations time series with zeroes for days that were removed in the previous loop?
    //add the extra information on the observations
    if(static_cast<unsigned int>(duration) > obsHosp_tmp.size()){
        int extra_time = duration - obsHosp_tmp.size();
        std::vector<int> extra_cases, extra_deaths;
        for (int extra = 0; extra < extra_time; ++extra) {
            extra_cases.push_back(0);
            extra_deaths.push_back(0);
        }       
        for (unsigned int iv = 0; iv < obsHosp_tmp.size(); ++iv) {
            extra_cases.push_back(obsHosp_tmp[iv]);
            extra_deaths.push_back(obsDeaths_tmp[iv]);
        }       
        day_shut = day_shut+ extra_time;
        obsHosp_tmp = extra_cases;
        obsDeaths_tmp = extra_deaths;
        extra_cases.clear();    
        extra_deaths.clear();   
    }
    • extra_time would seem to correspond to number of days removed in previous loop.
    • End result is that the time series has been left justified (effectively) with zeroes for as many days as were removed in the previous loop?
    • Clearing of extra_cases and extra_deaths is unnecessary, as they fall out of scope at this point anyway.
  7. The final chunk of code applies the compute_incidence and correct_incidence functions to the new cases and deaths time series.

    std::cout << "Number of days of obs cases: " << obsHosp_tmp.size() << std::endl;
    std::cout << "Number of days of obs deaths: " << obsDeaths_tmp.size() << std::endl;
    //transform  cumulative numbers into incident cases
    std::vector<int> obsHosp_tmp2;
    compute_incidence(obsHosp_tmp,obsHosp_tmp2);
    correct_incidence(obsHosp_tmp2,obsHosp_tmp);    
    obsHosp_tmp = obsHosp_tmp2;
    obsHosp_tmp2.clear();   
    
    //transform  cumulative numbers into incident deaths
    std::vector<int> obsDeaths_tmp2;
    compute_incidence(obsDeaths_tmp,obsDeaths_tmp2);
    correct_incidence(obsDeaths_tmp2,obsDeaths_tmp);    
    obsDeaths_tmp = obsDeaths_tmp2;
    obsDeaths_tmp2.clear(); 
    • This could be massively simplified by changing the signatures of the compute_incidence and correct_incidence functions to use move semantics instead of out parameters e.g.
      obsHosp_tmp = compute_incidence(obsHosp_tmp);
      ...
github-actions[bot] commented 4 years ago

Heads up @thibaud-porphyre @peter-t-fox - the "Covid19_EERAModel" label was applied to this issue.

thibaud-porphyre commented 4 years ago
  1. the first loop is to create the vectors of observations based on the input files. basically, it defines what data will be used for the rest of the model. it also define t_index, which is the time of the first detected cases within the unit of interest.
    • re.negative incident events: it shouldn't occur (ever) but unfortunately it does because cases are re-allocated between health board but the data collated before the re-allocation is not modified to account for these changes. These changes create negative incident events. These only occur few times for few health board but it has massive impact on the future proofing of the inferences.
  2. extra_time is not time that has been removed from previous loop but the number of days prior the first detection (index cases) which are considered to show silent spread (spread of disease prior detection). the duration of this period is informed by the parameter hrp. Because we are modelling observed/detected cases and deaths (not true presence), we therefore padded the observation with zeros.
qingfengxia commented 4 years ago

Sure, I will look at this PR on Thursday, as I am quite new to the project, I may not be able to give good-enough in-depth feedback for the time being. I will try to finish it on that day.