epiverse-trace / serofoi

Estimates the Force-of-Infection of a given pathogen from population-based seroprevalence studies
https://epiverse-trace.github.io/serofoi/

`veev2012` contains duplicated data #161

Closed: ntorresd closed this issue 6 months ago

ntorresd commented 6 months ago

The `veev2012` dataset contains duplicated data under a different age-group specification. Take a look at the first 12 rows of the serosurvey:

> veev2012[1:12, ]
          survey total counts age_min age_max tsur country  test      antibody
1  PAN-2012-VEEV     6      3       1       5 2012     PAN ELISA IgG anti-VEEV
2  PAN-2012-VEEV    14     11       5      10 2012     PAN ELISA IgG anti-VEEV
3  PAN-2012-VEEV    14     11      10      15 2012     PAN ELISA IgG anti-VEEV
4  PAN-2012-VEEV     8      5      15      20 2012     PAN ELISA IgG anti-VEEV
5  PAN-2012-VEEV     8      7      20      25 2012     PAN ELISA IgG anti-VEEV
6  PAN-2012-VEEV     5      5      25      30 2012     PAN ELISA IgG anti-VEEV
7  PAN-2012-VEEV     1      1      30      35 2012     PAN ELISA IgG anti-VEEV
8  PAN-2012-VEEV     2      2      35      40 2012     PAN ELISA IgG anti-VEEV
9  PAN-2012-VEEV     3      3      40      45 2012     PAN ELISA IgG anti-VEEV
10 PAN-2012-VEEV     2      2      45      50 2012     PAN ELISA IgG anti-VEEV
11 PAN-2012-VEEV    10      7      50      55 2012     PAN ELISA IgG anti-VEEV
12 PAN-2012-VEEV    20     14       1      10 2012     PAN ELISA IgG anti-VEEV

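Note that the last six rows (12 to 17) are aggregates of the first 11 rows: each of rows 12 to 16 sums `total` and `counts` over two adjacent age groups from the first block (rows 1-2, 3-4, and so on), and row 17 repeats row 11: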

> veev2012[12:nrow(veev2012), ]
          survey total counts age_min age_max tsur country  test      antibody
12 PAN-2012-VEEV    20     14       1      10 2012     PAN ELISA IgG anti-VEEV
13 PAN-2012-VEEV    22     16      11      20 2012     PAN ELISA IgG anti-VEEV
14 PAN-2012-VEEV    13     12      21      30 2012     PAN ELISA IgG anti-VEEV
15 PAN-2012-VEEV     3      3      31      40 2012     PAN ELISA IgG anti-VEEV
16 PAN-2012-VEEV     5      5      41      50 2012     PAN ELISA IgG anti-VEEV
17 PAN-2012-VEEV    10      7      51      60 2012     PAN ELISA IgG anti-VEEV
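The duplication can be confirmed without loading the package by typing in the numeric columns from the two printouts above and summing adjacent fine-grained groups. This is a sketch: `veev` is a hand-copied stand-in for `veev2012`, and the `coarse_row` mapping (rows 1-2 into row 12, rows 3-4 into row 13, and so on, with row 11 alone into row 17) is written out by hand for illustration.

```r
# Numeric columns of veev2012, copied by hand from the printouts above
veev <- data.frame(
  total   = c(6, 14, 14, 8, 8, 5, 1, 2, 3, 2, 10, 20, 22, 13, 3, 5, 10),
  counts  = c(3, 11, 11, 5, 7, 5, 1, 2, 3, 2, 7, 14, 16, 12, 3, 5, 7),
  age_min = c(1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 1, 11, 21, 31, 41, 51),
  age_max = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 10, 20, 30, 40, 50, 60)
)

fine <- veev[1:11, ]     # fine age groups (1-5, 5-10, ...)
coarse <- veev[12:17, ]  # coarse age groups (1-10, 11-20, ...)

# Hand-written mapping: which coarse row (1-6) each fine row rolls up into
fine$coarse_row <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6)

# Sum total and counts within each coarse group
summed <- aggregate(cbind(total, counts) ~ coarse_row, data = fine, FUN = sum)

# TRUE when every coarse row is exactly the sum of its fine rows
all_match <- isTRUE(all.equal(summed$total, coarse$total)) &&
  isTRUE(all.equal(summed$counts, coarse$counts))
print(all_match)  # [1] TRUE
```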

I suggest we keep only the last six rows, since the models still yield the expected result for this use case (a less negative elpd for `tv_normal_log` and a large FoI):

[Image: model results]

I ran the models with `iter = 800`, which is why the chains don't appear to have converged for the time-varying models.

@jpavlich, do you think it would be worth adding a check that prevents users from fitting surveys with overlapping age groups? We've had this dataset in the package for a while and I only just noticed this. The models ran without any error even though the survey was not valid...
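One possible shape for such a check, as a hypothetical sketch: `check_age_groups` is not part of serofoi, and it assumes each age group is the closed interval [age_min, age_max]. Groups are sorted by lower bound and flagged whenever a group starts at or before the largest upper bound of any earlier group, which also catches nested groups such as 1-5 inside 1-10. Under this assumption, groups that share a boundary age (like 1-5 and 5-10 in the first block) are flagged as well; a real check would need to follow whatever boundary convention serofoi uses.

```r
# Hypothetical validation helper (not part of serofoi): stop if any two age
# groups overlap, treating each group as the closed interval [age_min, age_max]
check_age_groups <- function(age_min, age_max) {
  stopifnot(length(age_min) == length(age_max), all(age_min <= age_max))
  if (length(age_min) < 2) return(invisible(TRUE))
  ord <- order(age_min, age_max)
  lo <- age_min[ord]
  hi <- age_max[ord]
  # Largest upper bound among all groups that start earlier in sorted order
  prev_hi <- cummax(hi)[-length(hi)]
  clash <- which(lo[-1] <= prev_hi)
  if (length(clash) > 0) {
    stop(sprintf(
      "Overlapping age groups: group(s) starting at age %s overlap an earlier group",
      paste(lo[clash + 1], collapse = ", ")
    ))
  }
  invisible(TRUE)
}

# The coarse groups from rows 12-17 of veev2012 pass the check
check_age_groups(c(1, 11, 21, 31, 41, 51), c(10, 20, 30, 40, 50, 60))

# A fine group mixed with the coarse group that contains it is rejected
res <- tryCatch(check_age_groups(c(1, 1), c(5, 10)),
                error = function(e) "overlap")
```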