Fix bug with number of rows

athowes commented 2 years ago

I think this is what's causing the downstream issue in process_differentiate-high-risk, where there is a NULL entry for one of the list components:

Error in FUN(X[[i]], ...) : subscript out of bounds

athowes commented 2 years ago

The error doesn't occur with just MWI, just BWA, or MWI and ZMB together. It does occur with just MOZ, or with MOZ and MWI together.

Conclusion: this looks like a problem isolated to the MOZ data (which would make sense, as the areas have changed, so something could have gone wrong).

athowes commented 2 years ago

It doesn't happen with 2/3 surveys in MOZ but does happen with 3/3 surveys in MOZ. Trying all three survey types (AIS, DHS, and NA) with only 2/3 surveys: it does happen.

Conclusion: something to do with having multiple survey types in MOZ?
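The subsetting above is a manual bisection over countries and surveys. A minimal R sketch of automating it, assuming a hypothetical `run_subset()` wrapper around the failing step (faked here to error whenever MOZ is included, so that it mirrors the results reported above):

```r
# Hypothetical stand-in for the real pipeline step; it errors whenever
# MOZ is in the subset, mimicking the behaviour reported in this thread.
run_subset <- function(iso3) {
  if ("MOZ" %in% iso3) stop("subscript out of bounds")
  invisible(TRUE)
}

# Country subsets to try, as in the comments above
subsets <- list("MWI", "BWA", c("MWI", "ZMB"), "MOZ", c("MOZ", "MWI"))

# TRUE if the step raised an error for that subset, FALSE otherwise
fails <- vapply(subsets, function(s) {
  tryCatch({ run_subset(s); FALSE }, error = function(e) TRUE)
}, logical(1))

data.frame(
  subset = vapply(subsets, paste, character(1), collapse = "+"),
  fails  = fails
)
```

Catching the error with `tryCatch()` lets every subset run in one pass rather than stopping at the first failure.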

athowes commented 2 years ago

Perhaps some obs_idx values don't correspond to exactly three cat_idx values?

> df %>%
+   group_by(obs_idx) %>%
+   summarise(test = length(unique(cat_idx))) %>%
+   filter(test != 3)
# A tibble: 2 × 2
  obs_idx  test
    <int> <int>
1   15494     1
2   25154     2

Gotcha!
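The dplyr check above can also be written as a reusable base-R helper, for example to assert before model fitting that every observation covers all categories. This is a sketch: `bad_obs_idx()` is a hypothetical name, and `n_cat = 3` assumes the three-category model used here.

```r
# Return the obs_idx values whose rows do not cover exactly n_cat distinct
# cat_idx values (base-R version of the group_by/summarise check above).
bad_obs_idx <- function(df, n_cat = 3) {
  n_unique <- tapply(df$cat_idx, df$obs_idx, function(x) length(unique(x)))
  as.integer(names(n_unique)[n_unique != n_cat])
}

# Toy data: obs 1 has all three categories, obs 2 is missing one
toy <- data.frame(obs_idx = c(1, 1, 1, 2, 2), cat_idx = c(1, 2, 3, 1, 2))
bad_obs_idx(toy)
```

In the pipeline, a guard such as `stopifnot(length(bad_obs_idx(df)) == 0)` would fail at the indexing stage instead of surfacing later as a subscript error downstream.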

> df %>% filter(obs_idx == 15494)
# A tibble: 1 × 32
  indicator      year age_group area_id  area_name area_idx area_sort_order center_x center_y survey_id n_clusters n_observations n_eff_kish
  <chr>         <int> <chr>     <chr>    <chr>        <int>           <int>    <dbl>    <dbl> <chr>          <dbl>          <dbl>      <dbl>
1 sexnonregplus  2011 Y020_024  MOZ_3_1… Nangade         13              38     39.8    -11.2 MOZ2011D…          1              1          1
# … with 19 more variables: x_eff <dbl>, estimate <dbl>, ci_lower <dbl>, ci_upper <dbl>, year_idx <int>, iso3_idx <int>, age_idx <int>,
#   cat_idx <int>, type_idx <int>, year_cat_idx <int>, iso3_cat_idx <int>, age_cat_idx <int>, area_cat_idx <int>, area_year_idx <int>,
#   age_iso3_idx <int>, obs_idx <int>, obs_test_idx <int>, area_idx_copy <int>, year_idx_copy <int>
> df %>% filter(obs_idx == 25154)
# A tibble: 2 × 32
  indicator  year age_group area_id    area_name area_idx area_sort_order center_x center_y survey_id n_clusters n_observations n_eff_kish
  <chr>     <int> <chr>     <chr>      <chr>        <int>           <int>    <dbl>    <dbl> <chr>          <dbl>          <dbl>      <dbl>
1 nosex12m   2011 Y020_024  MOZ_3_1013 Nangade         13              38     39.8    -11.2 NA                NA             NA         NA
2 sexcohab   2011 Y020_024  MOZ_3_1013 Nangade         13              38     39.8    -11.2 NA                NA             NA         NA
# … with 19 more variables: x_eff <dbl>, estimate <dbl>, ci_lower <dbl>, ci_upper <dbl>, year_idx <int>, iso3_idx <int>, age_idx <int>,
#   cat_idx <int>, type_idx <int>, year_cat_idx <int>, iso3_cat_idx <int>, age_cat_idx <int>, area_cat_idx <int>, area_year_idx <int>,
#   age_iso3_idx <int>, obs_idx <int>, obs_test_idx <int>, area_idx_copy <int>, year_idx_copy <int>

> #' It's 2011, MOZ_3_1013, 20-24
> ind %>%
+   filter(age_group == "Y020_024", area_id == "MOZ_3_1013", survey_id == "MOZ2011DHS")
# A tibble: 3 × 15
  indicator    survey_id survey_mid_cale… area_id area_name res_type sex   age_group n_clusters n_observations n_eff_kish estimate std_error
  <chr>        <chr>     <chr>            <chr>   <chr>     <chr>    <chr> <chr>          <dbl>          <dbl>      <dbl>    <dbl>     <dbl>
1 giftsvar     MOZ2011D… CY2011Q3         MOZ_3_… Nangade   all      fema… Y020_024           1              1          1        0         0
2 sexnonregpl… MOZ2011D… CY2011Q3         MOZ_3_… Nangade   all      fema… Y020_024           1              1          1        0         0
3 sexnonregsp… MOZ2011D… CY2011Q3         MOZ_3_… Nangade   all      fema… Y020_024           1              1          1        0         0
# … with 2 more variables: ci_lower <dbl>, ci_upper <dbl>
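Read together, these outputs suggest why the observation is split: in this cell, MOZ2011DHS supplies sexnonregplus but not nosex12m or sexcohab, so those two rows carry an NA survey_id. If obs_idx is assigned per distinct (survey_id, area, year, age) combination (an assumption; the actual indexing code isn't shown here), the NA rows land in a separate observation. A toy base-R reproduction of that hypothesised mechanism:

```r
# Toy rows for one area/year/age cell with the three model categories.
# Only sexnonregplus has survey data; the other two have NA survey_id.
df <- data.frame(
  indicator = c("nosex12m", "sexcohab", "sexnonregplus"),
  area_id   = "MOZ_3_1013",
  year      = 2011L,
  age_group = "Y020_024",
  survey_id = c(NA, NA, "MOZ2011DHS"),
  stringsAsFactors = FALSE
)

# Hypothetical indexing: one obs_idx per distinct
# (survey_id, area_id, year, age_group) combination
key <- paste(df$survey_id, df$area_id, df$year, df$age_group)
df$obs_idx <- match(key, unique(key))
df$obs_idx
```

The single cell ends up as two observations, one with two categories and one with a single category, which is the same pattern as obs_idx 25154 and 15494 above.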

athowes commented 2 years ago

process_differentiate-high-risk ran successfully!

[ end        ]  2022-06-03 15:11:44
[ elapsed    ]  Ran report in 17.53372 mins
[ artefact   ]  best-3p1-multi-sexbehav-sae.csv: 547db1e067af8ce0d668771a91920fdb
[ ...        ]  3p1-boxplots.pdf: 36083f6b52aaa3d66c7e1a396f1b7014
[ commit     ]  process_differentiate-high-risk/20220603-145411-ee121503
[ copy       ]
[ import     ]  process_differentiate-high-risk:20220603-145411-ee121503
[ success    ]  :)
$`process_differentiate-high-risk`
NULL

athowes / multi-agyw

Fix bug with number of rows #124