Closed by athowes 2 years ago
Doesn't happen for just MWI, or just BWA, or MWI and ZMB together. Does happen for just MOZ, or MOZ and MWI together.
Conclusion: the problem is probably isolated to the MOZ data (this would make sense, as the areas have changed, so something could have gone wrong there).
It doesn't happen with 2/3 surveys in MOZ but does happen with 3/3 surveys in MOZ. Let's try all three survey types (AIS, DHS, and NA) but only 2/3 surveys: does happen.
Conclusion: something to do with having multiple survey types in MOZ?
Perhaps one obs_idx could correspond to multiple cat_idx?
> df %>%
+ group_by(obs_idx) %>%
+ summarise(test = length(unique(cat_idx))) %>%
+ filter(test != 3)
# A tibble: 2 × 2
obs_idx test
<int> <int>
1 15494 1
2 25154 2
Gotcha!
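For anyone wanting to reproduce this kind of check without the real data, here is a minimal sketch on a toy data frame. It is written in base R so it runs without extra packages. The data frame `df_toy` and the value `n_expected` are made up for illustration; the only thing taken from the check above is the idea that every obs_idx should carry exactly 3 distinct cat_idx values.

```r
# Toy data (hypothetical): each obs_idx should appear once per category,
# and we assume there are 3 categories, as in the check above.
df_toy <- data.frame(
  obs_idx = c(1, 1, 1, 2, 3, 3),
  cat_idx = c(1, 2, 3, 1, 1, 2)
)

n_expected <- 3  # assumed number of categories per observation

# Number of distinct cat_idx values within each obs_idx
n_cat <- tapply(df_toy$cat_idx, df_toy$obs_idx, function(x) length(unique(x)))

# obs_idx values whose category count differs from what is expected
bad_obs <- as.integer(names(n_cat)[n_cat != n_expected])
bad_obs
```

In the toy data, obs_idx 2 and 3 are flagged because they have 1 and 2 categories respectively, which mirrors the shape of the two offending rows found above.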
> df %>% filter(obs_idx == 15494)
# A tibble: 1 × 32
indicator year age_group area_id area_name area_idx area_sort_order center_x center_y survey_id n_clusters n_observations n_eff_kish
<chr> <int> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 sexnonregplus 2011 Y020_024 MOZ_3_1… Nangade 13 38 39.8 -11.2 MOZ2011D… 1 1 1
# … with 19 more variables: x_eff <dbl>, estimate <dbl>, ci_lower <dbl>, ci_upper <dbl>, year_idx <int>, iso3_idx <int>, age_idx <int>,
# cat_idx <int>, type_idx <int>, year_cat_idx <int>, iso3_cat_idx <int>, age_cat_idx <int>, area_cat_idx <int>, area_year_idx <int>,
# age_iso3_idx <int>, obs_idx <int>, obs_test_idx <int>, area_idx_copy <int>, year_idx_copy <int>
> df %>% filter(obs_idx == 25154)
# A tibble: 2 × 32
indicator year age_group area_id area_name area_idx area_sort_order center_x center_y survey_id n_clusters n_observations n_eff_kish
<chr> <int> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 nosex12m 2011 Y020_024 MOZ_3_1013 Nangade 13 38 39.8 -11.2 NA NA NA NA
2 sexcohab 2011 Y020_024 MOZ_3_1013 Nangade 13 38 39.8 -11.2 NA NA NA NA
# … with 19 more variables: x_eff <dbl>, estimate <dbl>, ci_lower <dbl>, ci_upper <dbl>, year_idx <int>, iso3_idx <int>, age_idx <int>,
# cat_idx <int>, type_idx <int>, year_cat_idx <int>, iso3_cat_idx <int>, age_cat_idx <int>, area_cat_idx <int>, area_year_idx <int>,
# age_iso3_idx <int>, obs_idx <int>, obs_test_idx <int>, area_idx_copy <int>, year_idx_copy <int>
> #' It's 2011, MOZ_3_1013, 20-24
> ind %>%
+ filter(age_group == "Y020_024", area_id == "MOZ_3_1013", survey_id == "MOZ2011DHS")
# A tibble: 3 × 15
indicator survey_id survey_mid_cale… area_id area_name res_type sex age_group n_clusters n_observations n_eff_kish estimate std_error
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 giftsvar MOZ2011D… CY2011Q3 MOZ_3_… Nangade all fema… Y020_024 1 1 1 0 0
2 sexnonregpl… MOZ2011D… CY2011Q3 MOZ_3_… Nangade all fema… Y020_024 1 1 1 0 0
3 sexnonregsp… MOZ2011D… CY2011Q3 MOZ_3_… Nangade all fema… Y020_024 1 1 1 0 0
# … with 2 more variables: ci_lower <dbl>, ci_upper <dbl>
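The output above suggests that this survey and stratum only has rows for some indicators. A rough way to list which expected indicators are absent per stratum is sketched below in base R. The `expected` vector is an assumption on my part (the three category names are the ones that appear in the df output earlier), and `ind_toy` and its `stratum` column are hypothetical stand-ins, not the real `ind` object.

```r
# Assumed set of categories the model expects for every stratum
expected <- c("nosex12m", "sexcohab", "sexnonregplus")

# Toy stand-in for `ind` (hypothetical): stratum "A" is complete,
# stratum "B" only has a row for one of the expected categories.
ind_toy <- data.frame(
  stratum   = c("A", "A", "A", "B"),
  indicator = c("nosex12m", "sexcohab", "sexnonregplus", "sexnonregplus"),
  stringsAsFactors = FALSE
)

# For each stratum, list the expected indicators that have no row
missing_by_stratum <- lapply(
  split(ind_toy$indicator, ind_toy$stratum),
  function(present) setdiff(expected, present)
)

# Keep only strata that are missing at least one indicator
missing_by_stratum <- Filter(length, missing_by_stratum)
missing_by_stratum
```

On the real data the grouping would presumably be by survey_id, area_id and age_group rather than a single `stratum` column, but that is a guess about how the strata are defined.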
process_high-risk-differentiation ran successfully!
[ end ] 2022-06-03 15:11:44
[ elapsed ] Ran report in 17.53372 mins
[ artefact ] best-3p1-multi-sexbehav-sae.csv: 547db1e067af8ce0d668771a91920fdb
[ ... ] 3p1-boxplots.pdf: 36083f6b52aaa3d66c7e1a396f1b7014
[ commit ] process_differentiate-high-risk/20220603-145411-ee121503
[ copy ]
[ import ] process_differentiate-high-risk:20220603-145411-ee121503
[ success ] :)
$`process_differentiate-high-risk`
NULL
I think this is what's causing the downstream issue in process_differentiate-high-risk, where there is a NULL entry for one of the list components: