epiforecasts / socialmixr

R package for deriving social mixing matrices from survey data.
http://epiforecasts.io/socialmixr/

population data not found but the country is in the list #39

Closed linyang17 closed 2 years ago

linyang17 commented 2 years ago

Hi, I tried to run the contact_matrix() function with cleaned BICS data, but it reports that there is no population data for United States, even though I have checked that this name is in the wpp_countries() list. Could you please advise how I can fix this?

> BICS_survey <- clean(BICS_survey, country.column = "country")
> res_sym <- contact_matrix(BICS_survey, age.limits = seq(0,82,2), n=10, symmetric = TRUE)
Error in contact_matrix(BICS_survey, age.limits = seq(0, 82, 2), n = 10,  : 
  Could not find population data for United States.  Use wpp_countries() to get a list of country names.
> res_sym <- contact_matrix(BICS_survey, countries = "United States", age.limits = seq(0,82,2), n=10, symmetric = TRUE)
Error in contact_matrix(BICS_survey, countries = "United States", age.limits = seq(0,  : 
  Could not find population data for United States.  Use wpp_countries() to get a list of country names.
> identical(wpp_countries()[193], survey_countries(BICS_survey))
[1] TRUE
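
Since the name check above passes, one way to narrow the problem down is to query the population lookup directly, without going through the survey object. The following is only a sketch: it assumes the installed socialmixr version exports wpp_age(countries, years), and it guards against the package or the function being unavailable. The variable names country, year and pop are illustrative.

```r
# Sketch: call the population lookup directly, bypassing contact_matrix().
# Assumption: the installed socialmixr exports wpp_age(countries, years).
country <- "United States"
year <- 2020

pop <- NULL
if (requireNamespace("socialmixr", quietly = TRUE)) {
  pop <- tryCatch(
    socialmixr::wpp_age(country, year),
    error = function(e) {
      # Also reached if wpp_age() is not exported by this version.
      message("Population lookup failed: ", conditionMessage(e))
      NULL
    }
  )
}

if (is.null(pop) || nrow(pop) == 0) {
  message("No population rows returned for ", country, " in ", year)
} else {
  print(head(pop))
}
```

If this returns rows while contact_matrix() still fails, the problem more likely lies in how the survey's country or year is matched than in the population data itself.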

The BICS data have been transformed as follows, in case that helps.

participants <- df_allpc %>%
  select(rid, gender, ego_age) %>%
  filter(ego_age < 82) %>%
  group_by(rid) %>%
  mutate(part_id = cur_group_id()) %>%
  as.data.table()

participants <- unique(participants)
colnames(participants)[3] <- "part_age"
participants$part_age <- as.integer(participants$part_age)
colnames(participants)[2] <- "part_gender"
participants$country <- "United States"
#participants$country <- as.factor(participants$country)
participants$year <- 2020

contacts <- df_allpc %>% 
  inner_join(participants, by="rid") %>%
  select(part_id, alter_age, alter_sex, part_gender, country, year) %>%
  filter(alter_age < 82) %>%
  arrange(part_id) %>%
  as.data.table()

contacts$alter_age <- as.integer(contacts$alter_age)
contacts$cont_id <- 1:nrow(contacts)
colnames(contacts)[2] <- "cnt_age_exact"
colnames(contacts)[3] <- "cnt_gender"

BICS_survey <- survey(participants, contacts, reference = NULL)
BICS_survey <- clean(BICS_survey, country.column = "country")
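
Before building the survey object, it can help to confirm that the two tables are internally consistent. The helper below is hypothetical (check_survey_tables() is not part of socialmixr) and uses base R only. The required column names are taken from the cleaned tables shown in this thread, and the toy data are made up for illustration.

```r
# Hypothetical helper (not part of socialmixr): basic consistency checks on
# the participants and contacts tables before they are passed to survey().
check_survey_tables <- function(participants, contacts) {
  problems <- character(0)

  # Column names as they appear in the cleaned tables in this thread.
  need_part <- c("part_id", "part_age", "country", "year")
  need_cont <- c("part_id", "cnt_age_exact")
  missing_part <- setdiff(need_part, names(participants))
  missing_cont <- setdiff(need_cont, names(contacts))
  if (length(missing_part) > 0) {
    problems <- c(problems, paste("participants lacks:", toString(missing_part)))
  }
  if (length(missing_cont) > 0) {
    problems <- c(problems, paste("contacts lacks:", toString(missing_cont)))
  }
  # The remaining checks need these columns, so stop early if any is missing.
  if (length(problems) > 0) return(problems)

  # Each participant should appear exactly once.
  if (anyDuplicated(participants$part_id) > 0) {
    problems <- c(problems, "duplicated part_id in participants")
  }

  # Every contact row should point at a known participant.
  orphans <- setdiff(contacts$part_id, participants$part_id)
  if (length(orphans) > 0) {
    problems <- c(problems,
                  paste("contacts refer to unknown part_id:", toString(orphans)))
  }

  # Stray whitespace in country names would break exact name matching.
  country <- as.character(participants$country)
  if (any(country != trimws(country), na.rm = TRUE)) {
    problems <- c(problems, "country values carry leading or trailing whitespace")
  }

  problems
}

# Toy data (made up): the last contact row refers to a missing participant.
toy_participants <- data.frame(
  part_id = 1:3,
  part_age = c(6L, 30L, 51L),
  country = "United States",
  year = 2020
)
toy_contacts <- data.frame(
  part_id = c(1L, 1L, 2L, 4L),
  cnt_age_exact = c(6L, 40L, 29L, 17L)
)

problems <- check_survey_tables(toy_participants, toy_contacts)
print(problems)
```

An empty result means none of these particular checks failed; it does not guarantee that contact_matrix() will accept the data.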

sbfnk commented 2 years ago

Hi linyang17, thank you for raising this issue! It seems like you might have discovered a bug. Do you think you might be able to share some form of the BICS_survey data set that would allow me to reproduce the problem?

linyang17 commented 2 years ago

Hi @sbfnk, these are the first few rows of the participants and contacts data, which have been cleaned to match the expected format. The code for cleaning is in here and the data can be found in df1 and df_alter1. Thanks a lot for your help.

> head(BICS_survey[["participants"]])
  part_id part_gender part_age       country year
1       1           F        6 United States 2020
2       2           F        6 United States 2020
3       3           F        6 United States 2020
4       4           F        6 United States 2020
5       5           F        6 United States 2020
6       6           F        6 United States 2020 
> head(BICS_survey[["contacts"]])
  part_id cnt_age_exact cnt_gender part_gender       country year cont_id
1       1             6          F           F United States 2020       1
2       1             6          F           F United States 2020       2
3       1             6          F           F United States 2020       3
4       1             6          F           F United States 2020       4
5       1             6          F           F United States 2020       5
6       1             6          F           F United States 2020       6

The original BICS data, collected in the US in 2020, have the following form:

> head(df1)
                                    rid gender   agecat agecat_w0 num_cc num_cc_topcode_val
1: 5f5d11d5-d056-c623-53fa-d6d9d520faef Female  [35,45)   [35,45)      0                Inf
2: 5f5d120e-c1be-7245-1501-1d799556b3ac Female  [45,55)   [45,65)      3                Inf
3: 5f5d1200-d96b-7600-4e09-9367e5ae0f99 Female  [55,65)   [45,65)      0                Inf
4: 5f5d1215-dcc6-0e2d-f407-089bbdab1fc8 Female  [45,55)   [45,65)      0                Inf
5: 5f5d11e0-2a1d-0135-8e5a-8815e481c21b Female [65,100]  [65,100]      1                Inf
6: 5f5d11e1-6e07-4ec9-1760-54f3d9960ea3   Male  [25,35)   [25,35)     11                Inf
   num_cc_nonhh num_cc_nonhh_topcode_val wave     city ethnicity hispanic urbanrural
1:            0                      Inf    3 National     White        1      Urban
2:            0                      Inf    3 National     White        0   Suburban
3:            0                      Inf    3 National     White        0   Suburban
4:            0                      Inf    3 National     White        1      Rural
5:            0                      Inf    3 National     White        0      Urban
6:           10                      Inf    3 National     White        0   Suburban
                   educ reference_weekday w_hhsize weight_pooled
1:     College graduate              TRUE        1     0.6836066
2:     College graduate              TRUE        4     0.4913999
3: High school graduate              TRUE        1     1.9512931
4:     College graduate              TRUE        1     1.8137739
5:         Some college              TRUE        2     1.4693752
6:     College graduate              TRUE        2     0.8989975
> head(df_alter1)
                                    rid alter_num wave alter_sex alter_agecat_w0 alter_agecat
1: 5f5d120e-c1be-7245-1501-1d799556b3ac         1    3      Male         [45,65)      [45,55)
2: 5f5d120e-c1be-7245-1501-1d799556b3ac         2    3    Female          [0,18)       [0,18)
3: 5f5d120e-c1be-7245-1501-1d799556b3ac         3    3      Male          [0,18)       [0,18)
4: 5f5d11e0-2a1d-0135-8e5a-8815e481c21b         1    3      Male        [65,100]     [65,100]
5: 5f5d11e1-6e07-4ec9-1760-54f3d9960ea3         1    3    Female         [25,35)      [25,35)
6: 5f5d11e9-ffd1-73d7-4ba8-f97126d3964a         1    3    Female         [25,35)      [25,35)
   alter_age hh_alter rel_spouse rel_family rel_friend rel_work rel_neighbor rel_other
1:        51     TRUE         NA         NA         NA       NA           NA        NA
2:        17     TRUE         NA         NA         NA       NA           NA        NA
3:        13     TRUE         NA         NA         NA       NA           NA        NA
4:        71     TRUE         NA         NA         NA       NA           NA        NA
5:        29     TRUE         NA         NA         NA       NA           NA        NA
6:        29     TRUE         NA         NA         NA       NA           NA        NA
   rel_egoclient rel_altercustomer loc_home loc_store loc_restbar loc_work loc_street
1:            NA                NA       NA        NA          NA       NA         NA
2:            NA                NA       NA        NA          NA       NA         NA
3:            NA                NA       NA        NA          NA       NA         NA
4:            NA                NA       NA        NA          NA       NA         NA
5:            NA                NA       NA        NA          NA       NA         NA
6:            NA                NA       NA        NA          NA       NA         NA
   loc_church loc_transit loc_other loc_school loc_egohome loc_otherhome protect_mask
1:         NA          NA        NA         NA          NA            NA           NA
2:         NA          NA        NA         NA          NA            NA           NA
3:         NA          NA        NA         NA          NA            NA           NA
4:         NA          NA        NA         NA          NA            NA           NA
5:         NA          NA        NA         NA          NA            NA           NA
6:         NA          NA        NA         NA          NA            NA           NA
   protect_gloves protect_other protect_none is_physical is_cc ego_age ego_agecat ego_agecat_w0
1:             NA            NA           NA          NA    NA      51    [45,55)       [45,65)
2:             NA            NA           NA          NA    NA      51    [45,55)       [45,65)
3:             NA            NA           NA          NA    NA      51    [45,55)       [45,65)
4:             NA            NA           NA          NA    NA      71   [65,100]      [65,100]
5:             NA            NA           NA          NA    NA      29    [25,35)       [25,35)
6:             NA            NA           NA          NA    NA      31    [25,35)       [25,35)
   ego_weight_pooled ego_weight_city num_alters_reported num_hh_alters_reported
1:         0.4913999       0.6516816                   3                      3
2:         0.4913999       0.6516816                   3                      3
3:         0.4913999       0.6516816                   3                      3
4:         1.4693752       1.5096908                   1                      1
5:         0.8989975       0.8381350                   4                      1
6:         1.2031006       1.5013512                   4                      2
   num_nonhh_alters_reported alter_weight alter_weight_onlycc     city
1:                         0            1                   1 National
2:                         0            1                   1 National
3:                         0            1                   1 National
4:                         0            1                   1 National
5:                         3            1                   1 National
6:                         2            1                   1 National

sbfnk commented 2 years ago

Hi @linyang17, thanks for your patience with this. Do you still have this problem?

sbfnk commented 2 years ago

Closing this for now. Please reopen if this reappears.