LearningToTalk / L2TDatabase

Helper functions for working with our lab's MySQL database
GNU General Public License v2.0
0 stars 0 forks source link

capture children's nesting in families #44

Closed tjmahr closed 7 years ago

tjmahr commented 8 years ago

Record the fact that some of the children are siblings. I skimmed and counted at least 27 children who have a sibling in a longitudinal study (12 twins, 1 triple).

This change can be an optimization I make later on. It would make sure household demographics are the same for siblings, prevent data duplication, and allow more sophisticated queries about e.g. LENA audio environments and literacy survey information.

Basic schema

Create a table, one row per child, mapping children to families.

ChildID FamilyID
1 1
2 2
3 3
4 3
... ...

3 and 4 would be siblings.

Then for caregiver demographics, map caregivers to families.

CaregiverID FamilyID Relation ...
1 1 Mother ...
2 1 Mother ...
3 2 Mother ...
4 2 Father ...
... ... ... ...
tjmahr commented 7 years ago

Cross-study sibling-ships.

longitudinal twins (already noted above)

longitudinal twins (not covered above)

Not relevant but known:

Found this from out tracking database but 452D never actually participated in any studies.

tjmahr commented 7 years ago

Created Household table to create unique IDs for households and HouseholdChild table to document nesting of children in households.

Right now, we have 301 children nested in 272 households.

"302E / 452D" will not be documented siblings because we never got any data from 452D, so I never added them to database.

Only the following sibling sets still have to be added. The main delay being that I haven't created database IDs for the Maternal Ed. (M) study participants.

So:

Next steps:

Maybe even... Because we have nesting of children in households, we can map the survey responses to households so we can get all the surveys that pertain to a particular child's household. I think the survey data is too messy and too sensitive to add to the database as is, so this might be an offline tool.

tjmahr commented 7 years ago

Hmm I was talking through this design with my wife, and she said that these kinds of mapping tables--e.g., HouseholdChild to capture combinations of HouseholdIDs and ChildIDs--are best for many-to-many relationships.

Basically, we don't expect any children to be nested in more than one household and we can simplify the design by making a HouseholdID column in the Child table. No need for the mapping table. Thus, the HouseholdID is just another piece of information about the child like their birth-date or dialect.

janroslynedwards commented 7 years ago

Great idea -- thanks, Amanda!

Yours,

Jan

tjmahr commented 7 years ago

Added a HouseholdID field to the Child to table to document which Households contain which children. All the children in the database are now associated with a household in a way that let's identify us siblings.

Demo to identify siblings

library(dplyr)
library(L2TDatabase)

l2t <- l2t_connect("./inst/l2t_db.cnf")

# Get all the ResearchIDs
everyone <- tbl(l2t, "Child") %>%
  left_join(tbl(l2t, "ChildStudy")) %>%
  collect()

# Reconstruct original sibling lists documented in GitHub issue
# https://github.com/LearningToTalk/L2TDatabase/issues/44
everyone %>%
  select(ShortResearchID, ChildID, HouseholdID) %>%
  # Flatten different ResearchIDs within children into a single string
  group_by(ChildID) %>%
  mutate(Alias = ShortResearchID %>% unique %>% 
           sort %>% paste0(collapse = " = ")) %>%
  # Headcount and flatten list of IDs within households into a single string
  group_by(HouseholdID) %>%
  summarise(nKids = n_distinct(ChildID),
            Kids = Alias %>% unique %>% sort %>% paste0(collapse = " / ")) %>%
  filter(1 < nKids) %>%
  as.data.frame
#>    HouseholdID nKids               Kids
#> 1           11     2        011L / 126L
#> 2           17     2        017L / 438D
#> 3           19     2        019L / 020L
#> 4           23     2        024L / 428D
#> 5           40     2        041L / 042L
#> 6           44     2        046L / 401D
#> 7           57     2        059L / 060L
#> 8           64     2        067L / 408D
#> 9           73     2        076L / 411D
#> 10          80     2        083L / 446D
#> 11          88     2        091L / 420D
#> 12          90     2        093L / 094L
#> 13          91     3 095L / 096L / 441D
#> 14          92     3 097L / 098L / 099L
#> 15          93     2        100L / 101L
#> 16         102     3 110L / 111L / 422D
#> 17         103     2        112L / 418D
#> 18         105     2        114L / 442D
#> 19         109     2        118L / 119L
#> 20         110     2 120L = 458D / 407D
#> 21         111     2        121L / 122L
#> 22         112     2        123L / 434D
#> 23         117     3 129L / 130L / 300E
#> 24         119     2        132L / 424D
#> 25         123     2        202T / 203T
#> 26         127     2        207T / 208T
#> 27         133     2        305E / 456D
#> 28         148     2        413D / 414D
#> 29         150     2        416D / 417D
#> 30         167     2        443D / 500M
#> 31         188     2        509M / 510M
#> 32         197     2        553M / 554M
#> 33         223     2        625L / 683L
#> 34         269     2        671L / 684L
#> 35         278     2        680L / 681L
#> 36         280     2        685L / 686L

Children per household

tbl(l2t, "Child") %>% collect %>% nrow
#> 333
tbl(l2t, "Household") %>% collect %>% nrow
#> 293

333/293
#> [1] 1.136519

Demo to count twins, etc

# Coding scheme for birthdates-in-households
birth_labels <- data_frame(
  n = 1:3, 
  label = c("Singleton", "Twin", "Triplet"))

# Count frequency of birthdates within households then count frequencies to
# determine number of twins
tbl(l2t, "Child") %>% 
  count(HouseholdID, Birthdate) %>% 
  ungroup %>% 
  select(-Birthdate) %>% 
  # Count the frequencies
  count(n) %>% 
  collect %>% 
  left_join(birth_labels) %>% 
  select(Type = label, N = nn)
#> # A tibble: 3 × 2
#>        Type     N
#>       <chr> <dbl>
#> 1 Singleton   298
#> 2      Twin    16
#> 3   Triplet     1