Open knighttime opened 8 years ago
NOTE:
Amelia
- a package to visualize missing patterns http://www.inside-r.org/packages/cran/Amelia/docs/missmap
# check missing data
# this can be intensive; save everything in case the computer crashes
Amelia::missmap(dsw, main = "Missing values vs observed")
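Since the map can be slow on a large data set, a cheaper first pass is to tabulate missingness per column with base R. This is only a sketch: `dsw_toy` and its column values are made-up stand-ins for `dsw`, not the real data.

```r
# Toy stand-in for `dsw`; values are invented for illustration only.
dsw_toy <- data.frame(
  age_death        = c(85, NA, 90, NA),
  total_smell_test = c(10, 4, NA, 12),
  mmse             = c(28, 25, 30, 27)
)

# Count and proportion of missing values per column (base R only).
missing_summary <- data.frame(
  variable  = names(dsw_toy),
  n_missing = colSums(is.na(dsw_toy)),
  p_missing = colMeans(is.na(dsw_toy)),
  row.names = NULL
)

print(missing_summary)
```

Columns with a high `p_missing` are the ones worth inspecting in the full missingness map.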
Tighten up the subsetting and the creation of new variables:
# ---- tweak-data ---------------------------------------
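# assumes the %>% pipe (magrittr / dplyr) is already attached
d <- ds %>%
  dplyr::mutate(
    vital_status    = ifelse(is.na(age_death), 0, 1), # jamie, this is it.
    # NOTE: same condition as vital_status; confirm this is the intended source variable
    dementia_status = ifelse(is.na(age_death), 0, 1),
    stroke_status   = stroke_cum,
    path_status     = ad_reagan,
    # 1 = genotype is 44, 34, or 24; 0 = anything else (a missing genotype also becomes 0)
    apoe_genotype   = ifelse(apoe_genotype %in% c(44, 34, 24), 1, 0),
    # intervals are (0,5], (5,10], (10,12]; a score of exactly 0 becomes NA
    # unless include.lowest = TRUE is added to cut()
    group_smell     = ordered(cut(total_smell_test, c(0, 5, 10, 12),
                                  labels = c("anosmic", "hyposmic", "normosmic")))
  )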
[Graph 1] x: age; y: mmse; facets: smell group and apoe status; color: cumulative lifetime stroke status
[Graph 2] x: age; y: mmse; facets: smell group and pathology status (1 = most pathologies, 3 = least); color: apoe status
These graphs are super! Is there any way to get rid of the NA column? I will pick this up next Friday.
Yes, just modify the dplyr::filter() call
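A minimal sketch of what that could look like, assuming the NA column comes from missing values in the faceting variable `group_smell`. The data frame `d_toy` and its values are invented for illustration; the real call would go on `d` before it is passed to the plotting code.

```r
library(dplyr)

# Toy stand-in for `d`; values are made up for illustration.
d_toy <- data.frame(
  age         = c(80, 82, 85, 88),
  mmse        = c(29, 27, NA, 25),
  group_smell = factor(c("anosmic", NA, "normosmic", "hyposmic"))
)

# Keep only rows where the faceting variable is observed,
# so the plot has no NA facet.
d_plot <- d_toy %>%
  dplyr::filter(!is.na(group_smell))

print(d_plot)
```

The same condition can be repeated for any other faceting variable (for example `!is.na(path_status)`) if that one also produces an NA panel.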
Re: filtering non-MAP studies: could this also be resolved by retaining all data from all studies (which someone might like to do)?
@andkov - got it!!! Thanks + I bought that graphing book 👍
@ampiccinin We discovered that the duplicates were mostly in MARS, so subsetting down to just MAP should fix them. A bit more experimentation is needed to figure out why both Cassandra and Rebecca found duplicates in theirs. I thought their data sets were already subset down to just MAP, so we need to check that.
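One way to check this is to count duplicated person-wave rows per study before and after subsetting. The sketch below uses base R on a toy data frame; the column names `study`, `id`, and `fu_year` are hypothetical and would need to be swapped for the actual variable names in the data.

```r
# Toy stand-in; `study`, `id`, and `fu_year` are hypothetical column names.
ds_toy <- data.frame(
  study   = c("MAP", "MAP", "MARS", "MARS", "MAP"),
  id      = c(1, 1, 2, 2, 3),
  fu_year = c(0, 1, 0, 0, 0)
)

# Flag every row whose (id, fu_year) key appears more than once.
flag_duplicates <- function(df) {
  key <- df[, c("id", "fu_year")]
  duplicated(key) | duplicated(key, fromLast = TRUE)
}

# Which studies do the duplicated rows come from?
dup_all <- flag_duplicates(ds_toy)
print(table(ds_toy$study[dup_all]))

# Subset to MAP only and re-check; 0 means the MAP subset is clean.
ds_map <- ds_toy[ds_toy$study == "MAP", ]
dup_map <- flag_duplicates(ds_map)
print(sum(dup_map))
```

Running the same check on Cassandra's and Rebecca's files would show whether their duplicates sit in MAP rows or in rows from another study that were not filtered out.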
@andkov
Friday July 8th, 9:30am
Meeting Agenda
I will update my MAP files before the meeting.