Open atrisovic opened 2 years ago
Hi, these are really strong assumptions. For example, zip codes may change over the years (as people move), and we would like to keep them all, with one zip code for each year. I also don't think we should disregard race completely if there are inconsistencies. Also, hmo_mo is restricted to 0 only for the second dataset, on cardio outcomes; for the first we aren't making this restriction.
Hey,
Yes, she merged everything (👏) and this issue is a checklist for QC.
I don't think we can get away with one dataset, since restricting to hmo=0 would cut our mortality data in half, and the mortality outcome needs no such restriction. Since we are so tight on sample size, I don't think we can afford this.
No no, we keep all hmos and have a single dataset with both cvds and dods (and the hmos).
(The two datasets are essentially one selection away (hmo==0), so most of the data would be duplicated in that case.)
Yeah, just save one big dataset, but then subset the data for him as requested. The mortality file doesn't need to restrict on hmo. The CVD file needs to restrict to hmo = 0 but will also need the mortality info, since mortality is a censoring event if it happens before CVD.
I currently have the data set restricting to hmo_mo==0 but I can remove this filter so that it can be filtered later during the analysis. Then I will work on the QC checkpoints Ana has listed.
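To make the plan above concrete, here is a minimal sketch of how the two analysis subsets could be derived from the single merged dataset at analysis time. It is only an illustration, not the project's actual code: apart from `hmo_mo`, every name here (`records`, `cvd_year`, `dod_year`, `mortality_subset`, `cvd_subset`, `censored_by_death`) and all the example rows are made up for the sketch.

```python
# Hypothetical merged data: one row per person. Only hmo_mo is a name from
# this thread; the other fields and all values are illustrative assumptions.
records = [
    {"id": 1, "hmo_mo": 0, "cvd_year": 2012, "dod_year": None},
    {"id": 2, "hmo_mo": 0, "cvd_year": None, "dod_year": 2011},
    {"id": 3, "hmo_mo": 6, "cvd_year": None, "dod_year": 2013},
    {"id": 4, "hmo_mo": 0, "cvd_year": 2011, "dod_year": 2015},
]


def mortality_subset(rows):
    """Mortality analysis: no HMO restriction, keep every row."""
    return list(rows)


def cvd_subset(rows):
    """CVD analysis: keep hmo_mo == 0 and flag censoring by death."""
    out = []
    for r in rows:
        if r["hmo_mo"] != 0:
            continue  # the CVD outcome is restricted to hmo_mo == 0
        cvd, dod = r["cvd_year"], r["dod_year"]
        # Death censors the CVD outcome when it occurs with no CVD event,
        # or before the CVD event.
        censored = dod is not None and (cvd is None or dod < cvd)
        out.append({**r, "censored_by_death": censored})
    return out


mortality = mortality_subset(records)
cvd = cvd_subset(records)
```

Keeping the filter in a function applied at analysis time means the saved file stays complete, and the CVD subset still carries the death information it needs for censoring.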