Check that I have removed strand-ambiguous SNPs (A/T or C/G) from the GWAS summary statistics, e.g. from Khera: "DNA polymorphisms with ambiguous strands (A/T or C/G) were removed from the score derivation."
?Talk to Anna/others about excluding SNPs where the effect allele does not match between the GWAS SS, the LD reference genotype file and the validation genotype file
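A minimal sketch of the ambiguous-strand check, assuming each GWAS SS row is a dict with hypothetical column names `effect_allele` and `other_allele` (the real column names in the summary statistics file may differ). A SNP is strand-ambiguous when its two alleles are complements of each other, so flipping strand gives the same allele pair.

```python
# Complement of each base on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """Return True for A/T, T/A, C/G and G/C allele pairs."""
    a1, a2 = a1.upper(), a2.upper()
    # Non-SNP alleles (e.g. indels like "AT") are not in COMPLEMENT,
    # so .get() returns None and the pair is treated as non-ambiguous.
    return COMPLEMENT.get(a1) == a2


def drop_ambiguous(rows, a1_key="effect_allele", a2_key="other_allele"):
    """Keep only rows whose allele pair is not strand-ambiguous.

    The column names are assumptions for illustration.
    """
    return [r for r in rows if not is_strand_ambiguous(r[a1_key], r[a2_key])]
```

Usage would be something like `kept = drop_ambiguous(gwas_rows)`, then comparing `len(kept)` with `len(gwas_rows)` to see how many SNPs were removed.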
In step 1, when we tried to get the chromosome and position for the GWAS SS SNPs, we merged with a random 2000 sample of the UKBB bim file and lost ~700,000 SNPs. Try to merge these without losing SNPs from the GWAS SS file. The raw plink files appear to be in here (but they are divided by chromosome): /oak/stanford/groups/euan/projects/ukbb/data/genetic_data/v2/plink
I have now done this: see R script ~/Documents/Stanford work/AF_G/2nd_run_through/GWAS_SS_CES.R
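For reference, a rough sketch of the idea in Python (this is not the R script above, just an illustration). It builds one lookup from all per-chromosome bim files and then annotates the GWAS SS rows as a left join, so every GWAS SS row is kept and unmatched SNPs are listed rather than silently dropped. The `SNP` key and the function names are assumptions; the bim column order (chromosome, variant ID, cM, bp, allele 1, allele 2) is the standard PLINK layout.

```python
def index_bim_lines(lines, index=None):
    """Add variant ID -> (chromosome, bp) entries from bim lines to index."""
    index = {} if index is None else index
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, snp_id, _cm, bp = fields[:4]
        # Keep the first occurrence if an ID appears more than once.
        index.setdefault(snp_id, (chrom, int(bp)))
    return index


def annotate_gwas(gwas_rows, index, id_key="SNP"):
    """Left join: keep every GWAS row, adding CHR/BP (None if unmatched)."""
    annotated, unmatched = [], []
    for row in gwas_rows:
        chrom, bp = index.get(row[id_key], (None, None))
        annotated.append({**row, "CHR": chrom, "BP": bp})
        if chrom is None:
            unmatched.append(row[id_key])
    return annotated, unmatched


# Hypothetical usage, with bim_paths being the list of per-chromosome files:
# index = {}
# for path in bim_paths:
#     with open(path) as fh:
#         index_bim_lines(fh, index)
# annotated, unmatched = annotate_gwas(gwas_rows, index)
```

Checking `len(unmatched)` after the merge shows how many GWAS SS SNPs still have no chromosome and position.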
Exclude warfarin patients: use ICD codes for warfarin and self-reported warfarin use: https://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=4&nl=1 (Consider making ICD and self-reported separate columns, for medication but also for CHA2DS2-VASc)
Consider adding self-reported hypertension meds and also BP readings (with adjustment), i.e. also include patients who self-reported taking anti-HTN meds, as per meta-stroke (see 'PRS_stroke.pdf'): For hypertension we used an expanded definition including self-reported high blood pressure (either on blood pressure medication, data fields #6177, #6153; or systolic blood pressure >140 mmHg, fields #4080, #93; or diastolic blood pressure >90 mmHg, data fields #4079, #94) as well as hospital records. I think an argument against including BP as a continuous variable is that CHA2DS2-VASc treats BP as a binary variable
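A possible sketch of the expanded binary hypertension definition quoted above. The function and argument names are hypothetical, and it assumes that any single reading above the threshold counts (the quoted definition does not say whether readings are averaged or adjusted first), so the "with adjustment" step is not covered here.

```python
SBP_THRESHOLD = 140  # mmHg, systolic (fields #4080, #93)
DBP_THRESHOLD = 90   # mmHg, diastolic (fields #4079, #94)


def hypertension_flag(on_bp_medication, sbp_readings, dbp_readings, hospital_record):
    """Binary hypertension flag from the four sources in the definition.

    on_bp_medication: self-reported BP medication (fields #6177, #6153)
    sbp_readings / dbp_readings: lists of readings, None for missing values
    hospital_record: True if hypertension appears in hospital records
    """
    # Assumption: one reading strictly above the threshold is enough.
    sbp_high = any(v is not None and v > SBP_THRESHOLD for v in sbp_readings)
    dbp_high = any(v is not None and v > DBP_THRESHOLD for v in dbp_readings)
    return bool(on_bp_medication or sbp_high or dbp_high or hospital_record)
```

Keeping the result binary matches how CHA2DS2-VASc treats hypertension, so the flag could feed straight into the score.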