AshleyLab / risk_scores

LD Pred risk scores for afib

If the PRS deciles still put the highest prevalence of ischemic stroke in the lowest decile group - check the following #9

Open jackosullivanoxford opened 5 years ago

jackosullivanoxford commented 5 years ago
  1. How did we create the phenotype file? Did the outcome (ischemic stroke) get flipped to the opposite?
jackosullivanoxford commented 5 years ago
  1. Check the alleles in the GWAS summary statistics
  2. Check that SNPs with ambiguous strands (A/T or C/G) have been removed from the GWAS summary statistics, e.g. from Khera: "DNA polymorphisms with ambiguous strands (A/T or C/G) were removed from the score derivation." Talk to Anna/others about excluding SNPs where the effect allele does not match between the GWAS SS, the LD reference genotype file, and the validation genotype file.
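
A minimal sketch of the two allele checks above, assuming the summary statistics are already loaded as dicts and each genotype file has been reduced to a mapping from SNP ID to its (effect allele, other allele) pair. The field names (`snp`, `a1`, `a2`) and function names are hypothetical, not taken from the ldpred pipeline.

```python
# Complement of each base; an A/T or C/G SNP reads the same on both strands,
# so its strand cannot be inferred from the alleles alone.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1, a2):
    """Return True for strand-ambiguous SNPs (A/T or C/G)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def filter_sumstats(sumstats, *allele_maps):
    """Keep rows that are unambiguous and whose alleles agree in every file.

    sumstats: list of dicts with hypothetical keys "snp", "a1" (effect), "a2".
    allele_maps: one dict per genotype file, mapping snp -> (a1, a2).
    Returns (kept_rows, dropped) where dropped maps snp -> reason.
    """
    kept, dropped = [], {}
    for row in sumstats:
        snp, a1, a2 = row["snp"], row["a1"].upper(), row["a2"].upper()
        if is_ambiguous(a1, a2):
            dropped[snp] = "ambiguous strand"
            continue
        if any(m.get(snp) != (a1, a2) for m in allele_maps):
            dropped[snp] = "allele mismatch or missing"
            continue
        kept.append(row)
    return kept, dropped


if __name__ == "__main__":
    ss = [
        {"snp": "rs1", "a1": "A", "a2": "G"},
        {"snp": "rs2", "a1": "A", "a2": "T"},
        {"snp": "rs3", "a1": "C", "a2": "T"},
    ]
    ld_ref = {"rs1": ("A", "G"), "rs2": ("A", "T"), "rs3": ("T", "C")}
    validation = {"rs1": ("A", "G"), "rs2": ("A", "T"), "rs3": ("T", "C")}
    kept, dropped = filter_sumstats(ss, ld_ref, validation)
    print([r["snp"] for r in kept], dropped)
```

This version drops SNPs whose alleles are swapped between files. Flipping the sign of the beta instead would be an alternative, but that is a separate decision to discuss.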
jackosullivanoxford commented 5 years ago
  1. Consider using PLINK2 to sum the betas across all the variants (instead of step3 in ldpred)
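
For cross-checking whichever tool does the scoring, here is a sketch of the underlying sum for one individual: beta times effect-allele dosage, added up over variants. The dict-based inputs and the function name are assumptions for illustration only, and this is not how PLINK2 or ldpred is invoked.

```python
def polygenic_score(dosages, betas):
    """Sum beta * effect-allele dosage over variants present in both inputs.

    dosages: dict mapping variant ID -> dosage of the effect allele (0 to 2).
    betas:   dict mapping variant ID -> per-allele weight.
    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & betas.keys()
    return sum(betas[v] * dosages[v] for v in shared)


if __name__ == "__main__":
    person = {"rs1": 2, "rs2": 1, "rs3": 0}
    weights = {"rs1": 0.1, "rs2": 0.3, "rs4": 0.5}
    print(polygenic_score(person, weights))
```

Running this on a handful of individuals and comparing against the pipeline output would show whether the scores differ only by a constant factor or by sign.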
jackosullivanoxford commented 5 years ago
  1. In step 1, when we tried to get the chromosome and position for the GWAS SS SNPs, we merged with a random 2000 sample of UKBB bim file and lost ~700,000 SNPs. Try to merge these without losing SNPs from the GWAS SS file. The raw plink files appear to be in here (but they are divided by chromosome): /oak/stanford/groups/euan/projects/ukbb/data/genetic_data/v2/plink
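
One way to do the lookup without silently dropping SNPs is to index every per-chromosome bim file and report the unmatched IDs explicitly. This is a sketch only: it assumes the standard six-column PLINK `.bim` layout (chromosome, variant ID, cM, bp, allele 1, allele 2) and that the summary statistics use the same variant IDs. Function names are hypothetical.

```python
def index_bim_lines(lines, index=None):
    """Add variant ID -> (chromosome, bp) entries from .bim lines to index."""
    index = {} if index is None else index
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, snp, _cm, bp = fields[:4]
        index[snp] = (chrom, int(bp))
    return index


def annotate(snp_ids, index):
    """Split summary-statistic SNP IDs into annotated rows and unmatched IDs."""
    annotated, missing = [], []
    for snp in snp_ids:
        if snp in index:
            chrom, bp = index[snp]
            annotated.append((snp, chrom, bp))
        else:
            missing.append(snp)
    return annotated, missing


if __name__ == "__main__":
    chr1 = ["1\trs1\t0\t1000\tA\tG", "1\trs2\t0\t2000\tC\tT"]
    chr2 = ["2\trs3\t0\t500\tG\tA"]
    idx = index_bim_lines(chr1)
    idx = index_bim_lines(chr2, idx)  # repeat once per chromosome file
    found, lost = annotate(["rs1", "rs3", "rs9"], idx)
    print(found, lost)
```

In practice each per-chromosome `.bim` would be opened in turn and passed to `index_bim_lines`. The `missing` list shows exactly which SNPs are lost and whether the loss is an ID mismatch or a true absence from the genotype data.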
jackosullivanoxford commented 5 years ago
  1. In step 1, instead of using 1-Freq1 for reffrq, use Freq1. Try this on the AIS run through and see if it changes things!
jackosullivanoxford commented 4 years ago
  1. Check how I have created the deciles
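
A small reference implementation to compare against, assuming deciles are meant to be rank-based with decile 1 holding the lowest scores. All names are hypothetical and the case flags are 0/1.

```python
def assign_deciles(scores):
    """Return a decile (1 = lowest scores, 10 = highest) for each score.

    Ranking is by ascending score; ties are broken by input order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def prevalence_by_decile(scores, cases):
    """Return {decile: fraction of cases} for every non-empty decile."""
    totals, events = {}, {}
    for decile, case in zip(assign_deciles(scores), cases):
        totals[decile] = totals.get(decile, 0) + 1
        events[decile] = events.get(decile, 0) + case
    return {d: events[d] / totals[d] for d in sorted(totals)}


if __name__ == "__main__":
    scores = list(range(100))
    cases = [1 if s >= 90 else 0 for s in scores]
    print(prevalence_by_decile(scores, cases))
```

If the pipeline's deciles come out reversed relative to this (for example, because the sort was descending), that alone would put the highest prevalence in the "lowest" group.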
jackosullivanoxford commented 4 years ago
  1. Exclude warfarin patients: use ICD codes for warfarin and self-reported warfarin use: https://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=4&nl=1 (consider making ICD and self-reported separate columns, for medication but also for CHADSVASc)
jackosullivanoxford commented 4 years ago
  1. Add in self-reported diagnoses of AF and ischemic stroke. This is the UKBB data showcase for self-reported codes: http://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=6

October 2019: THIS IS NOW DONE

jackosullivanoxford commented 4 years ago
  1. Still need to divide strokes into incident and prevalent cases.
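
A sketch of the split, assuming each participant has a baseline date (e.g. the assessment-centre visit) and at most one first-stroke date. Treating an event on the baseline date itself as prevalent is an assumption, and the function name is hypothetical.

```python
from datetime import date


def classify_stroke(event_date, baseline_date):
    """Label a first stroke as 'prevalent', 'incident', or 'none'.

    event_date: date of first stroke, or None if no stroke was recorded.
    baseline_date: date the participant entered follow-up.
    """
    if event_date is None:
        return "none"
    if event_date <= baseline_date:
        return "prevalent"  # stroke on or before baseline
    return "incident"  # stroke during follow-up


if __name__ == "__main__":
    baseline = date(2008, 6, 1)
    print(classify_stroke(date(2005, 1, 1), baseline))
    print(classify_stroke(date(2012, 3, 9), baseline))
    print(classify_stroke(None, baseline))
```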
jackosullivanoxford commented 4 years ago
  1. Check HLA regions are excluded from AF genotypes (step 3 of ldpred)
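
A quick way to run this check is to list any variants in the genotype bim file that fall inside an HLA window. The window below (chromosome 6, 25 to 35 Mb) is a placeholder assumption, not a value from this pipeline; it needs to be set to match the genome build and whatever region was intended for exclusion. The chromosome is assumed to be coded as `6` in the file.

```python
# Placeholder window; adjust to the genome build and intended exclusion region.
HLA_CHROM = "6"
HLA_START = 25_000_000
HLA_END = 35_000_000


def in_hla(chrom, bp, start=HLA_START, end=HLA_END):
    """Return True if a variant lies inside the HLA window (inclusive)."""
    return str(chrom) == HLA_CHROM and start <= bp <= end


def hla_variants(bim_lines):
    """Return IDs of variants in .bim lines that fall inside the window."""
    hits = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 4 and in_hla(fields[0], int(fields[3])):
            hits.append(fields[1])
    return hits


if __name__ == "__main__":
    bim = [
        "6\trsA\t0\t30000000\tA\tG",
        "6\trsB\t0\t50000000\tC\tT",
        "7\trsC\t0\t30000000\tG\tA",
    ]
    print(hla_variants(bim))  # an empty list would mean the region is excluded
```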
jackosullivanoxford commented 4 years ago
  1. Consider adding other event outcomes (e.g. PE, claudication, etc.)
jackosullivanoxford commented 4 years ago
  1. Consider removing self-reported and ICD TIA (and possibly generic self-reported 'stroke') from phenotype file.
jackosullivanoxford commented 4 years ago
  1. Add in ICD9 codes for a) AF, b) stroke (outcome) and c) CHADSVASc codes
jackosullivanoxford commented 4 years ago
  1. Consider adding in self-reported hypertension meds and also BP readings (with adjustment). See 'PRS_stroke.pdf': also include patients who self-reported taking anti-HTN meds, as per meta-stroke (PRS_stroke.pdf): "For hypertension we used an expanded definition including self-reported high blood pressure (either on blood pressure medication, data fields #6177, #6153; or systolic blood pressure >140 mmHg, fields #4080, #93; or diastolic blood pressure >90 mmHg, data fields #4079, #94) as well as hospital records." I think an argument against including BP as a continuous variable is that CHADSVASc treats BP as a binary variable.
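
The expanded definition quoted above reduces to a binary flag, which fits the CHADSVASc point. A sketch, assuming the medication and hospital-record indicators are already derived and that a single systolic and diastolic value per person has been chosen (how multiple readings are combined, and any adjustment, is left open). Names are hypothetical.

```python
def has_hypertension(on_bp_medication, sbp, dbp, hospital_record):
    """Binary hypertension flag following the expanded definition.

    on_bp_medication: True if self-reported blood pressure medication.
    sbp, dbp: systolic / diastolic pressure in mmHg, or None if missing.
    hospital_record: True if hypertension appears in hospital records.
    """
    if on_bp_medication or hospital_record:
        return True
    if sbp is not None and sbp > 140:
        return True
    if dbp is not None and dbp > 90:
        return True
    return False


if __name__ == "__main__":
    print(has_hypertension(False, 150, 80, False))
    print(has_hypertension(False, 130, None, False))
```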
jackosullivanoxford commented 4 years ago
  1. Add algorithmically defined stroke outcomes to the phenotype file and then re-run step 3