AshleyLab / risk_scores

LD Pred risk scores for afib

Run all the steps using Cardioembolic GWAS Summary statistics #13

Closed jackosullivanoxford closed 4 years ago

jackosullivanoxford commented 5 years ago

The analyses I did before May 2019 used the GWAS summary statistics (SS) for all ischemic stroke. Redo all LDpred steps using the cardioembolic stroke summary statistics.

jackosullivanoxford commented 5 years ago

Also do a PRS with patients on warfarin excluded.

jackosullivanoxford commented 5 years ago

When I re-run these analyses, check that the AF patients we have in the plink files are European only. No need to exclude relatives, as per the LDpred GitHub and the fact that we control for principal components (PCs) in the logistic regression.

To do Step 3:

  1. Find the patient ids for all patients with AF:

    • This is here: /oak/stanford/groups/euan/projects/risk_scores/afib.cohort.txt # There are 17365 people with AFib in UKBB
  2. Exclude non-Europeans [DONE]

    • This is a list of Europeans (patids): /oak/stanford/groups/euan/projects/ukbb/gwas/pop_strat/v2/euro.txt # This leaves 16429 Europeans with AFib in UKBB. Code to check this:

      library(data.table)  # provides fread
      afib <- fread("/oak/stanford/groups/euan/projects/risk_scores/afib.cohort.txt")
      euro <- fread("/oak/stanford/groups/euan/projects/ukbb/gwas/pop_strat/v2/euro.txt")
      euro_afib <- merge(afib, euro, by = "V2")  # keep AF patients who are also in the European list
      nrow(euro_afib)  # should match the 16429 quoted above
  3. Make plink files for Europeans with AFib
    • I have started on this (July 9th 2019): I created the primary plink script (/oak/stanford/groups/euan/projects/risk_scores/ldpred_CES/step3/create_AF_plink_files) and the sbatch script (sbatch_create_AF_plink.sh). I have not yet confirmed that it runs. If it fails, speak to Anna.
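The subsetting in Step 3 can be sketched on toy data as follows. This is only an illustration, not the contents of create_AF_plink_files: the IDs are made up, and it assumes both files have two header-less columns (V1 = family ID, V2 = patient ID), which is what the merge on "V2" above suggests. The output is a two-column FID/IID file of the kind plink accepts for --keep.

```r
# Toy stand-ins for afib.cohort.txt and euro.txt (hypothetical IDs).
# Assumption: V1 = family ID (FID), V2 = patient ID (IID).
afib <- data.frame(V1 = c(101, 102, 103, 104), V2 = c(101, 102, 103, 104))
euro <- data.frame(V1 = c(102, 104, 105), V2 = c(102, 104, 105))

# Keep only AF patients who also appear in the European list.
euro_afib <- merge(afib, euro, by = "V2", suffixes = c(".afib", ".euro"))

# Sanity checks: the subset cannot be larger than the AF cohort,
# and no patient ID should appear twice.
stopifnot(nrow(euro_afib) <= nrow(afib))
stopifnot(!any(duplicated(euro_afib$V2)))

# Write a header-less, space-separated FID/IID file for plink --keep.
keep <- data.frame(FID = euro_afib$V1.afib, IID = euro_afib$V2)
keep_file <- tempfile(fileext = ".txt")
write.table(keep, keep_file, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

On the real data, the row count of euro_afib would be compared against the 16429 quoted above before the keep file is handed to plink.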

For step 1:

  1. Check that the random 2000 (LD reference genotype file) have had first-degree relatives excluded [DONE]. We can check this by seeing whether any family IDs are repeated within the random 2000 (ukbb_sample.txt is the list of patids for the random 2000). I have done this and there are no first-degree relatives. R code:

library(data.table)  # provides fread
a <- fread("/oak/stanford/groups/euan/projects/risk_scores/ukbb_random_2k/ukbb_sample.txt")
sum(duplicated(a$V1))  # equals 0
sum(duplicated(a$V2))  # equals 0
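A slightly more reusable version of the same check, as a sketch on toy data. It assumes that related individuals share a value in the first column (family ID). If family IDs in the sample file simply mirror patient IDs, this check only catches duplicated samples, not relatives. The helper name count_duplicates and the toy IDs are hypothetical.

```r
# Toy stand-in for ukbb_sample.txt (hypothetical IDs).
# Assumption: V1 = family ID, V2 = patient ID.
sample_ids <- data.frame(V1 = c(1, 2, 3, 4), V2 = c(11, 12, 13, 14))

# Number of entries that repeat an earlier value in the vector.
count_duplicates <- function(ids) sum(duplicated(ids))

n_dup_fid <- count_duplicates(sample_ids$V1)  # repeated family IDs
n_dup_iid <- count_duplicates(sample_ids$V2)  # repeated patient IDs

# List the rows that share a family ID, so they can be inspected by hand.
repeated_fids <- sample_ids$V1[duplicated(sample_ids$V1)]
shared <- sample_ids[sample_ids$V1 %in% repeated_fids, ]
```

Both counts being zero and shared being empty corresponds to the "equals 0" result reported above.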

jackosullivanoxford commented 4 years ago

Still need to divide strokes into incident and prevalent cases.
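One possible way to make that split, as a sketch only. It assumes the usual epidemiological convention (a stroke dated on or before the baseline assessment is prevalent, a later one is incident), which the thread does not spell out. The column names baseline_date and stroke_date and all values are hypothetical.

```r
# Toy data (hypothetical patients and dates).
strokes <- data.frame(
  patid = c(1, 2, 3),
  baseline_date = as.Date(c("2008-05-01", "2009-03-15", "2010-01-10")),
  stroke_date = as.Date(c("2005-02-20", "2012-07-04", "2010-01-10"))
)

# Assumption: stroke on or before baseline = prevalent, after baseline = incident.
strokes$case_type <- ifelse(
  strokes$stroke_date <= strokes$baseline_date,
  "prevalent",
  "incident"
)

# Count cases of each type.
table(strokes$case_type)
```

How to treat a stroke dated exactly on the baseline date is a judgment call. Here it is counted as prevalent.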