bvilhjal / ldpred

Question about Output files #130

Open hrafnfaedhir opened 4 years ago

hrafnfaedhir commented 4 years ago

Hello, I finished running through the three LDpred steps and have a question about the output file format. I've attached a single ".adj" file as a point of comparison. Here is the STDOUT from the ldpred score step:

(base) Chriss-iMac:LDpred cpatterson$ ldpred score --gf Validation/$type"_"$R/Validation.model_$mod.chrMT --rf Working/LDpred.$type"_"$R.Model_$mod.chrMT --out Results/$type"_"$R/$model/Validation --pf Results/$type"_"$R/$model/Validation.pheno.txt --pf-format STANDARD --rf-format LDPRED --summary-file Results/$type"_"$R/Model_$model.Summary.txt --pcs-file Validation/QCReports/Validation.chrMT.PCA.eigenvec

=============================== LDpred v. 1.0.10 ===============================

Results/MCC_0.3/TEST/Validation.pheno.txt Parsed 1690 phenotypes successfully

Calculating LDpred-inf risk scores
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0002 (0.024321)
Variance explained (Pearson R2) by PRS + PCs: 0.0133 (0.024003)

Calculating LDpred risk scores using f=1.000e+00
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0001 (0.024323)
Variance explained (Pearson R2) by PRS + PCs: 0.0132 (0.024005)

Calculating LDpred risk scores using f=3.000e-01
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0001 (0.024323)
Variance explained (Pearson R2) by PRS + PCs: 0.0132 (0.024005)

Calculating LDpred risk scores using f=1.000e-01
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0001 (0.024324)
Variance explained (Pearson R2) by PRS + PCs: 0.0131 (0.024005)

Calculating LDpred risk scores using f=3.000e-02
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0000 (0.024325)
Variance explained (Pearson R2) by PRS + PCs: 0.0131 (0.024006)

Calculating LDpred risk scores using f=1.000e-02
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0000 (0.024325)
Variance explained (Pearson R2) by PRS + PCs: 0.0131 (0.024007)

Calculating LDpred risk scores using f=3.000e-03
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0001 (0.024324)
Variance explained (Pearson R2) by PRS + PCs: 0.0131 (0.024005)

Calculating LDpred risk scores using f=1.000e-03
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0001 (0.024323)
Variance explained (Pearson R2) by PRS + PCs: 0.0132 (0.024004)
The highest (unadjusted) Pearson R2 was 0.0007, and provided by LDpred_inf

=============================== Scoring Summary ================================
Validation genotype file (prefix): Validation/MCC_0.3/Validation.model_1.chrMT
Input weight file(s) (prefix): Working/LDpred.MCC_0.3.Model_1.chrMT
Output scores file(s) (prefix): Results/MCC_0.3/TEST/Validation
---------------------------------- Phenotypes ----------------------------------
Phenotype file (STANDARD format): Results/MCC_0.3/TEST/Validation.pheno.txt
Individuals with phenotype information: 1690
Running time for parsing phenotypes: 0 min and 0.01 secs
Parsed PCs file: Validation/QCReports/Validation.chrMT.PCA.eigenvec
Individuals w missing PCs: 1
----------------------------------- Scoring ------------------------------------
LDpred_inf (unadjusted) Pearson R2: 0.0007
Best LDpred (f=1.00e+00) (unadjusted) R2: 0.0006
Running time for calculating scores: 0 min and 0.92 secs
--------------------------- Optimal polygenic score ----------------------------
Method with highest (unadjusted) Pearson R2: LDpred_inf
Best (unadjusted) Pearson R2: 0.0007
================================================================================

So my first question is why is there a discrepancy between the LDpred_inf (unadjusted) Pearson R2 in the initial calculation STDOUT and the value in the Scoring section?

Calculating LDpred-inf risk scores
100.00%
Variance explained (Pearson R2) by PRS adjusted for PCs: 0.0002 (0.024321)
Variance explained (Pearson R2) by PRS + PCs: 0.0133 (0.024003)

----------------------------------- Scoring ------------------------------------
LDpred_inf (unadjusted) Pearson R2: 0.0007
Best LDpred (f=1.00e+00) (unadjusted) R2: 0.0006
Running time for calculating scores: 0 min and 0.92 secs

Second, is "PRS adjusted for PCs" the variance explained by the genetic data alone, with the PCs used as covariates (I assume the PCs are included in the model while constructing the PRS scores) but excluded from the Pearson R2 calculation, and is "PRS + PCs" the variance explained when the PCs are also included in the final Pearson R2?
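To make the three quantities in these two questions concrete, here is a minimal Python sketch of one common way to define them with ordinary least squares: the unadjusted R2 (phenotype on PRS only), the R2 of the full model (PRS + PCs), and the incremental R2 that the PRS adds on top of the PCs. Whether LDpred computes its "PRS adjusted for PCs" and "PRS + PCs" values with exactly these formulas is an assumption, not something confirmed from its source. The function names (`solve`, `rss`, `r2_summary`) and the simulated data are hypothetical and only for illustration.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rss(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    Xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def r2_summary(y, prs, pcs):
    """R2 values for y ~ PRS, y ~ PCs, y ~ PRS + PCs, and the PRS increment."""
    rss_null = rss([], y)            # intercept-only model
    rss_prs = rss([prs], y)          # PRS only (the "unadjusted" case)
    rss_pcs = rss(pcs, y)            # PCs only
    rss_full = rss(pcs + [prs], y)   # PCs and PRS together
    return {
        "unadjusted": 1 - rss_prs / rss_null,
        "pcs_only": 1 - rss_pcs / rss_null,
        "prs_plus_pcs": 1 - rss_full / rss_null,
        # Assumed definition: variance explained by the PRS beyond the PCs.
        "prs_given_pcs": (rss_pcs - rss_full) / rss_null,
    }

# Simulated data (not the data from this issue): 2 PCs, one PRS, 0/1 phenotype.
random.seed(0)
n = 400
pcs = [[random.gauss(0, 1) for _ in range(n)] for _ in range(2)]
prs = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 if 0.5 * pcs[0][i] + 0.2 * prs[i] + random.gauss(0, 1) > 0 else 0.0
     for i in range(n)]

result = r2_summary(y, prs, pcs)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions the unadjusted R2 and the PC-adjusted R2 are different quantities by construction, so they need not agree, and "prs_plus_pcs" always equals "pcs_only" plus "prs_given_pcs".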

Third, when I import the scores into R to calculate R^2, I get the same R2 for the LDpred-inf model as in the Scoring section, and the R2 for pc_prs corresponds well to the "Variance explained (Pearson R2) by PRS + PCs" field:

true_phens ~ PRS R2=0.000660
true_phens ~ pc_prs R2=0.013257

However, these values were calculated using a Gaussian model rather than a logit model, which would be the appropriate choice since we have case-control phenotypes. I did notice that the phenotypes in the output files have been converted to double format. When I run them with the logit function, I get:

true_phens ~ PRS R2=0.000524
true_phens ~ pc_prs R2=0.010315

During the ldpred coord step I used the following command:

`ldpred --debug coord --gf $ldpredDIR/LDReference/LDReference.EUR.chrMT --ssf Statistics/$type""$R/NIAGADS.SummaryStats.$type"_"$R.Model_1a.chrMT.txt --ssf-format CUSTOM --eff LOGOR --rs SNP_ID --A1 ALT --A2 REF --pos POS --chr CHR --reffreq REF_FRQ --pval PVAL --se SE --ncol N --efftype LOGOR --out Coordinate/Coordination.$type""$R.Model$mod.chrMT.hdf5 --maf 0 --vbim Validation/$type""$R/Validation.model_$mod.chrMT.bim`

How do I inform the program that I am dealing with affection status and not a quantitative phenotype?
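For context on the Gaussian versus logit comparison, the Python sketch below fits both a linear R2 (squared Pearson correlation) and a logistic model on the same simulated 0/1 phenotype. It reports McFadden's pseudo-R2 for the logistic fit. That choice is an assumption: the post does not say which pseudo-R2 the R code used, and different definitions (McFadden, Nagelkerke, and others) give different numbers. The function names and simulated data are hypothetical and do not reproduce the values above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of X' W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_likelihood(x, y, b0, b1):
    """Bernoulli log-likelihood of the logistic model at (b0, b1)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return total

def pearson_r2(x, y):
    """Squared Pearson correlation, i.e. the R2 of a one-predictor linear model."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov * cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Simulated case-control data (not the data from this issue).
random.seed(1)
n = 1000
score = [random.gauss(0, 1) for _ in range(n)]
status = [1.0 if random.random() < sigmoid(-0.5 + 0.8 * s) else 0.0 for s in score]

b0, b1 = fit_logistic(score, status)
ll_model = log_likelihood(score, status, b0, b1)

# Null model: intercept only, whose MLE is the logit of the case fraction.
case_fraction = sum(status) / n
ll_null = log_likelihood(score, status,
                         math.log(case_fraction / (1.0 - case_fraction)), 0.0)

mcfadden_r2 = 1.0 - ll_model / ll_null
linear_r2 = pearson_r2(score, status)
print(f"linear (Gaussian) R2: {linear_r2:.4f}")
print(f"McFadden pseudo-R2:   {mcfadden_r2:.4f}")
```

The two numbers measure different things (a variance ratio versus a likelihood ratio), so a gap between a Gaussian R2 and a logit-based R2 on the same scores is expected and does not by itself indicate a problem.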

Thank you very much for your time.

Validation_LDpred-inf.txt.adj.txt