biostat0903 / DBSLMM

Deterministic Bayesian Sparse Linear Mixed Model
https://biostat0903.github.io/DBSLMM/

NaN in a result file (*.dbslmm.txt) #20

Closed by kkonoo 3 years ago

kkonoo commented 3 years ago

Hi there, a few weeks ago I used this program to get re-scaled genetic effects of variants that account for LD structure. Last time, I ran it with public summary statistics and an LD reference from 1KGP. This time, I used private summary statistics and my own LD reference file, but I got NaN in the result file. What does that mean?

These are my arguments, and the run seemed to complete successfully. [image] However, the result is NaN for all variants. [image] This is part of my summary file (~5k variants within one LD block). I don't think there is a problem with this file. [image]

+) I also tried running with a subset of my summary file containing about 500 variants (part of the variants within one LD block), and this time there was no NaN! What went wrong? Do I need to rename the variants to rsIDs (in both the summary file and the LD reference files) because of a memory issue or something similar? [image]

biostat0903 commented 3 years ago

Hi, Thanks for your interest in DBSLMM. This output occurs because the block includes some highly correlated SNPs. I think setting the P value threshold to 1e-6 or the r2 threshold to 0.01 will solve the problem. If you have any problems with DBSLMM, please feel free to ask me. Best, Sheng

kkonoo commented 3 years ago

Hi, Thank you for your reply. Unfortunately, setting the P value to 1e-6 or the r2 to 0.01 didn't solve the problem. I'm also somewhat confused about why including highly correlated SNPs would cause problems, since DBSLMM is meant to handle genome-wide SNPs. Am I misunderstanding this program?

biostat0903 commented 3 years ago

Hi, Please check the MAF of all SNPs in the reference panel. I think some SNPs have MAF=0. Also, please check the SNP number: in DBSLMM, --nsnp is the number of SNPs across all chromosomes, not the number on each chromosome. Best, Sheng
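The MAF check suggested in this reply can be sketched as follows. This is a minimal illustration and not part of DBSLMM. It assumes the reference panel's allele frequencies were exported with PLINK 1.9 `--freq`, which writes a whitespace-delimited `.frq` file with the columns CHR, SNP, A1, A2, MAF, NCHROBS. The function name `find_monomorphic_snps` and the toy file contents are hypothetical.

```python
import os
import tempfile


def find_monomorphic_snps(frq_path):
    """Return IDs of SNPs whose MAF is 0 (or not numeric) in a PLINK .frq file."""
    flagged = []
    with open(frq_path) as handle:
        header = handle.readline().split()
        snp_col = header.index("SNP")
        maf_col = header.index("MAF")
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                maf = float(fields[maf_col])
            except ValueError:
                # Non-numeric frequency (e.g. a missing-value marker): flag it too.
                flagged.append(fields[snp_col])
                continue
            if maf == 0.0:
                flagged.append(fields[snp_col])
    return flagged


# Toy .frq content with made-up values; rs2 is monomorphic in this panel.
toy_frq = (
    " CHR  SNP   A1   A2   MAF  NCHROBS\n"
    "  16  rs1    A    G  0.12      400\n"
    "  16  rs2    0    C     0      400\n"
    "  16  rs3    T    C  0.03      400\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".frq", delete=False) as tmp:
    tmp.write(toy_frq)
    toy_path = tmp.name

flagged = find_monomorphic_snps(toy_path)
os.remove(toy_path)
print(flagged)
```

The important detail is that the frequencies must be computed on the exact set of individuals used as the LD reference, not taken from a larger cohort.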

kkonoo commented 3 years ago

Hi, I already built my reference panel with MAF filtering (>0.01), though. Oh, I should fix the nsnp value! Also, is the third column of the output the scaled effect size, as written in your manual.rmd? According to your script, you calculate the PRS with the fourth column of the DBSLMM output, as follows: ${plink} --bfile ${bfilete}${chr} --score ${est}${chr}.assoc.dbslmm.txt 1 2 4 sum --out ${InterPred}${chr}
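To make the command quoted above concrete: in PLINK 1.9, `--score <file> 1 2 4 sum` reads the variant ID from column 1, the scored allele from column 2, and the per-allele weight from column 4, and the `sum` modifier reports a sum rather than the default average. The sketch below mimics that arithmetic on toy data. The four-column, header-free layout of the toy lines, the helper names `read_effects` and `polygenic_score`, and all values are assumptions for illustration. Missing genotypes, which PLINK handles itself, are not modeled.

```python
def read_effects(lines, id_col=1, allele_col=2, score_col=4):
    """Parse whitespace-delimited lines into {snp_id: (allele, weight)}.

    Column numbers are 1-based, matching the `1 2 4` in the PLINK command.
    """
    effects = {}
    for line in lines:
        fields = line.split()
        if len(fields) < max(id_col, allele_col, score_col):
            continue  # skip blank or short lines
        snp_id = fields[id_col - 1]
        effects[snp_id] = (fields[allele_col - 1], float(fields[score_col - 1]))
    return effects


def polygenic_score(effects, genotypes):
    """Sum of (copies of the scored allele x weight) over SNPs found in both inputs.

    genotypes: {snp_id: {allele: copies carried by one individual (0, 1 or 2)}}
    """
    total = 0.0
    for snp_id, (allele, weight) in effects.items():
        counts = genotypes.get(snp_id)
        if counts is None:
            continue  # SNP absent from the genotype data: contributes nothing
        total += counts.get(allele, 0) * weight
    return total


# Toy effect-size lines (made-up values): ID, allele, column 3, column 4.
toy_lines = [
    "rs1 A 0.10 0.02",
    "rs2 G -0.30 -0.05",
    "rs3 T 0.00 0.01",
]

# One toy individual; rs3 is not genotyped and rs4 has no weight.
toy_genotypes = {
    "rs1": {"A": 2, "C": 0},
    "rs2": {"G": 1, "A": 1},
    "rs4": {"T": 2},
}

score = polygenic_score(read_effects(toy_lines), toy_genotypes)
print(score)
```

Changing `score_col` from 4 to 3 is all it takes to score with the other column, which is why it matters which of the two the manual and the script intend.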

biostat0903 commented 3 years ago

Hi, Thanks again for your interest. DBSLMM requires MAF>0 for every SNP. You can use the fourth column to calculate the predicted trait values (the PRS). If you have any questions, please feel free to ask me.

Best, Sheng

kkonoo commented 3 years ago

Hi, Thank you for your reply. Yes, I'm sure there are no MAF=0 SNPs in my LD reference. Then what is the reason for the NaN in my result?

biostat0903 commented 3 years ago

Hi, When I checked your code, I found one issue: with my settings, you can run the whole chromosome rather than splitting it into separate blocks yourself. DBSLMM automatically splits a whole chromosome into blocks. I am not sure whether this causes the problem. If you used public data, could you please send me your reference panel and summary statistics for chromosome 16? I will check them for you. Thanks for your kind help! Best, Sheng

kkonoo commented 3 years ago

Hi, Thank you for your suggestion. Because of my limited computing resources, I cannot use whole-chromosome data (it gives an 'out of memory' error). I don't know what the problem with my LD reference is ;( My reference panel is private data, so I'm sorry that I can't share it.

However, I ran the program successfully after converting the variant names to rsIDs and using a public LD reference (1KGP). Thank you for all your help!

Best, E.H.

biostat0903 commented 3 years ago

Hi E.H., Thanks for your interest in DBSLMM.

Best, Sheng

kkonoo commented 3 years ago

Oh, I misunderstood my reference panel. You're right: my reference panel does contain MAF=0 SNPs. I thought I had removed SNPs with MAF<0.01 based on the MAF in the INFO field of my VCF file (~70K individuals). However, I had first extracted a subset of the VCF file (~200 individuals) for the LD reference, so the INFO MAF did not describe that subset. I should have filtered again with the --maf option after subsetting! Thank you so much.
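A rough calculation shows why this happens. The sketch below assumes the ~200 individuals are a random draw, that their 400 chromosomes are sampled independently, and that the frequency in the large cohort is the true frequency. Under those assumptions, the chance that a SNP shows no minor allele at all in the subset is (1 - MAF)^(2n). The function name `prob_monomorphic` is hypothetical.

```python
def prob_monomorphic(maf, n_individuals):
    """Probability that no minor-allele copy appears among 2 * n sampled chromosomes."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must be between 0 and 0.5")
    return (1.0 - maf) ** (2 * n_individuals)


# A SNP sitting right at the 0.01 filter threshold in the large cohort.
p_200 = prob_monomorphic(0.01, 200)    # subset of ~200 individuals
p_2000 = prob_monomorphic(0.01, 2000)  # a larger subset, for comparison

print(f"{p_200:.4f}")
print(f"{p_2000:.2e}")
```

With 200 individuals the probability is close to 2% for a SNP at MAF 0.01, so a block of a few thousand variants that includes many low-frequency SNPs can easily contain some that are monomorphic in the subset. Re-running the MAF filter on the subset itself removes them.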

Best, E.H.

biostat0903 commented 3 years ago

Hi E.H., Great! Feel free to ask me about any problems. I also recommend the PGS-Server (www.pgs-server.com), which can fit 12 PGS construction models based on summary statistics. If you have any problems, you can leave a message at https://github.com/biostat0903/PGS-Server/issues. Best, Sheng