eldronzhou / SDPR

A fast and robust Bayesian nonparametric method for prediction of complex traits using GWAS summary statistics
GNU General Public License v3.0

Cannot use -opt_llk 2 #10

Open · PEWilliamZhou opened this issue 1 year ago

PEWilliamZhou commented 1 year ago

Dear author,

I am using deCODE summary statistics with your LD reference. After reading your article, I understood that I should use -opt_llk 2. However, I get an error only when using -opt_llk 2 (without it, SDPR runs normally), and I am not sure what the "ARRAY" column refers to:

```
terminate called after throwing an instance of 'std::runtime_error'
  what(): Error: cannot find ARRAY column.
/var/spool/slurm/slurmd/job18466048/slurm_script: line 18: 181652 Aborted ./SDPR -mcmc -ss /home/pz284/rds/hpc-work/dissertation/data/deCODE/${trait_name}.txt -ref_dir /home/pz284/rds/hpc-work/dissertation/data/SDPRLD/ref -chr ${chr_num} -opt_llk 2 -out /home/pz284/rds/hpc-work/dissertation/SDPRoutput/${trait_name}/${trait_name}_chr${chr_num}_SDPRopt2_out.txt -N ${N_max} -n_threads 4
```

In addition, I noticed that the estimated h2 is sometimes greater than 1 (as high as 4). Could this be because I am using an external LD reference rather than LD computed from the samples behind the summary statistics?

Thank you for your assistance.

PEWilliamZhou commented 1 year ago

I also encountered this error:

```
terminate called after throwing an instance of 'std::out_of_range'
  what(): stod
/var/spool/slurm/slurmd/job18464940/slurm_script: line 18: 130632 Aborted
```

I have confirmed that my A1 and A0 columns each contain only a single A, T, C, or G, so why does this happen?

eldronzhou commented 1 year ago

Dear William,

Thank you for your interest in SDPR, and sorry for the late reply; I did not receive the automatic email from GitHub for this issue. Regarding your question, -opt_llk 2 should be used if your summary statistics come from a meta-analysis in which part of the cohort was genotyped on a specialized array. This is not a common case; I have encountered it only once, when analyzing Metabochip data for a lipid GWAS. Therefore, I think you should use -opt_llk 1 unless it does not work.

The ARRAY column is described in the manual (http://htmlpreview.github.io/?https://github.com/eldronzhou/SDPR/blob/main/doc/Manual.html): "The summary statistics should include another column ARRAY indicating whether SNPs were genotyped on array 1 (coded as 1), array 2 (coded as 2), or both arrays (coded as 0)".
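For anyone who does need -opt_llk 2, here is a minimal sketch of deriving the ARRAY codes from the manual's definition, assuming you already have the sets of SNP IDs genotyped on each array. The helper names `array_code` and `add_array_column`, the `(SNP, A1, A2, Z)` tuple layout, and the column order of the printed output are hypothetical; check the manual for the exact file format SDPR expects.

```python
def array_code(snp, array1_snps, array2_snps):
    """Return the ARRAY code for one SNP, following the manual's definition:
    1 = array 1 only, 2 = array 2 only, 0 = both arrays.
    Returns None if the SNP is on neither array (no code is defined for that)."""
    on1 = snp in array1_snps
    on2 = snp in array2_snps
    if on1 and on2:
        return 0
    if on1:
        return 1
    if on2:
        return 2
    return None


def add_array_column(rows, array1_snps, array2_snps):
    """Append an ARRAY field to each (SNP, A1, A2, Z) row (hypothetical layout).

    Assumption: rows whose SNP is on neither array are dropped, because the
    manual only defines the codes 0, 1 and 2.
    """
    out = []
    for snp, a1, a2, z in rows:
        code = array_code(snp, array1_snps, array2_snps)
        if code is not None:
            out.append((snp, a1, a2, z, code))
    return out


if __name__ == "__main__":
    rows = [("rs1", "A", "G", 1.2), ("rs2", "C", "T", -0.4), ("rs3", "G", "A", 2.1)]
    array1 = {"rs1", "rs2"}  # SNP IDs genotyped on array 1
    array2 = {"rs2", "rs3"}  # SNP IDs genotyped on array 2
    print("SNP\tA1\tA2\tZ\tARRAY")
    for r in add_array_column(rows, array1, array2):
        print("\t".join(map(str, r)))
```

Here rs1 gets code 1, rs2 (on both arrays) gets 0, and rs3 gets 2.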

The std::out_of_range error was probably caused by very small p-values in the summary statistics, and I fixed it some time ago (https://github.com/eldronzhou/SDPR/issues/6). Could you try the latest version of SDPR and format your summary statistics with the columns SNP A1 A2 Z?
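A minimal sketch of producing the SNP A1 A2 Z layout, assuming your input file reports a per-SNP effect size and standard error. Z is computed as beta / se, so no p-value string is parsed at all; unlike a p-value, a Z-score stays a moderate number even for extremely strong associations. The input column names (`rsid`, `effect_allele`, `other_allele`, `beta`, `se`) and the function name `to_sdpr_z` are hypothetical, and treating A1 as the allele the Z-score refers to is an assumption to confirm against the manual.

```python
import csv
import io
import math


def to_sdpr_z(in_handle, out_handle, delimiter="\t"):
    """Write SNP A1 A2 Z rows computed as Z = beta / se.

    The input column names are hypothetical placeholders; rename them to
    match your file. Returns the number of rows written.
    """
    reader = csv.DictReader(in_handle, delimiter=delimiter)
    out_handle.write("SNP\tA1\tA2\tZ\n")
    written = 0
    for row in reader:
        try:
            beta = float(row["beta"])
            se = float(row["se"])
        except ValueError:
            continue  # skip rows with missing or non-numeric values such as "NA"
        if not (math.isfinite(beta) and math.isfinite(se)) or se <= 0:
            continue  # a non-finite or non-positive value cannot give a valid Z
        z = beta / se
        # Assumption: A1 is the allele the Z-score refers to (the effect allele).
        out_handle.write(
            f"{row['rsid']}\t{row['effect_allele']}\t{row['other_allele']}\t{z:.6g}\n"
        )
        written += 1
    return written


if __name__ == "__main__":
    raw = (
        "rsid\teffect_allele\tother_allele\tbeta\tse\n"
        "rs1\tA\tG\t0.2\t0.1\n"
        "rs2\tC\tT\t-0.5\t0.25\n"
        "rs3\tG\tA\tNA\t0.1\n"
    )
    out = io.StringIO()
    to_sdpr_z(io.StringIO(raw), out)
    print(out.getvalue(), end="")
```

In this example rs3 is skipped because its beta is "NA"; the other two rows are written with Z = 2 and Z = -2.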

The heritability estimate can be used to diagnose convergence problems. If h2 > 1, something must be wrong, and the prediction performance would not make sense. A mismatch between the summary statistics and the reference panel could be one reason. If you still run into this problem and are interested in using SDPR, could you share the log file or the summary statistics so that I can take a look?