Closed HelenYSLin closed 1 day ago
I found that this issue was mentioned before in #40. I reinstalled PRScs and the ukbb_eur reference panel from this GitHub repository, but the issue persists...
Hi - Could you also double check that your summary stats didn't have missing values? What's the difference between the real bim and the manually created bim? Did they lead to the same number of harmonized SNPs?
Sure! I've checked that my summary stats don't have missing values. The real bim actually has nothing in common with the bim I created manually. The real bim I used is from a previous run of PRScs that worked for a different purpose. The goal of testing the two bim files was just to ensure there weren't any issues with my sumstats or bim. I believe the real bim contains hm3 SNPs (N=89,019 variants in chr1), while the one I created was made from array SNPs without imputation (N=76,270 variants in chr1).
I see - This does not rule out the possibility that the summary stats may have issues, though, as the intersection between each of the two bim files and the summary stats could be different. Could you also check if there were any duplicated SNPs in the summary stats?
I've checked, and there are no duplicates in the SNP column either...
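The checks discussed so far (missing values, non-finite values, duplicated SNP IDs) can be sketched in a few lines of Python. This is a minimal sketch and not part of PRScs: the function name `check_sumstats` is hypothetical, and it assumes a whitespace-delimited file with the header `SNP A1 A2 BETA P`. It also flags P-values outside (0, 1], because a string such as `1e-400` parses to exactly 0.0 in double precision and would pass a plain NA check.

```python
import math
from collections import Counter


def check_sumstats(lines):
    """Scan whitespace-delimited sumstats lines (header: SNP A1 A2 BETA P).

    Returns a dict of flagged line numbers (1-based, header is line 1)
    and duplicated SNP IDs. Empty lists mean nothing was flagged.
    """
    problems = {"missing": [], "non_finite": [], "bad_p": [], "duplicates": []}
    seen = Counter()
    rows = iter(lines)
    next(rows)  # skip the header line
    for lineno, line in enumerate(rows, start=2):
        fields = line.split()
        if len(fields) != 5:
            problems["missing"].append(lineno)  # a column is absent
            continue
        snp, _a1, _a2, beta_text, p_text = fields
        seen[snp] += 1
        try:
            beta, p = float(beta_text), float(p_text)
        except ValueError:
            problems["missing"].append(lineno)  # e.g. "NA" or "."
            continue
        if not (math.isfinite(beta) and math.isfinite(p)):
            problems["non_finite"].append(lineno)  # nan, inf, -inf
        elif not (0.0 < p <= 1.0):
            problems["bad_p"].append(lineno)  # includes P that underflowed to 0.0
    problems["duplicates"] = sorted(s for s, c in seen.items() if c > 1)
    return problems
```

Passing an open file handle works because the function only iterates over lines, for example `check_sumstats(open("sumstats.txt"))`, where the file name is a placeholder.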
If your summary stats and bim files are shareable, I'm happy to take a look.
I fixed the extremely small p values in my sumstats again with modified code and it worked! Thank you for your always-prompt replies, though.
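The modified code is not shown in this thread, so the following is only a sketch of what flooring extremely small P-values might look like. The names `floor_p_value` and `floor_sumstats` and the floor `1e-300` are assumptions chosen for illustration, not values taken from PRScs or from the poster's fix. The reasoning behind it is general: a P-value that parses to exactly 0.0 corresponds to an infinite z-score under an inverse-normal transform, which could then surface as the "infs or NaNs" error. Whether that is exactly what happened here is not confirmed in the thread.

```python
import math

P_FLOOR = 1e-300  # hypothetical floor; pick a value suited to your data


def floor_p_value(p_text, floor=P_FLOOR):
    """Parse a P-value string and raise it to `floor` if it is smaller."""
    p = float(p_text)  # strings such as "1e-400" parse to 0.0
    if not math.isfinite(p) or p < 0.0 or p > 1.0:
        raise ValueError(f"invalid P-value: {p_text!r}")
    return max(p, floor)


def floor_sumstats(lines, floor=P_FLOOR):
    """Yield tab-delimited sumstats lines (header: SNP A1 A2 BETA P) with P floored."""
    rows = iter(lines)
    yield "\t".join(next(rows).split())  # header, re-joined with tabs
    for line in rows:
        snp, a1, a2, beta_text, p_text = line.split()
        p = floor_p_value(p_text, floor)
        yield "\t".join([snp, a1, a2, beta_text, f"{p:.5e}"])
```

Invalid P-values raise an error instead of being silently clipped, so that NaN or out-of-range entries are fixed at the source and not hidden.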
I encountered this error when running PRScs-auto during the MCMC step for chromosome 1. The previous steps seemed to run correctly, as it printed:
22925 common SNPs in the reference, sumstats, and validation set
Traceback (most recent call last):
  File "/home/jupyter/workspaces/prswithwgsvsarraydata/PRScs/PRScs.py", line 118, in <module>
    main()
  File "/home/jupyter/workspaces/prswithwgsvsarraydata/PRScs/PRScs.py", line 110, in main
    mcmc_gtb.mcmc(param_dict['a'], param_dict['b'], param_dict['phi'], sst_dict, param_dict['n_gwas'], ld_blk, blk_size,
  File "/home/jupyter/workspaces/prswithwgsvsarraydata/PRScs/mcmc_gtb.py", line 61, in mcmc
    beta_tmp = linalg.solve_triangular(dinvt_chol, beta_mrg[idx_blk], trans='T') + np.sqrt(sigma/n)*random.randn(len(idx_blk),1)
  File "/opt/conda/lib/python3.10/site-packages/scipy/linalg/_basic.py", line 335, in solve_triangular
    b1 = _asarray_validated(b, check_finite=check_finite)
  File "/opt/conda/lib/python3.10/site-packages/scipy/_lib/_util.py", line 240, in _asarray_validated
    a = toarray(a)
  File "/opt/conda/lib/python3.10/site-packages/numpy/lib/function_base.py", line 627, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs
I have ensured that both my sumstats and bim files do not contain NAs or infinite values. I also set extreme P-values in the sumstats to non-zero values, but none of these steps fixed the issue.
For additional context, I manually created the bim file instead of using Plink. I benchmarked this "fake" bim file against a real bim file, and the real file worked for both chromosome 1 and chromosome 22. However, my fake bim file worked for chromosome 22 but not chromosome 1. I'm not sure if this means my sumstats are fine or if there's something I missed.
Another point of context: I noticed that the reference panel is in hg19, but I annotated the dbSNP rsID in my sumstats and bim files using hg38. Will this matter, even though the rsID should not be affected by the genome build?
Here is the head of my sumstats:
SNP A1 A2 BETA P
rs139221807 G A 6.16400e-02 2.92079e-01
rs3131972 G A 6.52400e-04 8.53080e-01
rs3115860 A C 6.94400e-04 8.54044e-01
and head of my bim file:
1 rs539322794 0 49554 G A
1 rs147538909 0 115746 T C
1 rs369986014 0 801883 A G
Any insights on this would be greatly appreciated!
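Because different bim files can intersect the summary stats differently, it may help to count the shared rsIDs directly. The sketch below assumes the layouts quoted in this issue: the SNP ID is the first column of the sumstats (after a header line) and the second column of the bim file. The helper names `read_snp_ids` and `overlap_report` are hypothetical. It matches on rsID only, so the count PRScs reports after its own harmonization with the reference panel may be smaller.

```python
def read_snp_ids(lines, column, skip_header=False):
    """Collect the SNP IDs found in one whitespace-delimited column."""
    rows = iter(lines)
    if skip_header:
        next(rows, None)
    # Blank lines are ignored so a trailing newline does not raise IndexError.
    return {line.split()[column] for line in rows if line.strip()}


def overlap_report(sumstats_lines, bim_lines):
    """Count SNP IDs in the sumstats, in the bim file, and in both."""
    sst_ids = read_snp_ids(sumstats_lines, column=0, skip_header=True)
    bim_ids = read_snp_ids(bim_lines, column=1)
    return {
        "sumstats": len(sst_ids),
        "bim": len(bim_ids),
        "shared": len(sst_ids & bim_ids),
    }
```

Running this once per bim file against the same sumstats shows whether the two runs actually exercise the same subset of SNPs.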