Closed BogyeomKim closed 2 years ago
Hi Bogyeom, the program only uses SNPs with A/T/G/C alleles in prediction, and it automatically removes indels. I think the error is likely caused by formatting issues in the indel rows of the summary statistics file. Do those rows also have 5 columns separated by the same delimiter?
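One way to check the column question is to count the whitespace-separated fields on every row. The snippet below is only a sketch: `find_malformed_rows` is a hypothetical helper, not part of PRS-CSx, and it assumes the parser splits each line on runs of spaces or tabs, which is what the `ll[1]` / `ll[2]` indexing in the traceback suggests. The header in the sample is also an assumption.

```python
import io


def find_malformed_rows(lines, expected_fields=5):
    """Return (line_number, field_count, text) for rows that do not
    split into exactly `expected_fields` whitespace-separated fields."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != expected_fields:
            bad.append((lineno, len(fields), line.rstrip("\n")))
    return bad


# In-memory sample; the header row is assumed, the data rows are from this thread.
sample = io.StringIO(
    "SNP A1 A2 OR P\n"
    "rs1835369049 GA G 1.03624 0.1732\n"
    "rs1564281434 A\n"
)
for lineno, n_fields, text in find_malformed_rows(sample):
    print(f"line {lineno}: {n_fields} fields: {text!r}")
```

Any row reported here has too few or too many fields, and a row with fewer than three fields would raise an `IndexError` as soon as the third field is indexed. To scan a real file, pass an open file handle instead of the `io.StringIO` sample.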
Yes, I think so. They are separated by tabs or spaces, but some rows contain
other special characters. Could those special characters cause
a type error?
rs1835369049 GA G 1.03624 0.1732
rs1588375638 C CAAAAAAAAAAAA+1 0.98462 0.2982
rs1238492479 CAA CA 1.01167 0.7432
rs551065795 T TGA 1.14191 0.06265
rs1201441772 C CAAAA 0.97971 0.1843
rs138725384 CATCT CATCTATCT 0.99332 0.6796
rs1831598329 A G 1.00150 0.9704
rs542437283 T C 1.03873 0.3628
rs573130529 T C 0.96271 0.3628
rs1005563844 CTT CT 0.99263 0.66
rs377432397 A ACACCT 0.96127 0.1399
rs1832552750 GTTTTTT G 0.99581 0.8282
rs1363260291 T TACACACACACAC 1.02327 0.7241
rs560859299 T G 1.11405 0.08013
rs1564281434 A
On Thu, Aug 26, 2021 at 11:53 PM, Tian Ge @.*** wrote:
> Hi Bogyeom- the program only uses ATGC alleles in prediction but it automatically removes indels. I think the error is likely caused by format issues for the indels in the summary statistics file. Do they also have 5 columns separated by the same delimiter?
Yes, I think these special characters might be the cause. If it's not too much trouble, you can remove the indels from the summary stats before running the algorithm. This won't reduce prediction power, since PRS-CS(x) only uses SNPs for prediction.
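A minimal sketch of that pre-filtering step follows. It assumes the 5-column layout discussed above (SNP, A1, A2, effect, P), that the first line of the file is a header, and that "indel" means any row whose A1 or A2 is not a single A, T, G or C. `drop_indels` is a hypothetical helper, not part of PRS-CS(x).

```python
ATGC = {"A", "T", "G", "C"}


def drop_indels(lines):
    """Yield the header plus every row whose A1 and A2 are single A/T/G/C bases.

    Assumes the first line is a header and that rows have 5
    whitespace-separated fields; anything else is dropped.
    """
    for i, line in enumerate(lines):
        if i == 0:
            yield line  # assumed header row, passed through unchanged
            continue
        fields = line.split()
        if (
            len(fields) == 5
            and fields[1].upper() in ATGC
            and fields[2].upper() in ATGC
        ):
            yield line


# Demo on rows from this thread (the header is an assumption).
rows = [
    "SNP A1 A2 OR P\n",
    "rs1835369049 GA G 1.03624 0.1732\n",
    "rs1588375638 C CAAAAAAAAAAAA+1 0.98462 0.2982\n",
    "rs1831598329 A G 1.00150 0.9704\n",
]
for kept in drop_indels(rows):
    print(kept, end="")
```

To filter a file on disk, iterate over an open input handle and write each yielded line to a new output file, then pass that new file to `--sst_file`.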
Dear Dr. Tian,
I encountered the following error while running PRS-CSx:
`--ref_dir=/work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/ld_ref --bim_prefix=../ABCD_genotype2021/ABCD_QCed_2021_PCair_8620+1579_PRScs/ABCD_QCed_2021_PCair_8620+1579_PRScs_SNPrsid_final --sst_file=['../ABCD_summarystats/final_ASD_forPRScsx.txt'] --a=1 --b=0.5 --phi=1.0 --n_gwas=[46350] --pop=['EUR'] --n_iter=1000 --n_burnin=500 --thin=5 --out_dir=/work2/08170/amyk01/stampede2/ABCD_PRScsx/ASD/prscsx_output --out_name=ABCD_ASD_csx --chrom=['22'] --meta=FALSE --seed=None
process chromosome 22
... parse reference file: /work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/ld_ref/snpinfo_mult_1kg_hm3 ...
... 18944 SNPs on chromosome 22 read from /work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/ld_ref/snpinfo_mult_1kg_hm3 ...
... parse bim file: ../ABCD_genotype2021/ABCD_QCed_2021_PCair_8620+1579_PRScs/ABCD_QCed_2021_PCair_8620+1579_PRScs_SNPrsid_final.bim ...
... 154308 SNPs on chromosome 22 read from ../ABCD_genotype2021/ABCD_QCed_2021_PCair_8620+1579_PRScs/ABCD_QCed_2021_PCair_8620+1579_PRScs_SNPrsid_final.bim ...
... parse EUR sumstats file: ../ABCD_summarystats/final_ASD_forPRScsx.txt ...
Traceback (most recent call last):
File "/work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/PRScsx.py", line 204, in <module>
main()
File "/work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/PRScsx.py", line 187, in main
sst_dict[pp] = parse_genet.parse_sumstats(ref_dict, vld_dict, param_dict['sst_file'][pp], param_dict['pop'][pp], param_dict['n_gwas'][pp])
File "/work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/parse_genet.py", line 73, in parse_sumstats
if ll[1] in ATGC and ll[2] in ATGC:
IndexError: list index out of range`
The command I ran was:
python3 /work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/PRScsx.py --ref_dir=/work2/07939/tg872382/stampede2/connectome/stampede2/PRScsx/ld_ref --bim_prefix=../ABCD_genotype2021/ABCD_QCed_2021_PCair_8620+1579_PRScs/ABCD_QCed_2021_PCair_8620+1579_PRScs_SNPrsid_final --sst_file=../ABCD_summarystats/final_ASD_forPRScsx.txt --n_gwas=46350 --pop=EUR --chrom=22 --phi=1 --out_dir=/work2/08170/amyk01/stampede2/ABCD_PRScsx/ASD/prscsx_output --out_name=ABCD_ASD_csx
I think the error might come from indels in the summary stats file. I found that my summary stats file contains some variants with indel alleles as A1 or A2 (e.g., ATG, ATTTT). I also found that the code worked after I removed the indel variants.
I wonder whether parse_genet.py can only handle 'A', 'T', 'G', 'C' as A1 or A2, because about 10% of the variants were lost when I excluded the indels.
Best regards, Bogyeom Kim.