choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Unable to remove duplicated SNP with --extract PRSice.valid #333

Closed: sakuramodokich closed this issue 1 year ago

sakuramodokich commented 1 year ago

The first run stopped on duplicated SNP IDs and automatically generated a PRSice.valid file:

18803059 variant(s) observed in base file, with: 
2635603 ambiguous variant(s) excluded 

Error: A total of 83395 duplicated SNP ID detected out of 
       16073292 input SNPs! Valid SNP ID (post --extract / 
       --exclude, non-duplicated SNPs) stored at 
       PRSice.valid. You can avoid this error by using 
       --extract PRSice.valid
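
To see which IDs are being flagged, the duplicates can be counted outside PRSice. The sketch below is a hypothetical helper, not part of PRSice. It assumes a whitespace-delimited variant file such as a PLINK .bim, where the SNP ID is in the second column (`id_col=1`). For a base file with a header line, pass `skip_header=True` and the index of that file's SNP column.

```python
# Hypothetical helper (not part of PRSice): count SNP IDs that occur more than once.
# Assumption: whitespace-delimited lines, e.g. a PLINK .bim with the ID in column 2.
from collections import Counter


def find_duplicate_ids(lines, id_col=1, skip_header=False):
    """Return {snp_id: count} for every ID that appears more than once."""
    counts = Counter()
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue  # ignore the header row of a summary-statistics file
        fields = line.split()
        if len(fields) <= id_col:
            continue  # skip blank or malformed lines
        counts[fields[id_col]] += 1
    # Keep only IDs seen at least twice
    return {snp: n for snp, n in counts.items() if n > 1}
```

For example, `find_duplicate_ids(open("target.bim"))` returns each duplicated ID with its count; `target.bim` is a placeholder path.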

I reran with --extract PRSice.valid to remove the duplicated SNPs, but the second run still reported 3 duplicates.

18803059 variant(s) observed in base file, with: 
2796065 variant(s) excluded based on user input 
16006994 total variant(s) included from base file
...
Error: A total of 3 duplicated SNP ID detected out of 
       5080168 input SNPs! Valid SNP ID (post --extract / 
       --exclude, non-duplicated SNPs) stored at 
       PRSice.valid. You can avoid this error by using 
       --extract PRSice.valid

I used --extract again on a third run, but the duplicate SNP error persisted (1 duplicate this time).

18803059 variant(s) observed in base file, with: 
13722893 variant(s) excluded based on user input 
5080166 total variant(s) included from base file
...
Error: A total of 1 duplicated SNP ID detected out of 
       5080166 input SNPs! Valid SNP ID (post --extract / 
       --exclude, non-duplicated SNPs) stored at 
       PRSice.valid. You can avoid this error by using 
       --extract PRSice.valid
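
Because each --extract run still surfaced a few duplicates, one way to investigate is to drop from the extract list every ID that occurs more than once in the target file. This is a hypothetical sketch, not a PRSice feature. It assumes PRSice.valid holds one SNP ID per line (first column) and that the target is a PLINK .bim file with the ID in column 2; whether the remaining duplicates actually come from the target rather than the base file is not established in this thread.

```python
# Hypothetical helper (not a PRSice feature): remove duplicated target IDs from an extract list.
# Assumptions: extract list has the SNP ID in its first column; target is a PLINK .bim
# with the SNP ID in column 2.
from collections import Counter


def clean_extract_list(extract_lines, bim_lines):
    """Return extract IDs that occur exactly once in the .bim lines."""
    counts = Counter(
        fields[1] for fields in (line.split() for line in bim_lines) if len(fields) > 1
    )
    kept = []
    for line in extract_lines:
        fields = line.split()
        # IDs absent from the target (count 0) or duplicated (count > 1) are dropped
        if fields and counts.get(fields[0], 0) == 1:
            kept.append(fields[0])
    return kept


def write_clean_list(extract_path, bim_path, out_path):
    """Write the cleaned IDs to out_path, one per line; return how many were kept."""
    with open(extract_path) as ext, open(bim_path) as bim:
        kept = clean_extract_list(ext, bim)
    with open(out_path, "w") as out:
        out.writelines(snp + "\n" for snp in kept)
    return len(kept)
```

For example, `write_clean_list("PRSice.valid", "target.bim", "PRSice.nodup.valid")` writes the filtered list, which could then be passed to --extract; `target.bim` and the output filename are placeholders.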

sakuramodokich commented 1 year ago

@choishingwan I have encountered this issue with several base files. I would greatly appreciate your help resolving it when you have a moment.