choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

not filtered by --maf and --geno in older version? #279

Closed lgarvert closed 2 years ago

lgarvert commented 3 years ago

Hi,

I calculated a PRS with two different versions of PRSice: 2.2.5 and the latest, 2.3.5. The input files and the flags are the same, except that I extracted ~7,000 fewer SNPs when running the latest version.

However, when comparing the two log files I noticed a difference after --maf and --geno filtering. The new log reports 116,576 variants included before filtering. It filters by --maf and --geno and then reports 75,342 variants included before clumping. The old log, on the other hand, reports 123,865 variants included before filtering. It also identifies SNPs that should be excluded according to --maf and --geno, but it still records 123,865 variants included before clumping. The number of variants after clumping is also much higher than in the newest version (4,233 vs 2,968). So did the old version not actually exclude the SNPs based on --maf and --geno?

new_ukb_imp_chr22_v3.log old_ukb_imp_chr22_v3.log

Thank you, Linda
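For reference, the arithmetic behind the counts quoted above can be laid out as a small sketch. The numbers are copied from this report; the variable names are illustrative only and do not come from PRSice or its logs.

```python
# Variant counts as quoted in this report (old = 2.2.5, new = 2.3.5).
old_before_filter = 123_865
old_before_clump = 123_865
new_before_filter = 116_576
new_before_clump = 75_342
old_after_clump = 4_233
new_after_clump = 2_968

# Difference in input size, consistent with the "~7,000 fewer SNPs" extracted.
input_gap = old_before_filter - new_before_filter

# Variants dropped between "before filtering" and "before clumping".
old_removed = old_before_filter - old_before_clump
new_removed = new_before_filter - new_before_clump

# Difference in the number of variants remaining after clumping.
clump_gap = old_after_clump - new_after_clump

print(f"input gap:            {input_gap}")
print(f"removed (old / new):  {old_removed} / {new_removed}")
print(f"post-clumping gap:    {clump_gap}")
```

The gap in input size is far smaller than the number of variants the new version drops before clumping, so the smaller extraction alone does not account for the difference between the two runs.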

choishingwan commented 3 years ago

If I remember correctly, there were multiple errors in the MAF and geno calculations in the older version (not considering founder status, etc.), and those caused some of the discrepancy. In addition, what you observed might also be true, that PRSice did not really apply the exclusion, though it is hard to tell now because that version is more than 2 years old. You can check whether you get the same number of SNPs after filtering with PLINK (you can use --print-snp to check). We are currently working very hard on a more robust version of PRSice. We will see how that goes.

Sam
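As a rough illustration of what a --maf / --geno style filter does, here is a minimal Python sketch on a toy genotype table. It is not PRSice's or PLINK's implementation: the function names, the data layout (alternate-allele counts with `None` for a missing call), the thresholds, and the order of the two checks are all assumptions made for the example, and founder status is ignored.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # alt-allele count (0, 1, 2); None marks a missing call


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with a missing genotype call."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_freq(calls: List[Genotype]) -> float:
    """MAF over non-missing calls; 0.0 if every call is missing."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_variants(
    genotypes: Dict[str, List[Genotype]], maf_min: float, geno_max: float
) -> Tuple[List[str], int, int]:
    """Return (kept SNP IDs, number removed for missingness, number removed for MAF)."""
    kept: List[str] = []
    n_geno = n_maf = 0
    for snp, calls in genotypes.items():
        if missing_rate(calls) > geno_max:        # too many missing calls
            n_geno += 1
        elif minor_allele_freq(calls) < maf_min:  # minor allele too rare
            n_maf += 1
        else:
            kept.append(snp)
    return kept, n_geno, n_maf


# Toy data: four hypothetical variants typed in five samples.
toy = {
    "rs1": [0, 1, 2, 1, 0],
    "rs2": [0, 0, 0, 0, 0],           # monomorphic, fails the MAF check
    "rs3": [None, None, 1, 0, 2],     # 40% missing, fails the missingness check
    "rs4": [0, 0, 1, 0, 0],
}
kept, n_geno, n_maf = filter_variants(toy, maf_min=0.05, geno_max=0.2)
print(f"{n_geno} variant(s) excluded based on genotype missingness")
print(f"{n_maf} variant(s) excluded based on MAF")
print(f"{len(kept)} variant(s) retained: {kept}")
```

Writing the retained IDs from a check like this to a file and comparing them against the SNP list from another tool (for example with a set difference) is one way to see whether two pipelines agree on which variants survive filtering.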

lgarvert commented 3 years ago

Hi Sam,

Sorry, I have one more question regarding the filtering. I calculated another PRS with the latest PRSice version. I used the --geno flag, but the log file does not mention any filtering based on missingness. Is that because it did not perform this step, or because no variants had to be excluded based on genotype missingness? If it did perform the step, would it be possible to add something along the lines of "0 variants excluded based on genotype missingness" to the log file in the next version?

Thank you! Linda