choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

--chr-id questions #272

Closed dpelegri closed 3 years ago

dpelegri commented 3 years ago

Hello choishingwan,

I have a question. We are running PRSice v2 with the following datasets:

1) base: 
snpID: chr:pos:REF:ALT
Example:
chr | pos | variant_id | ref | alt
-- | -- | -- | -- | --
1 | 100000012 | 1:100000012:G:T | G | T
1 | 100000827 | 1:100000827:C:T | C | T

2) target: 
PLINK file
snpID: chr:pos:A1:A2
Example:
chr     snpID              Mb     pos      A1      A2 
1       1:11008:C:G     0       11008   G       C
1       1:11012:C:G     0       11012   G       C
1       1:13110:G:A     0       13110   A       G
In PLINK, A1 is usually the minor allele and A2 the major allele.

In order to match the snpID of the two datasets we are planning to use the --chr-id argument.
Thus the code would be:
--a1 alt
--a2 ref
--snp variant_id
--chr-id  c:L-Bad 
Given that the snpID in the base GWAS is REF:ALT, is the --chr-id argument correct as "Ba" (instead of aB)? 
Will this argument create a new snpID in the target?
Besides creating a new snpID in the target, does the argument modify A1 and A2 in the PLINK file? Or is the identification of A1/A2 vs ref/alt done independently of this argument?

If we instead run the code without the --snp argument:
--a1 alt
--a2 ref
--chr-id  c:L-Bad 
Would, then, the --chr-id argument create a new ID both in the base and target GWAS?
In this case, can we run with either --chr-id c:L-Bad or --chr-id c:L-aBd?
  

Thank you,

choishingwan commented 3 years ago

Hi,

--chr-id is one of those functions that I implemented on a whim, and it therefore has a lot of undefined behavior.

To answer your question: --chr-id basically relabels the SNPs in your bim file according to the given pattern. In addition, if you did not provide the --snp parameter and the default does not pick up the SNP column, we will also construct the chr id for your base data. To understand the parameter: --chr-id c:L-bad is translated as <chromosome>:<location>-<b><a>d, where <b> and <a> stand for the two alleles and the trailing d is kept literally. I guess you don't really want the d, so you would instead do something like --chr-id c:L-ba
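
As a rough illustration of how such a pattern expands, here is a minimal Python sketch. It is not PRSice's actual code: the function name build_chr_id is hypothetical, and the sketch assumes that c stands for the chromosome, L for the location, a and b (in either case) for the two alleles, and that any other character is copied literally.

```python
def build_chr_id(pattern, chrom, pos, allele_a, allele_b):
    """Expand a --chr-id style pattern into a SNP ID (illustrative only)."""
    # Assumed meaning of each pattern letter; not taken from PRSice's source.
    fields = {
        "c": str(chrom),
        "l": str(pos),
        "a": allele_a,
        "b": allele_b,
    }
    # Letters are matched case-insensitively (an assumption). Anything that is
    # not a known letter, such as ':', '-' or 'd', is copied through unchanged.
    return "".join(fields.get(ch.lower(), ch) for ch in pattern)


if __name__ == "__main__":
    # Alleles taken from the first target row in the question (A1=G, A2=C).
    for pattern in ("c:L-bad", "c:L-ba", "c:L"):
        print(pattern, "->", build_chr_id(pattern, 1, 11008, "G", "C"))
```

Under these assumptions, c:L-ba gives 1:11008-CG for that row, which is not the chr:pos:REF:ALT layout of the variant_id column in the base file, while c:L gives just 1:11008.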

The problem with using chr id is that it severely limits PRSice's ability to do flipping, as we are now mapping the SNP IDs with respect to their alleles. To avoid that, a better chr id would be --chr-id c:L, which doesn't include the alleles, in which case PRSice can try to do flipping. The only annoying problem you will have to look out for is removing SNPs that fall within the same location.
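
One possible way to spot SNPs that share a location before switching to c:L is sketched below in Python. This is only a sketch, not part of PRSice: the function name duplicated_positions is hypothetical, the input is assumed to be whitespace-separated lines in PLINK .bim column order (chr, SNP ID, cM, position, A1, A2), and the second example row is invented to create a clash.

```python
from collections import defaultdict


def duplicated_positions(bim_lines):
    """Return {(chr, pos): [snp_ids]} for positions shared by more than one SNP."""
    by_position = defaultdict(list)
    for line in bim_lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        chrom, snp_id, _cm, pos = parts[:4]
        by_position[(chrom, pos)].append(snp_id)
    # Keep only the positions that occur more than once.
    return {key: ids for key, ids in by_position.items() if len(ids) > 1}


if __name__ == "__main__":
    # Toy rows in .bim column order; the 1:11008:C:T row is made up.
    example = [
        "1 1:11008:C:G 0 11008 G C",
        "1 1:11008:C:T 0 11008 T C",
        "1 1:13110:G:A 0 13110 A G",
    ]
    for (chrom, pos), ids in duplicated_positions(example).items():
        print(chrom, pos, ",".join(ids))
```

With a real file, the lines could come from open("target.bim") (a placeholder file name), and the reported IDs could be written to a list and dropped from the target, for example with PLINK's --exclude, before running PRSice.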

Hope this helps

Sam
