choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

Genotype Info from Target #127

Closed and02709 closed 5 years ago

and02709 commented 5 years ago

Hi, I have a question about the "Loading Genotype info from target" section that appears in the log file.

I am using a GWAS base that has 9733 variants observed in the base file, but several are ambiguous and are excluded. This leaves 8234 total variants included from the base file.

In the following section where it lists the information under the header "Loading Genotype info from target", it lists that I have 588 people (289 male, 299 female) observed 346 founders included

568 variants included. (*** this number is my concern, this is much smaller than the variant number in the base file)

When I examined similar output for the TOY data set provided, it reads: 91062 variants observed in base file, 88836 total variants included from base file.

Under Loading Genotype info from target it lists 2000 people (1024 male, 976 female) observed 2000 founders included

88836 variants included. (*** this is exactly the number in the base file)

Ultimately, I am not sure why so few of my genetic variants from base file appear to be included. In your experience, does this suggest a problem that I should consider, or is easily explained somehow? I think I am missing something important - I guess I'm asking for advice.

Thank you.

Sincerely, Michael

choishingwan commented 5 years ago

You need to check whether the same SNPs are found in both the target and the base. You can do that by comparing their SNP IDs. There are two issues here. First, your base is relatively small; usually we have many more SNPs in the base. Did you include only significant SNPs? Second, given the low number of SNPs found in the target, it seems that the SNP IDs in your target don't match those in the base. You can consider using a denser base or imputing your target.
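The SNP ID comparison suggested above could be sketched roughly as follows. This is a minimal illustration, not part of PRSice: the file names, the `SNP` column name in the base file, and the helper names `read_ids` and `overlap_report` are all assumptions. The only format detail relied on is that a PLINK `.bim` file has no header and stores the variant ID in its second column.

```python
def read_ids(path, column, has_header=True):
    """Return the set of SNP IDs in one column of a whitespace-delimited file.

    `column` is a header name when has_header is True, otherwise a
    zero-based column index (hypothetical helper, for illustration only).
    """
    ids = set()
    with open(path) as handle:
        if has_header:
            header = handle.readline().split()
            index = header.index(column)
        else:
            index = column
        for line in handle:
            fields = line.split()
            if len(fields) > index:
                ids.add(fields[index])
    return ids


def overlap_report(base_ids, target_ids):
    """Count how many SNP IDs are shared between base and target."""
    shared = base_ids & target_ids
    return {
        "base": len(base_ids),
        "target": len(target_ids),
        "shared": len(shared),
        "base_only": len(base_ids - target_ids),
    }


# Example usage (paths and column name are assumptions):
# base_ids = read_ids("base.assoc", "SNP", has_header=True)
# target_ids = read_ids("target.bim", 1, has_header=False)
# print(overlap_report(base_ids, target_ids))
```

A small `shared` count relative to `base` would point to mismatched SNP IDs (for example rsIDs in one file and chr:pos identifiers in the other) rather than to a problem in the scoring itself.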

Dr Shing Wan Choi Postdoctoral Fellow Genetics and Genomic Sciences Icahn School of Medicine, Mount Sinai, NYC

misrak commented 5 years ago

Hello!

I actually had the same question about the difference. I am using base and target files that contain the same common SNPs, and both files have identical SNP IDs. Is there any other reason why the difference would still be there?

Example:

Start processing common631SNP_ADD.assoc
Reading 100.00%
Base file: ../association_results/common631SNP_ADD.assoc.logistic
631 variant(s) observed in base file, with:
13 variant(s) located on haploid chromosome
26 NA stat/p-value observed
618 total variant(s) included from base file

Loading Genotype info from target

181 people (119 male(s), 62 female(s)) observed
181 founder(s) included

533 variant(s) included

Thank you so much. Kaalindi

choishingwan commented 5 years ago

Can you try running with version 2.2.5 and send me the log? That should give me more information.

misrak commented 5 years ago

new.log

I hope this helps.

choishingwan commented 5 years ago

Your target file doesn't contain 631 variants. There are only 533 variants in your target.

misrak commented 5 years ago

No, I have checked it multiple times; it contains 631 SNPs. You can check it yourself. Would you also need the covariate file, or will this do?

choishingwan commented 5 years ago

Now I see. As your base contains only one allele (A1), ambiguous SNP removal cannot be performed on the base data. When loading the target data, we observed ambiguous SNPs, which we then removed (76 in total). Due to a problem on my side, this message wasn't reported, which led to the confusion. This problem should be solved in a future release.
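To see how many target variants would be dropped as ambiguous, a check along these lines could be run on the `.bim` file. This is a rough sketch and not PRSice's actual implementation: it assumes the usual definition of an ambiguous SNP (the two alleles are complementary, i.e. A/T or C/G, so the strand cannot be determined) and the standard six-column PLINK `.bim` layout with alleles in columns 5 and 6. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1, a2):
    """True when the two alleles are complementary (A/T or C/G)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def split_bim(lines):
    """Split .bim records into kept and ambiguous SNP IDs.

    Assumes six whitespace-separated columns:
    chromosome, variant ID, cM position, bp position, allele 1, allele 2.
    """
    kept, ambiguous = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        snp_id, a1, a2 = fields[1], fields[4], fields[5]
        if is_ambiguous(a1, a2):
            ambiguous.append(snp_id)
        else:
            kept.append(snp_id)
    return kept, ambiguous


# Example usage (file name is an assumption):
# with open("target.bim") as handle:
#     kept, ambiguous = split_bim(handle)
# print(len(kept), "kept;", len(ambiguous), "ambiguous")
```

Comparing the ambiguous count against the gap between the base and target variant numbers in the log shows how much of the difference this filter accounts for.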

choishingwan commented 5 years ago

Side note: I have deleted the link to your files so that others cannot download them (hopefully). It might be best to send your data to me via email next time (preferably with consent from your PI).

misrak commented 5 years ago

Thank you so much for your help; that explains a lot.

Thank you for deleting the files. I had asked for consent before sharing them with you. :)