choishingwan / PRSice

A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores
http://prsice.info
GNU General Public License v3.0

incorrect snps selection #149

Closed sarasaezALS closed 5 years ago

sarasaezALS commented 5 years ago

Hi Sam, I have noticed that my pathway analysis results using the latest version and p-competitive were a little weird. I ran a list of genes using only the known GWAS hits (only 6), and the software ignored two of the genes (it didn't take any SNPs from them for the PRS analysis): C9orf72 and C21orf2. The binary files and target file are OK; the problem is that the software doesn't include them in the analysis. This is a small fragment from the .snp file:

    9 rs10511817 27454173 0.03901 Y Y N
    9 rs7873548 27467763 2.803e-06 Y Y N
    9 rs61349511 27468262 0.883 Y Y N
    9 rs118052933 27490374 0.398 Y Y N
    9 rs1977661 27502986 0.006554 Y Y N
    9 rs76622200 27513089 0.2413 Y Y N
    9 rs3849943 27543382 3.985e-19 Y Y N
    9 rs10967977 27545078 0.001501 Y Y N
    9 rs62538126 27556831 0.4666 Y Y N
    9 rs34366576 27562078 0.0236 Y Y N
    9 rs149101200 27578349 0.07657 Y Y N
    9 rs10757670 27584899 0.0007152 Y Y N
    9 rs702231 27588731 0.003274 Y Y N
    9 rs34460171 27594491 3.645e-06 Y Y N
    9 rs10968001 27600645 0.455 Y Y N
    9 rs615849 27606780 0.585 Y Y N

choishingwan commented 5 years ago

Did you use a GTF file and an MSigDB file? If so, can you confirm that the GTF file also contains the gene?
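A quick way to answer this is to count matching records in each file. The sketch below is a minimal example, assuming an Ensembl-style GTF (tab-separated, with a gene_name "..." attribute in column 9) and a gene-set file with one set per line and the set name first; gene_in_gtf, gene_in_sets and the toy files are hypothetical names made up for illustration.

```shell
#!/bin/sh
# Sketch: count how often a gene appears in a GTF and in a gene-set file.
# Assumptions: Ensembl-style GTF (tab-separated, gene_name "..." in column 9);
# gene-set file with one set per line, set name in the first field.
# Function and file names are hypothetical.

# gene_in_gtf GENE FILE: number of "gene" records whose gene_name is exactly GENE.
# "gzip -dcf" (GNU gzip) decompresses .gz input and passes plain text through unchanged.
gene_in_gtf() {
  gzip -dcf "$2" | awk -F'\t' -v g="$1" '
    $3 == "gene" && index($9, "gene_name \"" g "\"") { n++ }
    END { print n + 0 }'
}

# gene_in_sets GENE FILE: number of sets listing GENE after the set name.
gene_in_sets() {
  awk -v g="$1" '
    { for (i = 2; i <= NF; i++) if ($i == g) { n++; break } }
    END { print n + 0 }' "$2"
}

# Toy inputs (made-up coordinates, not real annotation)
tmp=$(mktemp -d)
printf '9\ttoy\tgene\t1000\t2000\t.\t-\t.\tgene_id "toy1"; gene_name "C9orf72";\n' > "$tmp/toy.gtf"
printf 'GWAShits\tC9orf72\tC21orf2\n' > "$tmp/toy.gmt"

gene_in_gtf C9orf72 "$tmp/toy.gtf"    # 1
gene_in_gtf C9orf27 "$tmp/toy.gtf"    # 0: a transposed name matches nothing
gene_in_sets C9orf72 "$tmp/toy.gmt"   # 1
```

A count of 0 in either file means PRSet has no way to link the gene name to coordinates, whatever the genotype data contain.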

sarasaezALS commented 5 years ago

Yes, it seems to be OK in both files. I used the same GTF file before with the old version and never had this problem. Do you know what is wrong?

(Email reply to Shing Wan Choi's comment of Sunday, September 8, 2019 at 10:43 AM.)



choishingwan commented 5 years ago

Do you mind sending me the GTF, the MSigDB and the bim file so that I can have a look?

Thanks

sarasaezALS commented 5 years ago

The GTF file is compressed, but it is still a big file and I cannot upload it to GitHub.

sarasaezALS commented 5 years ago

And the same goes for the big file.

sarasaezALS commented 5 years ago

I meant the bim file. Do you know another way to send the files? Thanks.

choishingwan commented 5 years ago

Maybe you can send them directly to my email.

Though for the GTF, I only need the lines corresponding to the genes of interest, so maybe you can send me just the part containing C9orf72 and C21orf2.
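Cutting a large GTF down to the genes of interest takes only a short filter. This is a minimal sketch under the same assumption (an Ensembl-style gene_name attribute in column 9); subset_gtf and the toy records are hypothetical. Header lines starting with # are kept, and every feature (gene, transcript, exon, ...) whose gene_name is in the list is copied.

```shell
#!/bin/sh
# Sketch: keep only GTF lines whose gene_name is in a given list.
# Assumption: Ensembl-style attributes, e.g. gene_name "C9orf72"; in column 9.

# subset_gtf IN.gtf[.gz] GENE... > OUT.gtf
subset_gtf() {
  src=$1; shift
  genes=$(IFS=,; echo "$*")          # join the gene names with commas for awk
  gzip -dcf "$src" | awk -F'\t' -v list="$genes" '
    BEGIN { n = split(list, g, ","); for (i = 1; i <= n; i++) want[g[i]] = 1 }
    /^#/ { print; next }                               # keep header lines
    match($9, /gene_name "[^"]+"/) {
      name = substr($9, RSTART + 11, RLENGTH - 12)     # text between the quotes
      if (name in want) print
    }'
}

# Toy input (made-up records)
tmp=$(mktemp -d)
{
  printf '#!genome-build toy\n'
  printf '9\ttoy\tgene\t1000\t2000\t.\t-\t.\tgene_id "toy1"; gene_name "C9orf72";\n'
  printf '9\ttoy\texon\t1000\t1200\t.\t-\t.\tgene_id "toy1"; gene_name "C9orf72";\n'
  printf '1\ttoy\tgene\t5000\t6000\t.\t+\t.\tgene_id "toy2"; gene_name "OTHER";\n'
  printf '21\ttoy\tgene\t3000\t4000\t.\t+\t.\tgene_id "toy3"; gene_name "C21orf2";\n'
} > "$tmp/toy.gtf"

subset_gtf "$tmp/toy.gtf" C9orf72 C21orf2 > "$tmp/subset.gtf"
```

Matching the whole quoted name (rather than a substring) avoids pulling in genes whose names merely start with the same letters.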

sarasaezALS commented 5 years ago

OK, I have sent them to your Gmail.

choishingwan commented 5 years ago

The most likely reason is that SNPs in C9orf72 were clumped away by SNPs within the same pathway. --print-snp generates the file containing SNP membership after clumping, and PRSet performs clumping across all SNPs within the same pathway. It is therefore possible for the resulting *.snp file to contain no SNPs for some genes in a pathway.

You can check this by running with --no-clump and seeing whether the generated SNP set contains SNPs within the gene of interest (alternatively, you can generate a gene set containing only the gene of interest).
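To see how clumping can remove SNPs in part of a pathway, here is a deliberately simplified greedy clumping sketch: SNPs are visited from most to least significant, and a SNP is dropped if an already-kept SNP on the same chromosome lies within a fixed window. This is distance-only clumping swapped in for illustration; PRSice's actual clumping also uses LD (r²) between SNPs, so real results will differ. clump_by_distance is a hypothetical name; the chr9 rows reuse values posted in this thread and the chr21 row is made up.

```shell
#!/bin/sh
# Sketch: greedy, distance-only clumping (no LD information is used).
# Input: whitespace-separated CHR SNP BP P, no header.
# clump_by_distance FILE WINDOW_BP prints the IDs of the SNPs that are kept.
clump_by_distance() {
  LC_ALL=C sort -g -k4,4 "$1" | awk -v w="$2" '
    {
      keep = 1
      for (i = 1; i <= n; i++) {          # compare with every SNP kept so far
        d = $3 - bp[i]; if (d < 0) d = -d
        if (chr[i] == $1 && d <= w) { keep = 0; break }
      }
      if (keep) { n++; chr[n] = $1; bp[n] = $3; print $2 }
    }'
}

tmp=$(mktemp -d)
{
  printf '9 rs3849943 27543382 3.985e-19\n'
  printf '9 rs62538126 27556831 0.4666\n'
  printf '9 rs34366576 27562078 0.0236\n'
  printf '21 toy_chr21 1000000 0.01\n'        # made-up SNP on another chromosome
} > "$tmp/pathway.txt"

# 250 kb window: rs3849943 (smallest P) absorbs the two nearby chr9 SNPs
clump_by_distance "$tmp/pathway.txt" 250000
```

With a 250 kb window only rs3849943 and toy_chr21 survive; shrinking the window to 1 kb keeps all four, which mirrors the --no-clump comparison suggested above.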

sarasaezALS commented 5 years ago

Hi Sam, thanks for the feedback. I already did it. I generated a list of 5 genes (including C9orf72 as the only gene on chr9 and C21orf2 as the only gene on chr21). The .summary file did not contain any SNP from chr9 or chr21.

choishingwan commented 5 years ago

So you did:

  1. Using --no-clump to see if any SNPs are found in c9orf72?
  2. Generate a gene set containing only C9orf72?

And in neither of those situations were any SNPs found in those genes? For 2, could you please check whether an *.xregion file was generated? If none of the SNPs in C9orf72 were found in the base file, or they have a different allele encoding, or they are ambiguous, that might also explain why SNPs are missing from the target gene.
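The three causes listed above (SNP absent from the base file, allele mismatch, ambiguous A/T or C/G SNP) can be checked for a single gene region with a short script. This is a simplified sketch, not PRSice's own filtering logic: it assumes a PLINK .bim target and a base file with a header and columns SNP A1 A2 in that order, treats a strand flip as a match, and reports a status for every target SNP inside CHR:START-END. region_snp_status and all toy records are hypothetical.

```shell
#!/bin/sh
# Sketch: classify target SNPs inside a region against the base file.
# Assumed inputs: base file with a header and columns SNP A1 A2 ...;
# target PLINK .bim (CHR SNP CM BP A1 A2, no header).
# region_snp_status BASE BIM CHR START END
region_snp_status() {
  awk -v c="$3" -v s="$4" -v e="$5" '
    function comp(a) {                      # complementary base
      return a == "A" ? "T" : (a == "T" ? "A" : (a == "C" ? "G" : (a == "G" ? "C" : a)))
    }
    function same(x1, x2, y1, y2) {        # same allele pair, in either order
      return (x1 == y1 && x2 == y2) || (x1 == y2 && x2 == y1)
    }
    NR == FNR { if (FNR > 1) { b1[$1] = toupper($2); b2[$1] = toupper($3) }; next }
    $1 == c && $4 + 0 >= s + 0 && $4 + 0 <= e + 0 {
      a1 = toupper($5); a2 = toupper($6)
      if (!($2 in b1))             status = "missing_in_base"
      else if (a1 == comp(a2))     status = "ambiguous"
      else if (same(a1, a2, b1[$2], b2[$2]) || same(comp(a1), comp(a2), b1[$2], b2[$2]))
                                   status = "ok"
      else                         status = "allele_mismatch"
      print $2, status
    }' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'SNP A1 A2 P\nsnp1 A G 0.01\nsnp2 A T 0.2\nsnp3 C G 0.5\n' > "$tmp/base.txt"
printf '9 snp1 0 150 A G\n9 snp2 0 200 A T\n9 snp3 0 300 C T\n9 snp4 0 400 A C\n9 snp5 0 900 A G\n' > "$tmp/toy.bim"

region_snp_status "$tmp/base.txt" "$tmp/toy.bim" 9 100 500
```

With the toy data this prints snp1 ok, snp2 ambiguous, snp3 allele_mismatch and snp4 missing_in_base; snp5 lies outside the region and is not reported.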

sarasaezALS commented 5 years ago

I tried with a list containing only C9orf2 and one other gene. I will follow your recommendation and let you know. Also, the analysis using PRSet is running really slowly... it takes days. Do you know how I could increase the speed?

choishingwan commented 5 years ago

I'm currently working on speed. With some tricks in linear algebra, we hope to significantly increase the speed of PRSet permutation (maybe by 25%?), but given the number of operations involved, we wouldn't be too surprised if PRSet still takes a long time to run.

sarasaezALS commented 5 years ago

OK. I didn't find any .xregion file, only a .mismatch file. Is that the one I should look at?

choishingwan commented 5 years ago

The xregion file should contain the sets that don't contain any SNPs. As it wasn't generated, that suggests all of your sets have at least one SNP. Did you find any SNPs within your C9orf27-only gene set?

-- Dr Shing Wan Choi Postdoctoral Fellow Genetics and Genomic Sciences Icahn School of Medicine, Mount Sinai, NYC

sarasaezALS commented 5 years ago

not even one...

sarasaezALS commented 5 years ago

The thing is, C9orf72 is the most important GWAS hit. It is not included in the analysis, so I am missing a lot of information. In the previous version (without PRSet) it was included. Now I am using the same files and I cannot find any SNP. The same happens with C21orf2. I also did the analysis using Entrez nomenclature just in case, and the results were the same.

choishingwan commented 5 years ago

Did you see the C9orf27 gene set in your --print-snp file? And is it full of N with no Y?


sarasaezALS commented 5 years ago

Something like this:

    9 rs2756906 27223215 0.0845 Y Y N
    9 rs7024828 27226902 0.6857 Y Y N
    9 rs149140408 27235840 0.02657 Y Y N
    9 rs62541821 27247976 0.5231 Y Y N
    9 rs117741650 27258310 0.1721 Y Y N
    9 rs138877558 27268932 0.1437 Y Y N
    9 rs138877450 27299296 0.1223 Y Y N
    9 rs72721164 27300439 0.6681 Y Y N
    9 rs79596730 27304295 0.003768 Y Y N
    9 rs116882138 27313557 0.5706 Y Y N
    9 rs1555453 27326780 0.2174 Y Y N
    9 rs1984007 27329622 0.2141 Y Y N
    9 rs1984008 27329792 0.01028 Y Y N
    9 rs3739530 27330514 0.3543 Y Y N
    9 rs148157353 27335426 0.002498 Y Y N
    9 rs73643189 27336814 0.6924 Y Y N
    9 rs10812570 27340967 0.3265 Y Y N
    9 rs12553110 27346660 0.1398 Y Y N
    9 rs71510489 27352998 0.805 Y Y N
    9 rs1330921 27367278 0.5752 Y Y N
    9 rs10812576 27373009 0.863 Y Y N
    9 rs56799759 27373173 0.005481 Y Y N
    9 rs79883193 27374062 0.5447 Y Y N
    9 rs7035290 27422899 0.1985 Y Y N
    9 rs75999974 27436645 0.04637 Y Y N
    9 rs117125597 27446926 0.03887 Y Y N
    9 rs10511817 27454173 0.03901 Y Y N
    9 rs7873548 27467763 2.803e-06 Y Y N
    9 rs61349511 27468262 0.883 Y Y N
    9 rs118052933 27490374 0.398 Y Y N
    9 rs1977661 27502986 0.006554 Y Y N
    9 rs76622200 27513089 0.2413 Y Y N
    9 rs3849943 27543382 3.985e-19 Y Y N
    9 rs10967977 27545078 0.001501 Y Y N
    9 rs62538126 27556831 0.4666 Y Y N
    9 rs34366576 27562078 0.0236 Y Y N
    9 rs149101200 27578349 0.07657 Y Y N
    9 rs10757670 27584899 0.0007152 Y Y N
    9 rs702231 27588731 0.003274 Y Y N
    9 rs34460171 27594491 3.645e-06 Y Y N
    9 rs10968001 27600645 0.455 Y Y N
    9 rs615849 27606780 0.585 Y Y N
    9 rs17779794 27610116 0.1377 Y Y N
    9 rs483752 27612918 0.3342 Y Y N
    9 rs10812623 27621214 0.282 Y Y N
    9 rs80083405 27644539 0.4992 Y Y N
    9 rs77273648 27670186 0.7239 Y Y N
    9 rs11789832 27675210 0.003284 Y Y N
    9 rs76972116 27679982 0.009405 Y Y N
    9 rs7850603 27680174 0.4819 Y Y N
    9 rs80295448 27698700 0.581 Y Y N
    9 rs1411377 27707711 0.6167 Y Y N
    9 rs1411375 27713723 0.09071 Y Y N
    9 rs79514749 27717328 0.01433 Y Y N
    9 rs12554715 27720498 0.291 Y Y N
    9 rs2026146 27723858 0.04173 Y Y N
    9 rs13286192 27724116 0.1738 Y Y N
    9 rs67568248 27724342 0.5739 Y Y N
    9 rs114463477 27736493 0.03762 Y Y N
    9 rs12551662 27738341 0.8425 Y Y N
    9 rs76626916 27745843 0.09827 Y Y N
    9 rs10124726 27752020 0.9679 Y Y N
    9 rs77422401 27790156 0.1182 Y Y N
    9 rs10114706 27802678 0.07021 Y Y N
    9 rs188519351 27804085 0.2433 Y Y N
    9 rs34746802 27816119 0.1037 Y Y N
    9 rs35764562 27829802 0.1026 Y Y N
    9 rs77833053 27835972 0.06233 Y Y N
    9 rs73643464 27837751 0.3302 Y Y N
    9 rs71512411 27848202 0.8113 Y Y N
    9 rs10812676 27858481 0.457 Y Y N
    9 rs13293665 27868242 0.06784 Y Y N
    9 rs34176491 27889643 0.5088 Y Y N
    9 rs139600448 27895064 0.2905 Y Y N
    9 rs74996466 27898109 0.6825 Y Y N
    9 rs76187699 27915384 0.6573 Y Y N
    9 rs12156644 27941292 0.06206 Y Y N
    9 rs7023531 27958983 0.4385 Y Y N
    9 rs117329590 27969292 0.4373 Y Y N

choishingwan commented 5 years ago

Do you have the header of the file? And assuming the last column is C9orf27, could you run

awk '$7=="Y"' xxx.snp

and check what the output is?
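A slightly more robust version of this check looks the set column up by its header name instead of assuming its position, and compares against the quoted string "Y" (in awk an unquoted Y is an unset variable, i.e. an empty string). This is a sketch assuming the --print-snp layout shown in this thread (CHR SNP BP P, then one Y/N column per set); snps_in_set and the made-up chr21 row are hypothetical.

```shell
#!/bin/sh
# Sketch: list SNPs flagged "Y" for one gene set in a PRSice --print-snp file.
# Assumed layout: header row, then CHR SNP BP P followed by one Y/N column per set.
# snps_in_set FILE.snp SET_NAME prints the SNP IDs; exits 1 if no such column exists.
snps_in_set() {
  awk -v set="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == set) col = i
      if (!col) { print "no column named " set > "/dev/stderr"; exit 1 }
      next
    }
    $col == "Y" { print $2 }' "$1"
}

tmp=$(mktemp -d)
{
  printf 'CHR SNP BP P Base Background GWAShits\n'
  printf '9 rs3849943 27543382 3.985e-19 Y Y N\n'
  printf '9 rs62538126 27556831 0.4666 Y Y N\n'
  printf '21 toy_snp 1000 0.01 Y Y Y\n'        # made-up row that is in the set
} > "$tmp/toy.snp"

snps_in_set "$tmp/toy.snp" GWAShits             # prints toy_snp
```

Piping the result to wc -l gives the number of SNPs assigned to the set after clumping.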

(Email reply to sarasaezALS's comment of Mon, 9 Sep 2019 at 3:24 PM.)


sarasaezALS commented 5 years ago

OK, I did it and there is nothing... no Y.

sarasaezALS commented 5 years ago

This is the header, by the way: CHR SNP BP P Base Background GWAShitsentrez

sarasaezALS commented 5 years ago

Hi Sam, when I run the analysis including just C9orf72, it says: Error: None of the gene sets contain any SNP(s) after clumping. Have you provided the correct input? E.g. GMT file containing Entrez ID with GTF files that uses the Ensembl gene ID? So I guess this is the problem.

So now the question is how to include SNPs from this gene in the analysis. I am redoing it using --wind-5 3kb and --wind-3 15kb. But besides that, I don't know how to include SNPs from this gene without compromising the clumping criteria too much.

choishingwan commented 5 years ago

I see, that explains it. The likely problem is that no SNPs from your summary statistics file physically reside within the C9orf27 region. Using the --wind-3 and --wind-5 options should help. Alternatively, you might be able to explore a method called p-value imputation, e.g. https://www.ncbi.nlm.nih.gov/pubmed/26306642 (I haven't followed the literature on this for some time, so this might not be the latest). This should help you increase the coverage of your base file and hopefully allow some intersection between the target (I've checked that some SNPs in your bim do fall within the C9orf27 region) and the C9orf27 gene.
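Conceptually, --wind-5 and --wind-3 pad each gene before SNPs are matched to it. The sketch below shows one plausible, strand-aware way to compute the padded interval from GTF gene records: on the minus strand the 5' end is the larger coordinate, so the two paddings swap sides. Whether PRSice pads exactly like this is an assumption, not taken from its source; pad_gene, the toy records, and the use of plain base-pair counts are also assumptions.

```shell
#!/bin/sh
# Sketch: strand-aware padding of GTF gene records (assumed behaviour, not PRSice code).
# pad_gene GTF WIND5_BP WIND3_BP prints: gene_name chr padded_start padded_end
pad_gene() {
  gzip -dcf "$1" | awk -F'\t' -v w5="$2" -v w3="$3" '
    $3 == "gene" && match($9, /gene_name "[^"]+"/) {
      name = substr($9, RSTART + 11, RLENGTH - 12)
      if ($7 == "-") { s = $4 - w3; e = $5 + w5 }   # minus strand: 5-prime end is the larger coordinate
      else           { s = $4 - w5; e = $5 + w3 }
      if (s < 1) s = 1                               # GTF coordinates start at 1
      print name, $1, s, e
    }'
}

tmp=$(mktemp -d)
{
  printf '9\ttoy\tgene\t100000\t130000\t.\t-\t.\tgene_id "t1"; gene_name "toy_minus";\n'
  printf '1\ttoy\tgene\t50000\t60000\t.\t+\t.\tgene_id "t2"; gene_name "toy_plus";\n'
} > "$tmp/toy.gtf"

# 3 kb on the 5' side and 15 kb on the 3' side, the values mentioned earlier in this thread
pad_gene "$tmp/toy.gtf" 3000 15000
```

Here the minus-strand toy gene becomes 85000-133000 and the plus-strand one 47000-75000, so SNPs just outside the annotated gene body can now be assigned to it.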

sarasaezALS commented 5 years ago

Thanks for your help. Still, I don't get why it is not taking these two post-clumping SNPs from the .snp file. They are in the gene, but they are not included in the analysis:

    9 rs62538126 27556831 0.4666 Y Y N
    9 rs34366576 27562078 0.0236 Y Y N

choishingwan commented 5 years ago

Is the last set the set containing only the C9orf27 gene?

I think I lost track of the whole situation here, sorry about that. Could you please lay out the current steps and results? So, if you do the following steps:

  1. Construct a gene set containing only C9orf27
  2. Run PRSet with only this gene set provided, using the same GTF and base file
  3. Don't use any --wind-x options
  4. Does the output snp file contain any SNPs within the C9orf27 region? Or do you get an error stating that none of the gene sets contain any SNP?

When you ask why those SNPs are included in the gene but not within the set:

  1. Does your gene set of interest contain genes other than C9orf27? If so, the SNPs within C9orf27 might have been clumped out. If you run PRSet with the same gene set but using --no-clump, are those SNPs considered to be within the gene?

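For the --no-clump comparison in the last question, one way to see which SNPs lie physically inside a gene but are not assigned to the set is to combine the .snp flags with the gene's coordinates. The coordinates must be supplied by hand (for example from the GTF); unassigned_in_region and the toy rows are hypothetical, and the header layout is assumed to match the one posted in this thread.

```shell
#!/bin/sh
# Sketch: SNPs inside CHR:START-END whose flag for SET is not "Y".
# unassigned_in_region FILE.snp SET CHR START END
unassigned_in_region() {
  awk -v set="$2" -v c="$3" -v s="$4" -v e="$5" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == set) col = i
      if (!col) exit 1                     # no column with that set name
      next
    }
    $1 == c && $3 + 0 >= s + 0 && $3 + 0 <= e + 0 && $col != "Y" { print $2 }' "$1"
}

tmp=$(mktemp -d)
{
  printf 'CHR SNP BP P Base Background Test\n'
  printf '9 snpA 150 0.01 Y Y Y\n'
  printf '9 snpB 250 0.2 Y Y N\n'
  printf '9 snpC 950 0.3 Y Y N\n'
  printf '21 snpD 200 0.04 Y Y N\n'
} > "$tmp/toy.snp"

# Toy gene on chr9 spanning 100-500: snpB is inside it but not in the set
unassigned_in_region "$tmp/toy.snp" Test 9 100 500
```

If this lists many SNPs even with --no-clump, clumping is not the cause, and the gene-to-coordinate mapping (GTF, gene name, windows) is the next thing to inspect.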
sarasaezALS commented 5 years ago

Is the last set the set containing only the C9orf27 gene?

Yes. In this case, it doesn't generate any .summary file, and the .log file says: Error: None of the gene sets contain any SNP(s) after clumping. Have you provided the correct input? E.g. GMT file containing Entrez ID with GTF files that uses the Ensembl gene ID?

I think I lost track of the whole situation here, sorry about that, could you please lay out the current steps and results? So if you do the following steps

  1. Construct gene set only containing C9orf27
  2. Running PRSet with only this gene set provided and with the same GTF and base file
  3. Don't use any --wind-x options
  4. Does the output snp file contains any SNPs within the C9orf27 region? Or do you have an error stating all gene set does not contain any SNP

It doesn't generate anything; it just can't find any SNP in this gene.

When you ask why those SNPs are including in the gene but not within the set:

Does your gene set of interest contain genes other than C9orf27? If so, then the SNPs within C9orf27 might have been clumped out. If you run PRSet with the same geneset, but using --no-clump, are those SNPs considered to be within the gene?

When I run a list of five genes including C9orf72 with --no-clump, the total number of SNPs is 411. When I go to the .snp file, I don't see any C9orf72 SNP included in the analysis. This is an extract from the list; these SNPs belong to the C9orf72 gene, and all of them have an N:

    CHR SNP BP P Base Background GWAShits
    9 rs9103 27546828 0.02451 Y Y N
    9 rs13691 27546890 0.0005182 Y Y N
    9 rs3739526 27547313 0.08819 Y Y N
    9 rs73440933 27547986 0.2351 Y Y N
    9 rs10967979 27548927 0.02779 Y Y N
    9 rs773723 27548935 2.316e-05 Y Y N
    9 rs80067552 27549485 0.2745 Y Y N
    9 rs10120735 27550168 0.2491 Y Y N
    9 rs62538125 27550999 0.06211 Y Y N
    9 rs2453565 27551040 9.657e-18 Y Y N
    9 rs12347201 27551927 0.0593 Y Y N
    9 rs10967981 27552193 2.014e-05 Y Y N
    9 rs149878414 27552327 0.356 Y Y N
    9 rs71510499 27552460 0.6187 Y Y N
    9 rs4878487 27552487 0.07902 Y Y N
    9 rs68005046 27552632 0.5958 Y Y N
    9 rs113076260 27552972 0.2145 Y Y N
    9 rs12349820 27553876 0.06279 Y Y N
    9 rs10124158 27554188 8.962e-06 Y Y N
    9 rs12686452 27555016 0.007373 Y Y N
    9 rs35815580 27555121 0.5665 Y Y N
    9 rs60613335 27555626 0.2181 Y Y N
    9 rs10967984 27555836 9.827e-09 Y Y N
    9 rs17769246 27556464 0.2264 Y Y N
    9 rs10122902 27556780 0.08498 Y Y N
    9 rs62538126 27556831 0.4666 Y Y N
    9 rs34670748 27557117 0.02011 Y Y N
    9 rs10812613 27557454 0.05874 Y Y N
    9 rs700828 27557537 2.772e-16 Y Y N
    9 rs112616482 27557601 0.2319 Y Y N
    9 rs4879564 27557833 0.07618 Y Y N
    9 rs10757665 27557919 0.05548 Y Y N
    9 rs4879566 27558186 0.001842 Y Y N
    9 rs7860526 27559674 0.05502 Y Y N
    9 rs774356 27559721 3.257e-16 Y Y N
    9 rs1565948 27559733 1.091e-05 Y Y N
    9 rs774357 27559835 8.941e-19 Y Y N
    9 rs28526385 27559937 0.002062 Y Y N
    9 rs10967985 27560417 0.05653 Y Y N
    9 rs3849944 27560594 7.267e-06 Y Y N
    9 rs12347222 27560965 0.006754 Y Y N
    9 rs774359 27561049 2.048e-16 Y Y N
    9 rs17769294 27561628 0.1414 Y Y N
    9 rs2453554 27561800 5.188e-19 Y Y N
    9 rs34366576 27562078 0.0236 Y Y N
    9 rs10812615 27562233 0.0002855 Y Y N
    9 rs10441712 27562881 0.0003121 Y Y N
    9 rs2484319 27563755 6.748e-19 Y Y N
    9 rs2453555 27563868 3.999e-19 Y Y N
    9 rs7858531 27564008 0.05965 Y Y N
    9 rs10812617 27564255 0.05544 Y Y N
    9 rs7874565 27564338 0.05982 Y Y N
    9 rs7859060 27564415 0.05986 Y Y N
    9 rs2492816 27565105 6.226e-09 Y Y N
    9 rs10812618 27565244 0.06419 Y Y N
    9 rs10967986 27565300 1.341e-08 Y Y N
    9 rs10757667 27565714 0.05466 Y Y N
    9 rs1031153 27565936 0.0001522 Y Y N
    9 rs117281883 27566333 0.5844 Y Y N
    9 rs10757668 27567145 0.05531 Y Y N
    9 rs41272891 27567481 0.2211 Y Y N
    9 rs10757669 27567608 0.05931 Y Y N
    9 rs10967988 27567635 9.128e-08 Y Y N
    9 rs7872223 27567935 0.05704 Y Y N
    9 rs72710403 27568604 0.1697 Y Y N
    9 rs3849945 27568817 1.187e-18 Y Y N
    9 rs111653040 27568950 0.1755 Y Y N
    9 rs72710405 27568967 0.1706 Y Y N
    9 rs10812619 27569188 7.116e-08 Y Y N
    9 rs111630075 27569396 0.2223 Y Y N
    9 rs13284967 27569572 9.633e-08 Y Y N
    9 rs76602706 27569657 0.2224 Y Y N
    9 rs700824 27570348 0.03627 Y Y N
    9 rs78900326 27570589 0.2225 Y Y N
    9 rs3849946 27571458 1.202e-07 Y Y N
    9 rs17696653 27571819 0.2221 Y Y N
    9 rs4520261 27571955 0.2221 Y Y N
    9 rs2282241 27572255 1.476e-07 Y Y N
    9 rs2282240 27572634 0.0001872 Y Y N
    9 rs112048460 27573083 0.2276 Y Y N
    9 rs117462033 27573213 0.02455 Y Y N
    9 rs78074330 27573577 0.2516 Y Y N
    9 rs41272893 27573826 0.271 Y Y N
    9 rs11789520 27574515 1.404e-17 Y Y N
    9 rs10967991 27574803 0.03313 Y Y N
    9 rs2244606 27574978 2.765e-07 Y Y N
    9 rs2453557 27575463 1.365e-07 Y Y N
    9 rs1948522 27575785 0.156 Y Y N
    9 rs77929821 27576823 0.1014 Y Y N
    9 rs72710408 27577154 0.8144 Y Y N
    9 rs11795154 27577611 2.544e-08 Y Y N
    9 rs12345062 27578077 9.672e-09 Y Y N
    9 rs

choishingwan commented 5 years ago

So I did some more digging and found some very interesting results:

  1. If I generate a gene set containing only C9orf27 and use the GTF file you provided, I get the same error stating that the gene set does not contain any SNP
  2. If I use the same gene set, but this time with a much smaller GTF file containing only the C9orf27 region, I can run the analysis without problems

I think there might be some hidden issues in the GTF processing that I was unaware of before. I will try to figure out the issue and get back to you once I have.

sarasaezALS commented 5 years ago

I see, so do you think it is a problem with the package or with my GTF file?

choishingwan commented 5 years ago

I am trying to figure out what the problem is. Though I just realized I used the wrong gene name (C9orf27 instead of C9orf72); I will try again.

sarasaezALS commented 5 years ago

OK, let me know the output of that.

choishingwan commented 5 years ago

I've now set the gene name correctly.

Using our toy data, I did the following (assuming that the file named test contains the SNP information you just posted above, 92 lines):

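# Gene set file: one set ("Test") containing only C9orf72
echo "Test C9orf72" > C9orf72
# Keep the first 92 variants of the toy target data as a small PLINK fileset
head -n 92 TOY_TARGET_DATA.bim > small
plink --bfile TOY_TARGET_DATA --extract small --out check --make-bed
# Move those variants onto chr9, using the positions in column 3 of the file test
awk 'NR==FNR {a[FNR]=$3} NR!=FNR {print "9\t"$2"\t"$3"\t"a[FNR]"\t"$5"\t"$6}' test check.bim > tmp
mv tmp check.bim
# Build a matching base file: header plus the retained SNPs, with CHR 9 and BP taken from check.bim
awk 'NR==FNR {a[$2]=$4} NR!=FNR && FNR==1{print} NR!=FNR && FNR!=1 && $1 in a {print $1" 9 "a[$1]" "$4" "$5" "$6" "$7}' check.bim TOY_BASE_GWAS.assoc > short.assoc
# Run PRSet with the single-gene set, without clumping and without regression
./PRSice --base short.assoc --target check --msigdb C9orf72 --gtf Homo_sapiens.GRCh37.75.gtf --no-clump --no-regress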

And in this test, we do get SNPs in the C9orf72 region. I am not sure what the difference is between your analysis and mine. Which version of PRSice are you using? I am testing it on 2.2.6.

sarasaezALS commented 5 years ago

OK, it is working! I added --wind-3 and --wind-5, and now it is taking more SNPs and the results make much more sense. Thanks for your help.

choishingwan commented 5 years ago

Great!