Closed davidroad closed 3 years ago
Have you checked that the SNP does not occur twice in your input file?
On Thu, Nov 8, 2018 at 12:35 AM davidroad notifications@github.com wrote:
Dear PASCAL author, I was running PASCAL to analyze GWAS data. I found that at least one SNP on chr22, rs113940759, leads to the following error:
```
reading snp positions from file:resources/1kg/EUR.chr21.pos.ser.gz
Reading file: resources/1kg/EUR.chr21.pos.ser.gz
reading snp positions from file:resources/1kg/EUR.chr22.pos.ser.gz
Reading file: resources/1kg/EUR.chr22.pos.ser.gz
java.lang.RuntimeException: snp seems to have been set before
	at ch.unil.genescore.vegas.ReferencePopulation.loadGwasAndRelevantSnpsPos(ReferencePopulation.java:279)
	at ch.unil.genescore.vegas.ReferencePopulation.initializeSnps(ReferencePopulation.java:121)
	at ch.unil.genescore.vegas.ReferencePopulation.loadGwasAndRelevantSnps(ReferencePopulation.java:330)
	at ch.unil.genescore.main.Main.computeGeneScores(Main.java:158)
	at ch.unil.genescore.main.Main.run(Main.java:136)
	at ch.unil.genescore.main.Main.main(Main.java:50)
```
Do you have a way to solve this? I decompiled pascalDeployed.jar and found that the "snp seems to have been set before" exception is raised in \ch\unil\genescore\vegas\ReferencePopulation.java:
if ((chr_ != "none") || (start_ != -1) || (end_ != -1)) { throw new RuntimeException("snp seems to have been set before"); }
I tried to comment out that code, but I could not recompile it because of decompiler errors, so I cannot fix it myself. I suspect the problem lies in the 1KG reference panel annotation. Do you have any idea how to solve this?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dlampart/Pascal/issues/3, or mute the thread https://github.com/notifications/unsubscribe-auth/AGkJtap_qOx0kAjnxB3Mc9IL5Ul-Xu_wks5us24xgaJpZM4YTtuU .
Thanks for the reply. Yes, I checked the SNP. It appears only once. Overall, I found two SNPs (rs113940759 and rs71904485) that trigger this problem whenever either of them is included in the --pval file, alone or together with other SNPs. In my summary statistics file, the entries for these two SNPs and the SNPs flanking them within 1 bp are:
chromosome rsid ref alt pos pvalue
chr22 chr22_42247506_D I2 D 42247506 0.08027
**chr22 rs113940759 I2 D 42247507 0.05939**
chr22 rs60804715 T G 42247507 0.1089
----
chr22 chr22_50572749_D D I4 50572749 0.03104
**chr22 rs71904485 D I3 50572750 0.0342**
chr22 rs3736688 A G 50572770 0.04968
However, with another GWAS summary statistics file, PASCAL still runs even though these two SNP IDs each occur twice:
rsid chromosome pos ref alt pvalue
rs200740168 22 42247491 T TTTTG 0.537
**rs113940759 22 42247503 GT G 0.792**
rs201077567 22 42247506 T TG 0.738
rs60804715 22 42247507 T G 0.731
**rs113940759 22 42247507 GT G 0.747**
rs12170228 22 42247695 T C 0.126
--
rs74828492 22 50572746 TCA T 0.806
**rs71904485 22 50572748 ATTTT A 0.806**
rs201435664 22 50572749 T TGAA 0.806
**rs71904485 22 50572750 G GAA 0.807**
rs3736688 22 50572770 G A 0.954
I am not very familiar with Java, so I am not sure what \ch\unil\genescore\vegas\ReferencePopulation.java is doing at that point.
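To confirm which IDs repeat in a summary statistics file, a small script can count them. The following is only a sketch: it assumes a whitespace-delimited file with a header row, and the function name `duplicated_ids` and the `id_column` parameter are made up for this illustration. The example rows are taken from the second table above.

```python
from collections import Counter
import io

def duplicated_ids(lines, id_column=0, has_header=True):
    """Return {snp_id: count} for every ID that appears more than once."""
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the header row
    counts = Counter()
    for line in rows:
        fields = line.split()
        if len(fields) > id_column:
            counts[fields[id_column]] += 1
    return {snp: n for snp, n in counts.items() if n > 1}

# Rows copied from the second summary statistics table in this thread
example = io.StringIO(
    "rsid chromosome pos ref alt pvalue\n"
    "rs113940759 22 42247503 GT G 0.792\n"
    "rs60804715 22 42247507 T G 0.731\n"
    "rs113940759 22 42247507 GT G 0.747\n"
)
print(duplicated_ids(example))
```

For a file laid out like the first table (chromosome first, rsid second), `id_column=1` would be the value to pass.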
Hi sorry for the late reply. Can you post a small input txt file that will produce the error?
Hi, I think I found the problem. It is caused by duplicated SNPs in the reference panel. https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=OXSTATGEN;ac75cd63.1402 So far I have found three SNPs with this problem (rs113940759, rs71904485, and rs11457237, all on chr22 in my GWAS). I believe there are more duplicated SNPs in the reference panel. It would be kind of you to help me generate a list of the SNPs that are duplicated in the reference panel (EA population). Eliminating those SNPs should resolve the "snp seems to have been set before" error.
Hi, I rebuilt the EUR reference panel (1000G phase 1 v3, 379 individuals) from the files downloaded from http://csg.sph.umich.edu/abecasis/mach/download/1000G.2012-03-14.html and used it in place of the default reference panel. With it, there are no errors. I compared the problematic chr22: the rebuilt panel has no duplicated rsIDs on chr22, although it has 9995 sites with a "." ID compared with only 429 in the default panel. Which reference panel did you use in PASCAL?
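The comparison described here (duplicated rsIDs and the number of "." IDs) can be sketched as below. It assumes the panel is available as VCF-style tab-delimited text with the ID in the third column. It does not read PASCAL's own `.pos.ser.gz` files. The names `open_text` and `summarize_vcf_ids` are made up for this sketch, and the toy records are illustrative, not actual panel content.

```python
import gzip
from collections import Counter

def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")

def summarize_vcf_ids(lines):
    """Return (sorted IDs seen more than once, number of records whose ID is '.')."""
    counts = Counter()
    missing = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a full record
        snp_id = fields[2]
        if snp_id == ".":
            missing += 1
        else:
            counts[snp_id] += 1
    duplicates = sorted(snp for snp, n in counts.items() if n > 1)
    return duplicates, missing

# Toy records (illustrative only)
toy_records = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "22\t42247503\trs113940759\tGT\tG\n",
    "22\t42247506\t.\tT\tTG\n",
    "22\t42247507\trs113940759\tGT\tG\n",
]
print(summarize_vcf_ids(toy_records))

# With a real file (the path is hypothetical):
# with open_text("chr22.vcf.gz") as handle:
#     print(summarize_vcf_ids(handle))
```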
yes, I believe we used an earlier release. Let me think about how to fix this and get back to you. Thanks for your work on this.
David
Hi, thank you for the reply. I think I have already solved the reference problem by downloading the 1000G European reference (http://csg.sph.umich.edu/abecasis/mach/download/1000G.2012-03-14.html) and rebuilding it as the reference panel. The error no longer occurs. However, I have run into another problem with the gene-level p-value calculation. The significance of the gene-level p-values varies a lot between two conditions: using all SNP p-values from the GWAS, and using only SNPs with p-value < 0.05. The former flattens the gene-level p-value signal, while the latter inflates the significance of the gene-level p-values. Do you have any idea how to balance this?
Hi, the gene-level statistics are no longer correct when you subset the SNPs based on their p-values (any other pruning is fine). Maybe try the max gene score setting; it often gives less flat gene-level p-values. (Pathway p-values will be less affected.)
best, David
Hi, thank you for the advice. My concern with the max gene score is that gene length could bias the result (the longer the gene, the more likely it is to get a significant p-value from some SNP). Do you have any suggestion to overcome this? Thanks!
You don't have to worry about this. Pascal controls for that. You just need to set the flag --genescoring=max. However, again, you must not filter the SNPs based on p-values beforehand.
best, David
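A toy calculation may help show why pre-filtering by p-value inflates a max-type gene score and why the number of SNPs has to enter the correction. This is not Pascal's algorithm: Pascal uses LD from the reference panel, whereas the sketch below assumes independent SNPs and uses a Šidák correction as a stand-in. The function name and the p-values are made up.

```python
def sidak_min_p(pvalues):
    """P-value for the smallest of n independent tests: 1 - (1 - p_min)^n."""
    n = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** n

# Toy gene with 100 SNPs: five below 0.05, the rest unremarkable
all_snps = [0.01, 0.02, 0.03, 0.04, 0.045] + [0.5] * 95
filtered = [p for p in all_snps if p < 0.05]  # the filtering advised against above

print(round(sidak_min_p(all_snps), 3))  # 0.634: corrected for all 100 SNPs
print(round(sidak_min_p(filtered), 3))  # 0.049: only 5 SNPs left, looks significant
```

The best SNP is the same in both cases. Only the number of tests the score is corrected for changes, which is why the filtered input looks far more significant than it should.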
Hi David, thanks for the advice! By the way, I have another short question about excluding HLA genes. I kept the line "excludedGenesFile = resources/annotation/hla/hlaGenesEntrezIds.txt" in the settings, but the HLA genes still appear in the results. What can I do?
Sorry for the late reply.
That option only removes genes during the pathway enrichment score computation stage. The gene scores are still calculated.
best, David
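Since the gene scores are still calculated, one option is to drop the unwanted genes from the gene score output afterwards. The sketch below assumes a tab-delimited file with a header and a `gene_id` column holding Entrez IDs, and an exclusion list with one ID per line. Both layouts are assumptions, so check them against your actual files. The helper name `drop_genes` and the rows are made up.

```python
import csv
import io

def drop_genes(rows, excluded_ids, id_field="gene_id"):
    """Keep only the rows whose gene ID is not in the exclusion set."""
    return [row for row in rows if row[id_field] not in excluded_ids]

# Made-up gene score rows; the column names are assumptions
scores = io.StringIO("gene_id\tpvalue\n3105\t0.001\n7157\t0.2\n")

# In practice this set would be read from the exclusion file, one ID per line
excluded = {"3105"}

kept = drop_genes(csv.DictReader(scores, delimiter="\t"), excluded)
print([row["gene_id"] for row in kept])
```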
Hi,
Sorry to resurrect yet another issue, but I am having the same problem davidroad mentioned. However, since I am using another reference panel, the only way I managed to get past it was by downloading an even earlier release of the reference panel, after trying several other versions of the 1KG panel (all of which failed at different chromosomes). This seems to me a suboptimal solution, but I have no idea how to fix it properly.
I have also tried another annotation system, which apparently avoids the error (for the same 1KG version, the UCSC annotation throws an error whereas the GENCODE annotation proceeds), but the analysis then outputs an empty genescore file (apparently a distinct issue).
Any idea after all this time?
Thanks for your work on Pascal!
Best, medak
Guys, I know this reply comes very late, but it may still be helpful to some. I think the problem arises because three SNPs appear in the LD reference multiple times (also, "." IDs are not allowed). If you remove the SNPs rs11457237, rs113940759, and rs71904485, the error should go away: the problem only comes up if you have p-values for any of those three SNPs. (If you construct LD from other data, ensure that it contains no duplicate entries.) I will try to add a note and an optional filtering step on GitHub. Changing the deployed version will be tricky, as I'm no longer at the hosting institution.
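Until such a filtering step exists, the three SNPs can be dropped from the p-value file before running Pascal. This is a minimal sketch, not the planned filter. It assumes a whitespace-delimited file, and the names `PROBLEM_SNPS` and `drop_snps` are made up for illustration.

```python
import io

# The three SNP IDs named in this thread
PROBLEM_SNPS = {"rs11457237", "rs113940759", "rs71904485"}

def drop_snps(lines, excluded, id_column=0):
    """Return the input lines whose ID column is not in the excluded set."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) > id_column and fields[id_column] in excluded:
            continue  # skip a problematic SNP
        kept.append(line)  # the header and all other rows pass through unchanged
    return kept

# Rows copied from the second summary statistics table in this thread
pvals = io.StringIO(
    "rsid chromosome pos ref alt pvalue\n"
    "rs74828492 22 50572746 TCA T 0.806\n"
    "rs71904485 22 50572748 ATTTT A 0.806\n"
    "rs3736688 22 50572770 G A 0.954\n"
)
print("".join(drop_snps(pvals, PROBLEM_SNPS)), end="")
```

If the rsID is not in the first column, as in the first table in this thread, pass the matching `id_column`.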