WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

How to get the expected ratio of a specific genomic region in genome background from annovar database? #136

Open zengxi-hada opened 3 years ago

zengxi-hada commented 3 years ago

Hi, I have used ANNOVAR to annotate SNPs using gene-based annotation and hg38. I now want to know whether the SNPs in my data are enriched in a specific region (for example, exonic or intergenic) compared to the expected ratio in the genome background.

To calculate the degree of enrichment, I first need to know the expected ratio of the exonic region in the hg38 genome background of the ANNOVAR database. For the enrichment to be meaningful, the expected ratio must be calculated in a way that is strictly consistent with the ANNOVAR algorithm and the ANNOVAR database. Could you let me know how to get the expected ratio for a specific region (for example, the exonic region) in the genome background from your ANNOVAR database?

Thanks, I really appreciate it if you could help.

best regards, Michael Zeng

kaichop commented 3 years ago

Depending on which database you used, you will have to write a script yourself to summarize all regions in it. For example, if you used hg38_refGene.txt, you can take all exonic regions in this file, take their union (many tools can do this, such as bedtools), calculate the total length, and divide it by the genome length. This is not an ideal solution, because a small fraction of the genome is not mappable (no variants are called in those regions because Illumina sequencing cannot detect variants in such complex regions).
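A minimal Python sketch of the steps described above, in case it helps. It assumes hg38_refGene.txt follows the UCSC refGene (genePred with a leading bin column) layout: tab-separated, chromosome in column 3, cdsStart/cdsEnd in columns 7 and 8, and comma-separated 0-based exonStarts/exonEnds in columns 10 and 11. The function names, the `coding_only` switch, and the toy records are hypothetical and only for illustration; this is not ANNOVAR's own code.

```python
from collections import defaultdict


def exon_intervals(lines, coding_only=False):
    """Yield (chrom, start, end) for every exon in refGene-format lines.

    Assumed columns (0-based index): 2=chrom, 6=cdsStart, 7=cdsEnd,
    9=exonStarts, 10=exonEnds. Coordinates are 0-based, half-open.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[2]
        cds_start, cds_end = int(fields[6]), int(fields[7])
        starts = [int(x) for x in fields[9].split(",") if x]
        ends = [int(x) for x in fields[10].split(",") if x]
        for start, end in zip(starts, ends):
            if coding_only:
                # Clip each exon to the coding span; UTR-only exons vanish.
                start, end = max(start, cds_start), min(end, cds_end)
            if start < end:
                yield chrom, start, end


def union_length(intervals):
    """Total number of bases covered by the union of the intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    total = 0
    for ivs in by_chrom.values():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end:          # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def expected_fraction(lines, genome_length, coding_only=False):
    """Union exon length divided by a caller-supplied genome length."""
    return union_length(exon_intervals(lines, coding_only)) / genome_length


# Toy records (made up): two overlapping transcripts of one gene on chr1.
demo = [
    "0\tNM_A\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\t0\tGENEA\tcmpl\tcmpl\t0,1,",
    "0\tNM_B\tchr1\t+\t150\t400\t150\t400\t2\t150,350,\t250,400,\t0\tGENEA\tcmpl\tcmpl\t0,1,",
]
print(expected_fraction(demo, genome_length=10_000))
```

For a real run you would pass `open("hg38_refGene.txt")` instead of `demo`, and choose `genome_length` yourself. That choice matters: the file may also list alternate contigs, and subtracting unmappable or N bases from the denominator would partly address the caveat above. The ANNOVAR documentation describes "exonic" as the coding portion, with UTRs reported separately, so `coding_only=True` may be closer to what the annotation counts, but please check that against your own output.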

On Fri, May 14, 2021 at 5:28 AM zengxi-hada @.***> wrote:

Hi, I have used ANNOVAR to annotate the SNPs using gene-based annotation and hg38. Then I want to know whether the SNPs in my data are enriched in a specific region (for example, exonic) compared to the expected ratio in the genome background. The question is, in order to calculate the enrichment, I need to know the ratio of the exonic region in hg38 of the ANNOVAR database. Could you let me know how to get the expected ratio for the exonic region from your ANNOVAR database?

Thanks, I really appreciate it if you could help.

zengxi-hada commented 3 years ago

Thanks for your reply.