bioinfo-pf-curie / TMB

Tumor Mutational Burden

How to use SnpEff annotation with the gnomAD and 1K databases #9

Open user-tq opened 2 years ago

user-tq commented 2 years ago

I'm not familiar with SnpEff or with filtering SNVs. I tried:

java -jar snpEff.jar databases | grep gnomad

I couldn't find anything, so maybe I should build the database myself. I also found a demo command:

java -jar SnpSift.jar annotate dbSnp132.vcf variants.vcf > variants_annotated.vcf

Maybe I should get gnomad.vcf and run that command? Then should I also get 1K.vcf?

I'm confused about how to obtain the right VCF so that I can use the options:

--filterPolym --polymDb 1k,gnomad

tomgutman commented 2 years ago

Hello,

To filter SNVs found in 1000 Genomes and gnomAD, I would advise you to first annotate your VCF against the gnomAD database with SnpSift, as shown in the demo command. The 1K annotation is already included in the gnomAD DB. Then, by specifying the right fields in the snpeff.config file, you will be able to filter the SNVs.

We used gnomad.genomes.r2.1.1.sites.vcf.bgz to annotate our VCFs.

Best, Tom

user-tq commented 2 years ago

Thank you sincerely for your answer. I have to say, gnomad.genomes.r2.1.1.sites.vcf.bgz is too big (460G!). I found that the gnomAD database file from ANNOVAR (*/annovar/humandb/hg19_gnomad_genome.txt) is only 16G, so I also tried to use hg19_gnomad_genome.txt in place of a VCF to annotate my VCF with SnpSift:

java -jar SnpSift.jar annotate hg19_gnomad_genome.txt my.vcf

but I get this error:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: WARNING: Unkown IUB code for SNP '-'
    at org.snpsift.SnpSiftCmdAnnotate.annotate(SnpSiftCmdAnnotate.java:72)
    at org.snpsift.SnpSiftCmdAnnotate.run(SnpSiftCmdAnnotate.java:410)
    at org.snpsift.SnpSiftCmdAnnotate.run(SnpSiftCmdAnnotate.java:397)
    at org.snpsift.SnpSift.run(SnpSift.java:580)
    at org.snpsift.SnpSift.main(SnpSift.java:76)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: WARNING: Unkown IUB code for SNP '-'

(It's not a VCF.) I think there should be a better way to annotate with gnomAD and filter on it (less storage, fewer tags, and faster), using a subset of gnomAD that contains just the position and MAF.
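One way to get the kind of reduced file described above is to strip every INFO entry except AF from a gnomAD sites VCF. The following is only a sketch under stated assumptions: the function name af_only_vcf and the output file name are made up for illustration, the input is assumed to carry an "AF=" entry in its INFO column, and bgzip and tabix (from htslib) are assumed to be installed for the optional compression step. The full file still has to be read once to produce the subset.

```shell
# Sketch: reduce a VCF stream to its header plus AF-only INFO fields.
# af_only_vcf is a hypothetical helper name, not part of SnpEff/SnpSift or TMB.
af_only_vcf() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }            # pass header lines through unchanged
       {
         af = "."                      # "." marks an empty INFO column in VCF
         n = split($8, kv, ";")
         for (i = 1; i <= n; i++)
           if (kv[i] ~ /^AF=/) af = kv[i]
         # keep the eight fixed VCF columns, with INFO reduced to AF
         print $1, $2, $3, $4, $5, $6, $7, af
       }'
}

# Example use (assumes bgzip and tabix are available; output name is a placeholder):
# gzip -dc gnomad.genomes.r2.1.1.sites.vcf.bgz | af_only_vcf | bgzip > gnomad.af_only.vcf.gz
# tabix -p vcf gnomad.af_only.vcf.gz
```

The header keeps the ##INFO definitions of the dropped fields, which is harmless but could be trimmed as well. Only the exact key AF is kept, so population-specific entries such as AF_popmax are discarded.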

tomgutman commented 2 years ago

Hi, you could try downloading the exome-only file (58 GB) from https://gnomad.broadinstitute.org/downloads, or even the file af-only-gnomad_modified.raw.sites.vcf (13 GB). Be sure to add -info "AF" to the SnpSift command line to specify the field used for annotation.

Link to the AF-only file: https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-b37?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false
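Putting the advice above together, the annotation step might look like the sketch below. The wrapper name annotate_af, the SNPSIFT_JAR variable, and the file names my.vcf and my.gnomad.vcf are placeholders chosen for illustration; adjust them to your own setup.

```shell
# Sketch: annotate a VCF with only the AF field taken from a gnomAD sites VCF.
# SNPSIFT_JAR and annotate_af are hypothetical names used for illustration.
SNPSIFT_JAR="${SNPSIFT_JAR:-SnpSift.jar}"

annotate_af() {
  # $1 = gnomAD (database) VCF, $2 = VCF to annotate; result goes to stdout.
  # -info AF restricts the INFO fields copied from the database to AF.
  java -jar "$SNPSIFT_JAR" annotate -info AF "$1" "$2"
}

# Example call (file names are placeholders):
# annotate_af af-only-gnomad_modified.raw.sites.vcf my.vcf > my.gnomad.vcf
```

The annotated file can then be passed to TMB with --filterPolym --polymDb 1k,gnomad, provided the configuration points at the AF field written by this step.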

Best, Tom