Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

9th column of GTF file #638

Closed hadjie01 closed 4 years ago

hadjie01 commented 4 years ago

Hello,

I am trying to run VEP with my custom GTF file. Currently, only gene_id and transcript_id are included in the 9th column of my GTF file. Do I need to include more attributes? Are exon number, exon ID and biotype mandatory?

Thanks a lot, Evi

dglemos commented 4 years ago

Hi @hadjie01, if you don't include the biotype, VEP will try to interpret the 2nd column (the source field) as the biotype. Here you can read about the format expectations.

hadjie01 commented 4 years ago

Hi Diana @dglemos,

Thanks a lot for the fast response!

Here is an example of my GTF file:
chr12 PacBio transcript 14522 31411 . - . gene_id "ENSG00000226210.3"; transcript_id "PB.8092.4";
chr12 PacBio exon 14522 15153 . - . gene_id "ENSG00000226210.3"; transcript_id "PB.8092.4";
chr12 PacBio exon 15913 16065 . - . gene_id "ENSG00000226210.3"; transcript_id "PB.8092.4";

and here are the warnings that I get:

WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: ENSG00000236423.6, ENSG00000142611.17, ENSG00000233304.7, ENSG00000177133.11, ENSG00000227372.12, ENSG00000198912.11, ENSG00000130764.10, ENSG00000158109.15, ENSG00000116198.13, ENSG00000272235.1, ENSG00000116213.16
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: ENSG00000214114.9, ENSG00000174574.16, ENSG00000185668.7, novelGene_775, ENSG00000183520.12, ENSG00000228436.3, ENSG00000224592.5, ENSG00000116954.8, ENSG00000183386.10, ENSG00000158315.11, ENSG00000274944.4
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: ENSG00000198162.12, ENSG00000196505.11, ENSG00000271427.1, ENSG00000065183.16, ENSG00000116830.12, ENSG00000134253.10

Is it because I am missing the biotypes? All of my variants are classified as intergenic:
chr1_237323773_C/A chr1:237323773 A - - - intergenic_variant - - - - - - IMPACT=MODIFIER
chr1_244659908_G/C chr1:244659908 C - - - intergenic_variant - - - - - - IMPACT=MODIFIER
chr1_28510353_T/C chr1:28510353 C - - - intergenic_variant - - - - - - IMPACT=MODIFIER

Best, Evi

dglemos commented 4 years ago

VEP builds the annotation from the GTF file by linking records through their IDs (gene -> transcript -> exon/CDS -> ...). The warning means that your GTF file has records whose gene_id attribute (in the 9th column) refers to these genes, but the file contains no gene records for them.

You should have something like this:
chr12 PacBio gene start end . - . gene_id "A"; transcript_id "B";
chr12 PacBio transcript start end . - . gene_id "A"; transcript_id "B";
chr12 PacBio exon start end . - . gene_id "A"; transcript_id "B"; exon_number x
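A minimal sketch of how the missing parent records could be added, assuming a tab-separated GTF that contains only transcript and exon lines as in the example earlier in this thread. The helper name `add_gene_records` is hypothetical and not part of VEP. It takes each gene's span to be the minimum start and maximum end of its child records, which is an assumption about what the gene coordinates should be, and it writes only a gene_id attribute on the new line.

```python
import re
from collections import OrderedDict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def add_gene_records(gtf_lines):
    """Return GTF lines with a 'gene' record added for every gene_id that lacks one.

    Comment lines (starting with '#') and blank lines are dropped.
    """
    records = [line.rstrip("\n").split("\t") for line in gtf_lines
               if line.strip() and not line.startswith("#")]
    have_gene = set()       # gene_ids that already have a 'gene' record
    spans = OrderedDict()   # gene_id -> [chrom, source, start, end, strand]
    for fields in records:
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        if fields[2] == "gene":
            have_gene.add(gene_id)
            continue
        start, end = int(fields[3]), int(fields[4])
        if gene_id not in spans:
            spans[gene_id] = [fields[0], fields[1], start, end, fields[6]]
        else:
            spans[gene_id][2] = min(spans[gene_id][2], start)
            spans[gene_id][3] = max(spans[gene_id][3], end)
    out = []
    for gene_id, (chrom, source, start, end, strand) in spans.items():
        if gene_id not in have_gene:
            out.append("\t".join([chrom, source, "gene", str(start), str(end),
                                  ".", strand, ".", 'gene_id "%s";' % gene_id]))
    # New gene lines come first; the original records follow unchanged.
    out.extend("\t".join(fields) for fields in records)
    return out
```

The output is not position-sorted, so it would still need to be sorted, compressed with bgzip and indexed with tabix before VEP can read it.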

dglemos commented 4 years ago

I am closing this issue, but if you have any more questions please feel free to reopen it.

Best regards, Diana

hadjie01 commented 4 years ago

Hello Diana,

Is the ExAC plugin available for GRCh38? If not, would it be possible to make it available? I appreciate your help!

Best Regards, Evi

From: Diana Lemos notifications@github.com Sent: Tuesday, December 17, 2019 11:29 AM To: Ensembl/ensembl-vep ensembl-vep@noreply.github.com Cc: Hadjimichael, Evi evi.hadjimichael@mssm.edu; Mention mention@noreply.github.com Subject: Re: [Ensembl/ensembl-vep] 9th column of GTF file (#638)

dglemos commented 4 years ago

Hello @hadjie01, the ExAC plugin is only available for GRCh37, and we don't have plans to support it for GRCh38. As an alternative to ExAC, we recommend using the gnomAD frequency data built into our cache files, or retrieving the frequency data using custom annotation. You can read more here: https://www.ensembl.org/info/docs/tools/vep/script/vep_example.html#gnomad

Best regards, Diana

hadjie01 commented 4 years ago

Hello Diana,

I would like to use the gnomADc frequency data so I downloaded the gnomAD data using the commands provided in gnomADc.pm:

release="3.0"
genomes="https://storage.googleapis.com/gnomad-public/release/${release}/coverage/genomes"
wget -x "${genomes}"/gnomad.genomes.r${release}.chr{{1..22},X}.coverage.txt.gz
for i in "${genomes#*//}"/gnomad.genomes.r${release}.chr{{1..22},X}.coverage.txt.gz; do
  tabix -s 1 -b 2 -e 2 "${i}"
done

However, I am getting an error with this command and am not sure how to proceed. Could you please assist me with this? I am pasting the output below.

Downloaded: 23 files, 1.3M in 0.4s (3.02 MB/s)
[tabix] was bgzip used to compress this file? console.cloud.google.com/storage/browser/gnomad-public/release/3.0/coverage/genomes/gnomad.genomes.r3.0.chr1.coverage.txt.gz
[tabix] was bgzip used to compress this file? console.cloud.google.com/storage/browser/gnomad-public/release/3.0/coverage/genomes/gnomad.genomes.r3.0.chr2.coverage.txt.gz
[tabix] was bgzip used to compress this file? console.cloud.google.com/storage/browser/gnomad-public/release/3.0/coverage/genomes/gnomad.genomes.r3.0.chr3.coverage.txt.gz

Thanks a lot!!

Evi

dglemos commented 4 years ago

Hi, the gnomADc plugin retrieves data from coverage files, which is probably not what you want to use. The coverage files are in TXT format, and running tabix on these files generates the error you're seeing.
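The tabix message above asks whether bgzip was used to compress the file. As a hedged aid for checking that before indexing, the sketch below inspects the first bytes of a file for a BGZF block header: gzip magic bytes, the FEXTRA flag, and a `BC` extra subfield of length 2. The function names are hypothetical. Note also that the pasted log reports 23 files totalling 1.3M under a console.cloud.google.com/storage/browser/ path, which may mean (an assumption, not confirmed in this thread) that wget saved something other than the coverage tables.

```python
import struct


def is_bgzf(header):
    """Return True if the leading bytes look like a BGZF (bgzip) block header."""
    if len(header) < 12 or header[:3] != b"\x1f\x8b\x08":
        return False                      # not a gzip/deflate stream at all
    if not header[3] & 0x04:
        return False                      # FLG.FEXTRA is not set
    (xlen,) = struct.unpack("<H", header[10:12])
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == 66 and si2 == 67 and slen == 2:   # 'B', 'C' subfield
            return True
        pos += 4 + slen
    return False


def file_is_bgzf(path):
    """Read the start of a file and test it with is_bgzf."""
    with open(path, "rb") as handle:
        return is_bgzf(handle.read(64))
```

A file that fails this check is either plain gzip, uncompressed text, or not the expected download at all; opening it in a text viewer usually shows which.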

To get frequency data from gnomAD you need to follow the instructions from this page: https://www.ensembl.org/info/docs/tools/vep/script/vep_example.html#gnomad. On this page, you can find the link to download the gnomAD files: ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/

hadjie01 commented 4 years ago

Hello Diana,

Thanks for the response!

According to the page you linked, I'll need to use the following command to get gnomAD frequency data:
./vep -i examples/homo_sapiens_GRCh38.vcf --cache \
--custom gnomad.genomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH

But I am already using a --custom gtf annotation. Can I still use the above command?

Best, Evi

dglemos commented 4 years ago

You can use both in the same command. Just add --custom gnomad.genomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH to your previous command.
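The --custom value quoted in this thread is one comma-separated string: file, short name, format, match type, a 0/1 flag, then the INFO fields to report. As a rough sketch, it can be assembled from its parts so the field list is easy to change. The helper `build_custom_arg` and its parameter names are hypothetical labels, not VEP option names, and the rest of the existing command is deliberately not shown.

```python
def build_custom_arg(path, short_name, fields,
                     file_format="vcf", match_type="exact", coords_flag=0):
    """Join the pieces of one --custom value in the order used in this thread."""
    parts = [path, short_name, file_format, match_type, str(coords_flag)]
    parts.extend(fields)
    return ",".join(parts)


gnomad = build_custom_arg(
    "gnomad.genomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz",
    "gnomADg",
    ["AF_AFR", "AF_AMR", "AF_ASJ", "AF_EAS", "AF_FIN", "AF_NFE", "AF_OTH"],
)

# These two tokens are appended to the options the command already has.
extra_args = ["--custom", gnomad]
print(" ".join(extra_args))
```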

hadjie01 commented 4 years ago

Hello Diana,

Thank you!

I have a couple more questions. Can I use gnomAD v3.0 instead of v2.1? In that case I would just need to download the ‘All Chromosomes VCF’ from https://gnomad.broadinstitute.org/downloads, right?

In my output file I only get ‘gnomADg=rs904983435;gnomADg_FILTER=PASS’, not frequency values.

Best, Evi

dglemos commented 4 years ago

You can use v3.0 downloaded from https://gnomad.broadinstitute.org/downloads for all chromosomes.

Regarding "In my output file I only get ‘gnomADg=rs904983435;gnomADg_FILTER=PASS’, not frequency values.":

When using the file, you need to check that the INFO field names in the file header match the names in your command. For instance, if the file header has ##INFO=<ID=AC_afr, your --custom value has to be gnomad.genomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz,gnomADg,vcf,exact,0,AC_afr, and so on for the other fields.
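The header check described above can be sketched as follows: collect the IDs declared in the `##INFO` lines and report which requested names are absent. The comparison is case-sensitive, so `AF_AFR` and `AF_afr` count as different names. The function names and the sample header lines in the usage note are illustrative assumptions, not taken from a real gnomAD file.

```python
import re

INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_ids(header_lines):
    """Return the INFO IDs declared in a VCF header, in file order."""
    ids = []
    for line in header_lines:
        if not line.startswith("##"):
            break                     # meta lines end at the #CHROM line
        match = INFO_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def missing_fields(requested, header_lines):
    """Return the requested field names that the header does not declare."""
    available = set(info_ids(header_lines))
    return [name for name in requested if name not in available]
```

For a header that declares `AC_afr` and `AF_afr`, `missing_fields(["AF_AFR", "AF_afr"], header)` returns `["AF_AFR"]`, which is the kind of mismatch that would leave the frequency values out of the output. For a compressed VCF, the header lines can be read with `gzip.open(path, "rt")`.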

hadjie01 commented 4 years ago

Hello Diana,

Thanks for your help!

I have one more question for you (hopefully this will be the last one). I am using a GTF annotation, which means I have novel transcripts in my dataset. If I use ExAC pLI, I will only get precalculated pLI scores per gene, but I want to be able to take into account the effect of the novel transcripts, right?

If this is the case then I thought it might be better to use LOFTEE & LOEUF instead. Is this something that I can do with VEP/97? If yes, LOFTEE & LOEUF will recalculate the scores based on my transcripts, right? If I understood correctly, these tools are not using precalculated scores.

Also, is this the case with Condel? I won’t get a score for my novel transcripts because the SIFT and PolyPhen scores are precalculated?

Thanks again!! Evi

hadjie01 commented 4 years ago

Hello Diana,

I am just following up on my questions below.

Are LOFTEE & LOEUF available with VEP/97? If yes, which files should I be using? And what would be the best way to predict pathogenicity for novel transcripts?

Thanks a lot, Evi

dglemos commented 4 years ago

Hi, sorry for taking so long to reply. If you use a GTF annotation with novel transcripts together with plugins that rely on pre-calculated scores, you will get scores for the transcripts used by the tool, not for your transcripts. This happens with SIFT and PolyPhen. As you said, Condel integrates the output of these tools and calculates a weighted average of their scores, so its scores will not consider your custom annotation either. LOFTEE looks at a given variant-transcript pair, so you should get scores based on your novel transcripts. We do not maintain the LOFTEE plugin; if you have any questions about it, you should ask here: https://github.com/konradjk/loftee/issues

hadjie01 commented 4 years ago

Hello Diana,

Thanks a lot for the response! Is there any other tool/plugin that I could use to calculate pathogenicity for my novel transcripts?

Thanks again, Evi

dglemos commented 4 years ago

If you want to consider your novel transcripts, you should avoid all the plugins that use pre-calculated scores from files. Most of our plugins use such pre-calculated files, but you can check which tools were used to generate them and try to run those tools for your use case. One example is the SpliceAI plugin. The plugin uses a file that contains pre-calculated scores, but you could run the SpliceAI tool on your gene annotation file (with the novel transcripts) and attach those scores to your VEP annotation.

dglemos commented 4 years ago

I'm going to close this issue. Feel free to open a new issue if you have more questions.