brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

Unknown Consequence Types #118

Open MattWellie opened 2 years ago

MattWellie commented 2 years ago

From annotations in VEP 105, I'm seeing a flurry of warning messages:

warning: unknown impact "splice_polypyrimidine_tract"
warning: unknown impact "splice_donor_region"

e.g.

warning: unknown impact "splice_polypyrimidine_tract" from csq "A|splice_polypyrimidine_tract_variant&intron_variant|LOW|PAX6|ENSG00000007372|Transcript|ENST00000640613|protein_coding||9/12|ENST00000640613.1:c.607-12C>T...

brentp commented 2 years ago

Thanks for reporting. As a short-term workaround, you can copy this file locally (make sure to get the raw text file, not the HTML page), add those unknown impacts to the list wherever you think appropriate, and then use:

SLIVAR_IMPACTFUL_ORDER=/path/to/adjusted-order.txt slivar expr ...

I'll add these and get a new release out. Do you see any others? And do you have any input on where in the order to include them? I am thinking of putting splice_donor_region right next to splice_region in the current list (which would make it impactful). I don't see much information on splice_polypyrimidine_tract generally, and none here: https://m.ensembl.org/info/genome/variation/prediction/predicted_data.html, so I am open to suggestions on where to put it (is it impactful?).
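A minimal sketch of the workaround above, assuming the local copy of the order file lists one impact per line without the "_variant" suffix (as the head output later in this thread suggests). The function name insert_after and the file names default-order.txt and adjusted-order.txt are hypothetical, and putting the new terms directly after splice_region follows the suggestion in this comment rather than a settled decision.

```shell
# insert_after ANCHOR FILE TERM...
# Print FILE with each TERM inserted (in the given order) directly after the
# line that exactly equals ANCHOR. All other lines pass through unchanged.
# If ANCHOR is not found, nothing is inserted.
insert_after() {
  anchor=$1
  file=$2
  shift 2
  awk -v anchor="$anchor" -v terms="$*" '
    { print }
    $0 == anchor {
      n = split(terms, t, " ")
      for (i = 1; i <= n; i++) print t[i]
    }
  ' "$file"
}

# Example (hypothetical paths):
#   insert_after splice_region default-order.txt \
#     splice_donor_region splice_donor_5th_base splice_polypyrimidine_tract \
#     > adjusted-order.txt
#   SLIVAR_IMPACTFUL_ORDER=$PWD/adjusted-order.txt slivar expr ...
```

Because a missing anchor silently inserts nothing, it is worth grepping the adjusted file for the new terms before pointing SLIVAR_IMPACTFUL_ORDER at it.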

brentp commented 2 years ago

I see from your paste that VEP assigns splice_polypyrimidine_tract an impact of LOW. What does it give for splice_donor_region?

MattWellie commented 2 years ago

The SpliceRegion plugin in the VEP 104 plugins branch appears to be the origin of all three terms, and in theory it also includes extended_intronic_splice_region_variant_5prime and extended_intronic_splice_region_variant_3prime, which I've not come across so far: https://github.com/Ensembl/VEP_plugins/blob/release/104/SpliceRegion.pm. The latter two terms are not in the VEP 105 release notes, so maybe they will be added in a future release.

In my VEP-annotated VCF, these are always (or almost always) paired with low-impact consequence terms, and the transcript consequence as a whole is also rated LOW, e.g.:

unknown impact "splice_donor_5th_base" from csq "G|splice_donor_5th_base_variant&intron_variant|LOW|SLC35D1...
unknown impact "splice_donor_region" from csq "-|splice_donor_region_variant&intron_variant|...
unknown impact "splice_donor_region" from csq "T|splice_donor_region_variant&intron_variant&non_coding_transcript_variant...

I don't feel qualified to place these within the hierarchy of consequences, but they appear to be subcategories of splice_region_variant, which is LOW.
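To answer the "do you see any others?" question for a whole run, the distinct unknown terms can be tallied from slivar's stderr. This sketch relies only on the warning text quoted in this thread (warning: unknown impact "..."); the function name is hypothetical and the log file is whatever stderr was redirected to.

```shell
# count_unknown_impacts LOG
# Tally each distinct term that slivar reported as an unknown impact,
# most frequent first. Relies on the warning format quoted in this thread:
#   warning: unknown impact "splice_donor_region" from csq "..."
# grep -o also catches several warnings that ended up on one line.
count_unknown_impacts() {
  grep -o 'unknown impact "[^"]*"' "$1" \
    | sed 's/^unknown impact "\(.*\)"$/\1/' \
    | sort | uniq -c | sort -rn
}

# Example (hypothetical file name):
#   slivar expr ... 2> slivar.err
#   count_unknown_impacts slivar.err
```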

mike8115 commented 2 years ago

Related question: what does slivar do when it finds an unrecognized impact? Does slivar classify the variant by the remaining impact entries or reject the entire line in the VCF? Also, what happens when slivar doesn't recognize any of the impact terms?

brentp commented 2 years ago

When it finds an unrecognized impact, it issues the warning and it sets the "impact_order" to just below impactful. So the unknown variants will not be impactful, but all other processing will proceed as normal (even if no impacts are recognized).
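The behaviour described above can be illustrated with a small awk sketch. It is not slivar's code; it assumes an order file with one term per line, most severe first, and a hypothetical CUTOFF giving how many leading terms count as impactful. Consequence terms are split on "&", the "_variant" suffix is dropped (matching the warnings in this thread), and an unknown term is ranked just below the cutoff, so on its own it never makes a variant impactful while the known terms are still ranked normally.

```shell
# classify_csq ORDER_FILE CUTOFF CONSEQUENCE
# Illustrative only; this is not slivar's implementation.
# ORDER_FILE: one term per line, most severe first ("_variant" suffix omitted).
# CUTOFF: hypothetical number of leading terms treated as impactful.
# CONSEQUENCE: a VEP Consequence field, e.g.
#   splice_polypyrimidine_tract_variant&intron_variant
# Prints the best (lowest) rank and whether that rank is impactful.
classify_csq() {
  awk -v cutoff="$2" -v csq="$3" '
    # Skip blank lines and comments; number the remaining terms from 1.
    NF == 0 || /^#/ { next }
    { rank[$1] = ++n }
    END {
      best = ""
      m = split(csq, terms, "&")
      for (i = 1; i <= m; i++) {
        term = terms[i]
        sub(/_variant$/, "", term)
        if (term in rank) {
          r = rank[term]
        } else {
          # Unknown term: warn on stderr and rank it just below the cutoff.
          printf "warning: unknown impact \"%s\"\n", term | "cat 1>&2"
          r = cutoff + 0.5
        }
        if (best == "" || r < best) best = r
      }
      print best, (best <= cutoff ? "impactful" : "not_impactful")
    }
  ' "$1"
}
```

For example, with an order file of transcript_ablation, splice_acceptor, splice_donor, splice_region, intron and a cutoff of 4, "splice_polypyrimidine_tract_variant&intron_variant" comes out as not impactful (with a warning), while "splice_donor_variant&intron_variant" is impactful.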

karoliinas commented 2 years ago

Hi, many thanks for your work on slivar and smoove! A quick question on this topic: I'm running slivar with Docker, and the fix above (SLIVAR_IMPACTFUL_ORDER=/path/to/adjusted-order.txt slivar expr ...) throws the error "no such file or directory: unknown", although the path to the file is correct and I made sure Docker has access to it.

brentp commented 2 years ago

can you show the output of running this in the docker container:

export SLIVAR_IMPACTFUL_ORDER=/path/to/adjusted-order.txt # with corrected path
head $SLIVAR_IMPACTFUL_ORDER
slivar expr ...

karoliinas commented 2 years ago

head $SLIVAR_IMPACTFUL_ORDER
transcript_ablation
splice_acceptor
splice_donor
stop_gained
...

I tested running the container with:

docker run --name="slivar" -v "/mnt:/mnt" -e $SLIVAR_IMPACTFUL_ORDER --cpus "8" -dt "brentp/slivar"

but I get the same unknown-impact warnings.

brentp commented 2 years ago

Hi, in order to help you, I'll need more information. What warnings do you see? Can you show the full stdout+stderr of the slivar command and the command itself?

karoliinas commented 2 years ago

Hi, thanks! Here's the beginning of the log:

slivar version: 0.2.7 71af7d12881ae0590c6d2a97ef2b282cc93fe7c6
[slivar] 4 samples matched in VCF and PED to be evaluated
[slivar] message for /mnt/gemma/bin/resources/slivar/gnomad.hg38.genomes.v3.fix.zip:
created on:2019-11-15
[slivar tsv] warning! didn't find ANN in header in /mnt/data/GEMMAgenomics/hg38_bams/gvcfs2/fams/fam002_SNPVEP.vcf.gz trying other fields
[slivar tsv] warning! didn't find BCSQ in header in /mnt/data/GEMMAgenomics/hg38_bams/gvcfs2/fams/fam002_SNPVEP.vcf.gz trying other fields
[slivar] evaluating on 2 trios
warning: unknown impact "splice_polypyrimidine_tract" from csq "G|splice_polypyrimidine_tract_variant&intron_variant|LOW|NOC2L|ENSG00000188976|Transcript|ENST00000327044|protein_coding||13/18|ENST00000327044.7:c.1558-13T>C|||||||rs4970378||-1||SNV|HGNC|HGNC:24517|YES|NM_015658.4||1|P1|CCDS3.1|ENSP00000317992|Q9Y3T9.182||UPI000041820C|||||||||1|1|1|1|1|0.9995|1|1|0.9996|1|1|1|1|1|1|1|1|AFR&AMR&EAS&EUR&SAS&EA&gnomAD_AMR&gnomAD_ASJ&gnomAD_EAS&gnomAD_FIN&gnomAD_NFE&gnomAD_OTH&gnomAD_SAS|1KG_ALL:G:NA|||||||||" please report the variant at https://github.com/brentp/slivar/issues
warning: unknown impact "splice_donor_region" from csq "C|splice_donor_region_variant&intron_variant|LOW|NOC2L|ENSG00000188976|Transcript|ENST00000327044|protein_coding||8/18|ENST00000327044.7:c.888+4C>G|||||||rs13303056||-1||SNV|HGNC|HGNC:24517|YES|NM_015658.4||1|P1|CCDS3.1|ENSP00000317992|Q9Y3T9.182||UPI000041820C|||||||||0.829|0.9135|0.9365|0.9523|0.9172|0.8238|0.9333|0.9321|0.8235|0.9508|0.9351|0.9215|0.9599|0.9413|0.9357|0.9192|0.9599|gnomAD_FIN|1KG_ALL:C:NA|||||||||" please report the variant at https://github.com/brentp/slivar/issues
...

And the list continues with unknown-impact warnings for these two consequences. I was hoping that setting the updated SLIVAR_IMPACTFUL_ORDER variable in the running container would fix it. I'm not able to pass it to slivar in the docker exec call, which I think is the problem.
I'm running with:

docker run --name="slivar" -v "/mnt:/mnt" -e $SLIVAR_IMPACTFUL_ORDER -e $logfile -e $errfile --cpus "8" -dt "brentp/slivar"
docker exec slivar slivar expr --vcf /mnt/data/genomics/fams/fam002_SNPs.vcf.gz \
  --ped /mnt/data/genomics/fams/fam002.ped \
  -o /mnt/data/genomics/fams/fam002_denovos.vcf \
  --pass-only \
  -g /mnt/bin/resources/slivar/gnomad.hg38.genomes.v3.fix.zip \
  --info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' \
  --js /mnt/bin/resources/slivar/slivar-functions.js \
  --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' \
  --family-expr 'recessive:fam.every(segregating_recessive)' \
  --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' \
  --family-expr 'x_recessive:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_recessive_x)' \
  --trio 'comphet_side:comphet_side(kid, mom, dad) && INFO.gnomad_nhomalt < 10' \
  1>$logfile 2>$errfile
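One possible reason the variable never reached slivar: the host shell expands "-e $SLIVAR_IMPACTFUL_ORDER" to "-e /path/to/adjusted-order.txt", and Docker treats an -e argument without "=" as the name of a host variable to copy, so SLIVAR_IMPACTFUL_ORDER itself is never set in the container. A minimal sketch that passes it explicitly as NAME=value on the docker exec call; "docker run -e SLIVAR_IMPACTFUL_ORDER=/mnt/..." should work as well. The function name and the DOCKER override (set DOCKER=echo for a dry run) are hypothetical, and the path must be the one seen inside the container. Even with the variable set, the warnings only stop if the file actually lists the new terms.

```shell
# run_slivar_in_container ORDER_FILE ARGS...
# Run slivar in the already-started container named "slivar", setting
# SLIVAR_IMPACTFUL_ORDER for this exec'd process. ORDER_FILE must be a path
# as seen inside the container (e.g. under the /mnt bind mount).
# Set DOCKER=echo to print the command instead of running it.
run_slivar_in_container() {
  order_file=$1
  shift
  "${DOCKER:-docker}" exec -e "SLIVAR_IMPACTFUL_ORDER=$order_file" slivar slivar "$@"
}

# Example (hypothetical path; remaining arguments as in the command above):
#   run_slivar_in_container /mnt/path/to/adjusted-order.txt expr \
#     --vcf /mnt/data/genomics/fams/fam002_SNPs.vcf.gz ... 1>"$logfile" 2>"$errfile"
```

The output redirections are still handled by the host shell, so logfile and errfile do not need to be passed into the container.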

karoliinas commented 2 years ago

Sorry, I was just re-reading the thread and noticed you mentioned that consequences missing from the list (which are probably quite low impact) are not a problem. I do get ~600 variants from this workflow.

brentp commented 2 years ago

Oh, yes, that is just a warning. Since you were specifying SLIVAR_IMPACTFUL_ORDER, I thought you were filling in these missing impacts. If they are not in the file, slivar will issue a warning but continue with the other variants and impacts.
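Since the warnings only go away once the terms are in the order file, a quick check is to list the terms slivar warned about that the file still lacks. A sketch assuming the one-term-per-line layout and the warning text quoted in this thread; the function name is hypothetical, and both file names are whatever was used in the run.

```shell
# missing_from_order LOG ORDER_FILE
# Print each distinct term reported as an unknown impact in LOG that does not
# appear as a whole line in ORDER_FILE.
missing_from_order() {
  grep -o 'unknown impact "[^"]*"' "$1" \
    | sed 's/^unknown impact "\(.*\)"$/\1/' \
    | sort -u \
    | grep -Fxv -f "$2"
}

# Example (hypothetical file names):
#   missing_from_order slivar.err adjusted-order.txt
```

An empty result means every warned-about term is now listed, so a rerun with that file should no longer produce these warnings.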