exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0
200 stars 55 forks source link

Add report feature to flag up variants which Jannovar fails to annotate #2

Closed julesjacobsen closed 7 years ago

julesjacobsen commented 9 years ago

Sometimes Jannovar is unable to annotate a variant and throws an exception. This is caught by Exomiser, but the variant is not included in the analysis or results which could lead to incorrect results.

There are two ways this could be handled:

  1. These variants could be flagged and indicated to the user so that they are aware of the issue.
  2. Exomiser should simply stop the analysis and report the reason for the failure.

Votes on behaviour please....

pnrobinson commented 9 years ago

I would vote for option #1. I think that we are slowly removing the remaining bugs in the code, but certainly we do not want Exomiser users to not be able to use the program whilst waiting for a bug fix. Other programs presumably just silently skip these variants, but we should at least strive to be honest about potentialy problems!

damiansm commented 9 years ago

Ditto

On Sat, Dec 6, 2014 at 10:02 AM, Peter Robinson notifications@github.com wrote:

I would vote for option #1. I think that we are slowly removing the remaining bugs in the code, but certainly we do not want Exomiser users to not be able to use the program whilst waiting for a bug fix. Other programs presumably just silently skip these variants, but we should at least strive to be honest about potentialy problems! -peter

Dr. med. Peter N. Robinson, MSc. Professor of Medical Genomics Professor in the Bioinformatics Division of the Department of Mathematics and Computer Science of the Freie Universität Berlin Institut für Medizinische Genetik und Humangenetik Charité - Universitätsmedizin Berlin Augustenburger Platz 1 13353 Berlin Germany +4930 450566006 Mobile: 0160 93769872 peter.robinson@charite.de http://compbio.charite.de http://www.human-phenotype-ontology.org Introduction to Bio-Ontologies: http://www.crcpress.com/product/isbn/9781439836651 I have learned from my mistakes, and I am sure I can repeat them exactly ORCID ID:http://orcid.org/0000-0002-0736-9199 Scopus Author ID 7403719646 Appointment request: http://doodle.com/pnrobinson


Von: Jules Jacobsen [notifications@github.com] Gesendet: Freitag, 5. Dezember 2014 18:02 An: exomiser/Exomiser Betreff: [Exomiser] Add report feature to flag up variants which Jannovar fails to annotate (#2)

Sometimes Jannovar is unable to annotate a variant and throws an exception. This is caught by Exomiser, but the variant is not included in the analysis or results which could lead to incorrect results.

There are two ways this could be handled:

  1. These variants could be flagged and indicated to the user so that they are aware of the issue.
  2. Exomiser should simply stop the analysis and report the reason for the failure.

Votes on behaviour please....

— Reply to this email directly or view it on GitHub< https://github.com/exomiser/Exomiser/issues/2>.

— Reply to this email directly or view it on GitHub https://github.com/exomiser/Exomiser/issues/2#issuecomment-65891808.

buske commented 9 years ago

+1

holtgrewe commented 9 years ago

I would vote for option 1.

visze commented 9 years ago

1 :-)

julesjacobsen commented 9 years ago

Sounds like there is a slight leaning towards option 1...

commit: a22b207a50d2ae610c9643d8a544d54e78688c2f

Added VariantEvaluation.hasAnnotations method to as a means of easily testing whether a variant was annotated by Jannovar.

I've incorporated this into the VcfWriter so that the output is more compatible with the actual VCF spec, in particular, if a variant has not been filtered the FILTER column should be empty for that variant:

##fileformat=VCFv4.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  GENOTYPE
chr3    4   .   G   C   2.2     ;VARIANT NOT ANALYSED - NO GENE ANNOTATIONS GT  0/1
chr1    1   .   A   T   2.2 PASS    ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr1    2   .   T   -   2.2 Target  ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr2    3   .   C   T   2.2 Frequency;Target    ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr3    5   .   G   C   2.2     ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1

However this has raised a question about the Filterable interface which I'd like some feedback on as it impacts a fundamental behaviour. Currently a Filterable either passes or fails, but really it can exist in one of three states - passed, failed and unfiltered/not yet filtered. Given this:

  1. Should a Filterable (VariantEvaluation or Gene) report true or false to passedFilters if no filters have been applied? (I think this should really be false - however, this is a reversal of the current behaviour which is always true until a filter has been failed and passedFilters is quite well used- a more accurate name would be hasNotFailedFilters )
  2. Should a Filterable be able to report its actual state - PASSED/FAILED/UNFILTERED? (I vote yes as it is quite explicit and prevents you from having to combine passedFilters and a newly added isUnFiltered Booleans in order to infer a missing failedFilters or add the failedFilters Boolean too)
  3. Given the first two points is there any point in having any other methods in the Filterable interface other than passedFilter(FilterType) and getFilterStatus?

It would be possible to keep the existing behaviour of passedFilters and add getFilterStatus for ease of use and backwards compatibility at the expense of some potential confusion. But this all depends on what people are using - will any of these changes actually have any direct impact on you? If not I suggest clean logic should be applied as this tends to keeps things simpler and simple is good.

holtgrewe commented 9 years ago

Hi Jules, as I am currently doing a refactoring of Jannovar, we should talk about this in detail, maybe in a Skype call?

Besides other changes, the new version uses the HTS-JDK for VCF I/O.

julesjacobsen commented 9 years ago

Sure thing - when is a good time for you? What is your Skype address.

julesjacobsen commented 9 years ago

Do you want something in the HTML too? I'd assume yes. I've not changed the Filterable interface as this keeps the change smaller, however I've migrated the discussion to issue #4.

pnrobinson commented 9 years ago

I think the HTML should have a section with some Q/C report and a list of the non-parsable variants, that is what is useful clinically! Thanks! Peter

julesjacobsen commented 9 years ago

HTML report is next up, Peter. I wanted to get this out quickly(ish) for Orion so he could continue with things.

julesjacobsen commented 9 years ago

This is now functional along with a newly styled HTML output. See release 5.2.0 in the next few minutes...

visze commented 9 years ago

I have a request for this topic from some guys outside the charité. They are using a local cli installation. They ask for a command line option like "--stop-on-annotation-error". So they can force the Option 2 in Jules initial comment on this issue.

julesjacobsen commented 9 years ago

How do they want this communicated to them? I can just terminate the analysis, throw a RuntimeException, log an error in the console and stop.... Can you ask please.

pnrobinson commented 9 years ago

How about a creating a log file for every analysis with some of the main parameters and -- if any -- the errors? I can't think of any situation where a user of the Exomiser would want it to just die if something goes wrong. -peter

visze commented 9 years ago

@pnrobinson If you want to use it for diagnostic reasons you have to be sure, that every variant in your set is annotated. Thats why they want to force a die of exomiser, if a variant cannot be annotated. If some variants are skipped and maybe only reported in a log file (or in the output html), the user will maybe overlook it.

@julesjacobsen I think an error log in the console with the missed variant will be perfect. So they can analyse this variant differently. For a new exomiser run they can remove the variant manually from the vcf. But I will ask them.

But it looks like that the JannovarAnnotationErrors will be reduced to a minimum in the actual jannovar development version.

julesjacobsen commented 8 years ago

TODO: Produce a VCF output of unannotated varinats if they're there and print a big f-off warning at the end of the run.