KarchinLab / open-cravat-modules-karchinlab

MIT License

Variants with multiple annotations #4

Closed chadisaad closed 8 months ago

chadisaad commented 4 years ago

When importing an annotated VCF file, variants that have multiple annotations (multiple transcripts, for instance) are merged into one line (one line per genomic position).

Annotations are merged and separated by ';'. The annotation field types are then changed (for example, float -> string for gnomAD database values), so we cannot apply proper filters to these columns.

Is it possible to keep a separate line per annotation instead of merging the annotations?
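To make the type problem above concrete, here is a minimal Python sketch of a workaround that could be applied to an exported report: it turns a merged cell such as `0.2;0.2` back into floats so a numeric threshold can be used. The function names `parse_merged_floats` and `passes_max_af` are hypothetical helpers for illustration, not part of OpenCRAVAT, and treating an empty cell as "does not pass" is an assumption.

```python
def parse_merged_floats(cell, sep=";"):
    """Split a merged cell such as '0.2;0.2' into a list of floats.

    Hypothetical helper, not an OpenCRAVAT function. Empty or
    non-numeric parts are skipped.
    """
    values = []
    for part in str(cell).split(sep):
        part = part.strip()
        if not part:
            continue
        try:
            values.append(float(part))
        except ValueError:
            continue
    return values


def passes_max_af(cell, threshold):
    """Return True when every merged value is at or below the threshold.

    Assumption: a cell with no numeric value does not pass the filter.
    """
    values = parse_merged_floats(cell)
    return bool(values) and max(values) <= threshold


if __name__ == "__main__":
    print(parse_merged_floats("0.2;0.2"))
    print(passes_max_af("0.001;0.3", 0.01))
```

This only works around the symptom; it does not restore one row per annotation.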

rkimoakbioinformatics commented 4 years ago

Hi @chadisaad, when a VCF format input line is for example:

1 xxxxxx id1 T C,A

and when oc produces VCF format output line:

1 xxxxxx id1 T C,A ...CRV=...|0.193,0.254|...

do you mean getting 0.193 and 0.254 in separate lines?

In that case, the -t text, -t tsv, -t csv, etc. options will produce report files in text, TSV, and CSV formats, which have those numbers on separate lines. Would these work for you, or do you need VCF format reports where alternate alleles for the same position are separated into different lines (with 0.193 and 0.254 on different lines)?
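As a rough illustration of what "separate lines" means for the example above, the following Python sketch pairs each alternate allele with its comma-separated value. It does not parse the CRV field itself, whose full layout is elided in the example; `pair_alts_with_values` is a hypothetical helper, and the two strings are copied from the lines above.

```python
def pair_alts_with_values(alt_field, value_field):
    """Pair each ALT allele with its value, in order.

    Hypothetical helper: assumes both fields are comma-separated and
    list their entries in the same order, as in the example above.
    """
    alts = alt_field.split(",")
    values = value_field.split(",")
    if len(alts) != len(values):
        raise ValueError("ALT and value counts differ")
    return [(alt, float(value)) for alt, value in zip(alts, values)]


if __name__ == "__main__":
    # One output line per alternate allele.
    for alt, value in pair_alts_with_values("C,A", "0.193,0.254"):
        print(alt, value)
```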

chadisaad commented 4 years ago

Hi, I will reformat my question with an example:

If we have 1 variant with 2 VEP annotations (CSQ=ALLELE|Consequence|GENE|TRANSCRIPT|gnomad_AF),

we will have something like:

chr1 2222 ID A C CSQ=C|missense_variant|TP53|NM_XXXX|0.2,C|missense_variant|TP53|NM_YYYYY|0.2

In the web interface table (VARIANT tab), the transcripts are grouped together (NM_XXXX; NM_YYYYY). The same happens for gnomAD (0.2;0.2), which prevents us from filtering correctly on gnomAD (the float numbers are converted to strings because of the concatenation).

So is it possible to have one line per CSQ annotation (that is, not to concatenate fields using ';')?
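For reference, the requested "one line per CSQ annotation" layout can be sketched in plain Python outside of OpenCRAVAT. This is only an illustration under stated assumptions: `explode_csq` and `CSQ_FIELDS` are hypothetical names, the field order is taken from the CSQ description above, and the input is the abbreviated whitespace-separated line from the example rather than a full 8-column VCF record.

```python
# Field order assumed from the CSQ description in the example above.
CSQ_FIELDS = ["Allele", "Consequence", "Gene", "Transcript", "gnomad_AF"]


def explode_csq(vcf_line, fields=CSQ_FIELDS):
    """Return one dict per CSQ annotation found in a VCF-like line.

    Hypothetical helper, not an OpenCRAVAT function. The INFO column is
    located by looking for 'CSQ=' after the first five columns.
    """
    cols = vcf_line.split()
    chrom, pos, var_id, ref, alt = cols[:5]
    info = next((c for c in cols[5:] if "CSQ=" in c), "")

    csq = ""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            csq = entry[len("CSQ="):]

    rows = []
    for annotation in filter(None, csq.split(",")):
        values = [v.strip() for v in annotation.split("|")]
        row = {"chrom": chrom, "pos": int(pos), "id": var_id,
               "ref": ref, "alt": alt}
        row.update(zip(fields, values))
        # Keep the frequency numeric so it can be filtered as a float.
        af = row.get("gnomad_AF")
        row["gnomad_AF"] = float(af) if af else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    line = ("chr1 2222 ID A C "
            "CSQ=C|missense_variant|TP53|NM_XXXX|0.2,"
            "C|missense_variant|TP53|NM_YYYYY|0.2")
    for row in explode_csq(line):
        print(row["Transcript"], row["gnomad_AF"])
```

Each annotation keeps its own row, so `gnomad_AF` stays a float instead of becoming the string `0.2;0.2`.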

rkimoakbioinformatics commented 4 years ago

Hi Chadi, OpenCRAVAT does not currently provide transcript-level output, although it may in the future. As you described, filtering OpenCRAVAT output by a CSQ field is not well supported at the moment. Meanwhile, gnomAD values are specific to a combination of genomic position, reference allele, and alternate allele, so different transcripts should not give different numbers for the same position, ref, and alt. If what you need is specifically gnomAD, OpenCRAVAT has gnomAD annotation modules (gnomAD 2, gnomAD 3, and gnomAD Gene), which give variant- and gene-specific gnomAD annotation. It may be worth giving OpenCRAVAT's gnomAD modules a shot?

chadisaad commented 4 years ago

Yes, for instance, I can annotate with OpenCRAVAT's gnomAD modules. But that was just an example of a more general problem that we have with all annotations. We have custom annotation databases that we use with VEP to annotate our VCFs, and we cannot do that with oc.