Closed by chrisamiller 5 years ago
CLE pipeline results are here:
/gscmnt/gc13015/cle/IDT_somatic_exome_assay/CI-398/H_MT-8043-005/
The run is certainly relatively recent (it includes somalier inputs, etc.). The CWL that was run is here:
/gscmnt/gc13015/cle/IDT_somatic_exome_assay/git/analysis-workflows/definitions/pipelines/gathered_cle_somatic_exome.cwl
It matches the current master branch.
The VEP fields in input.yaml seem sane at first glance:
vep_to_table_fields:
- Consequence
- SYMBOL
- Feature_type
- Feature
- HGVSc
- HGVSp
- cDNA_position
- CDS_position
- Protein_position
- Amino_acids
- Codons
- HGNC_ID
- Existing_variation
- gnomADe_AF
vep_cache_dir: /gscmnt/gc2560/core/cwl/inputs/VEP_cache
vep_ensembl_assembly: GRCh38
vep_ensembl_version: 95
vep_ensembl_species: homo_sapiens
@jhundal, I don't believe anything is wrong here. VEP is being run with the flag_pick option, which means that it chooses one annotation for each variant based on these criteria:
https://useast.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick
In this case it does, in fact, choose a downstream variant annotation as the most reliable one, but I don't think that's a failure.
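To make the distinction concrete, here is a minimal Python sketch contrasting the annotation that carries VEP's PICK flag with the most severe consequence across all transcripts. The `SEVERITY` list is an abbreviated, assumed subset of Ensembl's consequence ordering, the function names are hypothetical, and the two records (including which one is flagged as picked) are illustrative values modeled on the chr16 example in this issue, not output copied from the pipeline.

```python
# Abbreviated severity order, most severe first. This is an assumed subset
# of Ensembl's full consequence ranking, for illustration only.
SEVERITY = [
    "frameshift_variant",
    "inframe_deletion",
    "missense_variant",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def severity_rank(consequence):
    """Rank a consequence string; '&'-joined terms use their most severe term."""
    return min(RANK.get(term, len(SEVERITY)) for term in consequence.split("&"))


def picked(annotations):
    """Return the annotation whose PICK field is '1', or None if none is flagged."""
    return next((a for a in annotations if a.get("PICK") == "1"), None)


def most_severe(annotations):
    """Return the annotation with the most severe consequence."""
    return min(annotations, key=lambda a: severity_rank(a["Consequence"]))


# Hypothetical per-transcript annotations for a single variant.
annotations = [
    {"Consequence": "inframe_deletion", "Feature": "ENST00000569529.5", "PICK": ""},
    {"Consequence": "downstream_gene_variant", "Feature": "ENST00000293889.10", "PICK": "1"},
]

print("picked:     ", picked(annotations)["Consequence"])
print("most severe:", most_severe(annotations)["Consequence"])
```

The two selections differ because the pick criteria weigh transcript properties before consequence severity, so a table built from the picked annotation alone will not necessarily show the most severe consequence.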
Hi Dave and Feiyu,
Just wanted to note that the latest CLE somatic results do not seem to be reporting the most severe consequence in the variants.annotated.tsv. I only noticed this being an issue for indels.
This doesn't affect vaccine/immunotherapy pipelines b/c all vcf annotations are processed.
If anyone or any process is utilizing the variants.annotated.tsv it would have an effect.
Two examples, an in-frame indel and a frameshift indel, are below. These are reported as a downstream gene variant and an intron variant, respectively, in the tsv summary.
Case directory:
########
$ zgrep FAM173A annotated_filtered.vcf.gz | cut -f 1-5
chr16 721358 . CGGCTCG C
Note VEP consequence: inframe_deletion|MODERATE|FAM173A|ENSG00000103254|Transcript|ENST00000569529.5|protein_coding
$ zgrep 721358 variants.annotated.tsv
chr16 721358 . CGGCTCG C mutect-varscan-pindel CGGCTCG/CGGCTCG 453,0 0 453 CGGCTCG/C 1402,354 0.20159 1756 downstream_gene_variant CCDC7Transcript ENST00000293889.10
########
$ zgrep OR51F2 annotated_filtered.vcf.gz | cut -f 1-5
chr11 4821935 . AGTTCTATG A
Note VEP consequence: frameshift_variant|HIGH|OR51F2|ENSG00000176925|Transcript|ENST00000641672.1|protein_coding|
$ zgrep 4821935 variants.annotated.tsv
chr11 4821935 . AGTTCTATG A Intersection AGTTCTATG/AGTTCTATG 203,0 0 203 AGTTCTATG/A 220,40 0.15385 260 intron_variant MMP26 Transcript ENST00000380390.5 ENST00000380390.5:c.-145+54597_-145+54604del HGNC:14249
########
-Mike M.
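For anyone who wants to inspect every transcript annotation for a variant rather than grepping, the following is a minimal Python sketch that reads the CSQ subfield names from the VCF header and splits a record's CSQ entry into one dict per transcript. The function names are hypothetical, and the `HEADER` and `INFO` strings are shortened, made-up stand-ins (a real VEP header lists many more subfields); only the general `Format: A|B|C` header convention and the comma/pipe delimiters are relied on.

```python
import re


def csq_fields(header_line):
    """Extract the CSQ subfield names from the ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no CSQ Format string found in header line")
    return match.group(1).strip().split("|")


def parse_csq(info, fields):
    """Return one dict per transcript annotation found in an INFO column."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            return [dict(zip(fields, block.split("|")))
                    for block in entry[len("CSQ="):].split(",")]
    return []


# Shortened, illustrative header; not copied from the actual VCF.
HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|'
          'SYMBOL|Gene|Feature_type|Feature|BIOTYPE|PICK">')

# Illustrative INFO column with two transcript annotations (values assumed).
INFO = ("DP=100;CSQ="
        "-|inframe_deletion|MODERATE|FAM173A|ENSG00000103254|Transcript|"
        "ENST00000569529.5|protein_coding|,"
        "-|downstream_gene_variant|MODIFIER|||Transcript|"
        "ENST00000293889.10|protein_coding|1")

fields = csq_fields(HEADER)
for record in parse_csq(INFO, fields):
    print(record["Feature"], record["Consequence"], "PICK" if record["PICK"] == "1" else "")
```

Listing all annotations side by side this way makes it easy to see which transcript was picked and whether a more severe consequence exists on another transcript.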