connor-lab / ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019
GNU Affero General Public License v3.0

type_vcf.py doesn't handle variants with consequence 'stop_retained' #90

Closed: dfornika closed this issue 3 years ago

dfornika commented 3 years ago

Running type_vcf.py on the following variant causes an error on line 220:

REGION  POS     REF     ALT     REF_DP  REF_RV  REF_QUAL        ALT_DP  ALT_RV  ALT_QUAL        ALT_FREQ        TOTAL_DP        PVAL    PASS    GFF_FEATURE     REF_CODON       REF_AA  ALT_CODON     ALT_AA
MN908947.3      27386   A       G       0       0       0       11      3       36      1       11      2.83514e-06     TRUE    NA      NA      NA      NA      NA

Command error:
  Traceback (most recent call last):
    File "/home/dfornika/code/type-variants-nf/bin/type_vcf.py", line 375, in <module>
      sys.exit(main())
    File "/home/dfornika/code/type-variants-nf/bin/type_vcf.py", line 362, in main
      sample_vars = get_variant_summary(infos)
    File "/home/dfornika/code/type-variants-nf/bin/type_vcf.py", line 220, in get_variant_summary
      complete_aa_variant_string = aa_var['refaa'] + aa_var['refpos'] + aa_var['varaa']
  TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

That variant is a synonymous change within a stop codon. For that input, the value of the variable variant in this block of code:

https://github.com/connor-lab/ncov2019-artic-nf/blob/9ac3119a875d75c49de65848a3587e6fcec22d1c/bin/type_vcf.py#L204-L223

...is:

{
  "consequence": "stop_retained",
  "gene": "ORF6", 
  "transcript": "ENSSAST00005000011", 
  "biotype": "protein_coding", 
  "strand": "+", 
  "amino_acid_change": "62*", 
  "dna_change": "27386A>G"
}

The regular expression on this line:

https://github.com/connor-lab/ncov2019-artic-nf/blob/9ac3119a875d75c49de65848a3587e6fcec22d1c/bin/type_vcf.py#L206

...also does not seem to handle the value of variant['amino_acid_change'] ("62*"). It results in the variable aa_var having the value:

{
  "refpos": "62", 
  "refaa": None, 
  "varpos": None, 
  "varaa": "*"
}
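
To make the failure concrete, here is a minimal sketch that reproduces the TypeError from the aa_var value shown above. The helper build_aa_string is hypothetical (it is not part of type_vcf.py); it only illustrates one way the concatenation could be guarded against a missing part.

```python
# aa_var copied from the value reported in this issue.
aa_var = {"refpos": "62", "refaa": None, "varpos": None, "varaa": "*"}


def build_aa_string(aa_var):
    """Hypothetical helper: return None instead of raising when a part is missing."""
    parts = (aa_var["refaa"], aa_var["refpos"], aa_var["varaa"])
    if any(part is None for part in parts):
        return None
    return "".join(parts)


# The unguarded concatenation fails because refaa is None.
try:
    aa_var["refaa"] + aa_var["refpos"] + aa_var["varaa"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The guarded version skips the variant rather than crashing.
print(build_aa_string(aa_var))
```

A guard like this only avoids the crash; the variant would still be dropped from the summary unless the parsing itself is changed.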

Variants with consequence type stop_retained don't seem to be handled by type_vcf.py.
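
One possible approach, sketched below, is to parse the amino_acid_change field so that a value with no ">" part (such as "62*") is treated as "reference and variant amino acid are the same". The pattern AA_CHANGE and the function format_aa_change are assumptions made for illustration; they are not the regular expression or code currently in type_vcf.py, and the assumed field layout (position followed by amino acid, optionally ">" and a second position and amino acid) may not cover every consequence type.

```python
import re

# Hypothetical pattern, NOT the one used in type_vcf.py.
# Assumed layout: <pos><aa> for unchanged residues, <pos><aa>><pos><aa> otherwise.
AA_CHANGE = re.compile(
    r"^(?P<refpos>\d+)(?P<refaa>[A-Z*]+)"
    r"(?:>(?P<varpos>\d+)(?P<varaa>[A-Z*]+))?$"
)


def format_aa_change(amino_acid_change):
    """Return a string such as 'D614G', or None if the field is not understood."""
    match = AA_CHANGE.match(amino_acid_change)
    if match is None:
        return None
    aa_var = match.groupdict()
    # No '>' part: the residue is unchanged (e.g. synonymous or stop_retained).
    if aa_var["varaa"] is None:
        aa_var["varaa"] = aa_var["refaa"]
    return aa_var["refaa"] + aa_var["refpos"] + aa_var["varaa"]


print(format_aa_change("62*"))        # stop_retained example from this issue
print(format_aa_change("614D>614G"))  # illustrative missense-style value
```

With this sketch, "62*" yields "*62*" instead of raising, and unrecognised values return None so the caller can decide whether to skip or report them.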