Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Failing to upload variants from TRGT 1.2.0 #5064

Closed: fellen31 closed this issue 4 days ago

fellen31 commented 4 days ago

Hi, I'm trying to upload STRs that were called and merged with TRGT 1.2.0 and annotated with Stranger 0.9.2, but loading the variants fails with the error below.

File: /home/proj/development/rare-disease/felix/scout_test/NIST_repeats_annotated.vcf.gz

2024-11-25 08:55:35 hasta.scilifelab.se scout.adapter.mongo.variant_loader[202022] ERROR unexpected error
Traceback (most recent call last):
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/adapter/mongo/variant_loader.py", line 730, in load_variants
    nr_inserted = self._load_variants(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/adapter/mongo/variant_loader.py", line 436, in _load_variants
    parsed_variant = parse_variant(
                     ^^^^^^^^^^^^^^
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/parse/variant/variant.py", line 126, in parse_variant
    parsed_variant["samples"] = get_samples(variant, individual_positions, case)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/parse/variant/variant.py", line 358, in get_samples
    return parse_genotypes(variant, case["individuals"], individual_positions)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/parse/variant/genotype.py", line 43, in parse_genotypes
    genotypes.append(parse_genotype(variant, ind, pos))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/parse/variant/genotype.py", line 107, in parse_genotype
    (_, mc_alt) = _parse_format_entry_trgt_mc(variant, pos)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/proj/stage/bin/miniconda3/envs/S_scout/lib/python3.11/site-packages/scout/parse/variant/genotype.py", line 498, in _parse_format_entry_trgt_mc
    pathologic_counts = int(allele)
                        ^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '.'
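
The last frame shows the crash happens when `_parse_format_entry_trgt_mc` passes an MC entry straight to `int()`. The minimal Python sketch below reproduces only that step, assuming the entry is the VCF missing-value marker `.`; the helper `reproduce_mc_crash` is hypothetical and not part of Scout.

```python
def reproduce_mc_crash(allele: str) -> str:
    """Mimic the failing step: convert one MC entry with int() and report the outcome."""
    try:
        return f"parsed {int(allele)}"
    except ValueError as err:
        # int() cannot convert the VCF missing-value marker "."
        return f"ValueError: {err}"


print(reproduce_mc_crash("12"))  # parsed 12
print(reproduce_mc_crash("."))   # ValueError: invalid literal for int() with base 10: '.'
```

A numeric entry converts fine; only the `.` entry raises, with the same message as in the traceback.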

northwestwitch commented 4 days ago

Hi @fellen31, could you post the VCF line for a variant that triggers the error? Thanks!

fellen31 commented 4 days ago

It's probably this line, where MC is "." (a missing value):

chrX 149631763 . CGCCGCCGCCGCCGCCGCCC . 0 . TRID=TMEM185A;END=149631782;MOTIFS=GCC;STRUC=(GCC)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
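
To confirm which field reaches the MC parser, the sketch below splits this record and pairs the FORMAT keys with the sample's values. The helper `sample_format_values` is hypothetical, handles only a single-sample line, and is not part of Scout. It assumes the record is tab-separated, as the VCF specification requires (the tabs are not visible in the line as pasted here).

```python
def sample_format_values(vcf_line: str) -> dict[str, str]:
    """Map FORMAT keys to the first sample's values for one VCF data line.

    Splitting on any whitespace also works for tab-separated input,
    because none of the columns in this record contain spaces.
    """
    columns = vcf_line.split()
    format_keys = columns[8].split(":")    # column 9: FORMAT
    sample_values = columns[9].split(":")  # column 10: first sample
    return dict(zip(format_keys, sample_values))


line = "\t".join([
    "chrX", "149631763", ".", "CGCCGCCGCCGCCGCCGCCC", ".", "0", ".",
    "TRID=TMEM185A;END=149631782;MOTIFS=GCC;STRUC=(GCC)n",
    "GT:AL:ALLR:SD:MC:MS:AP:AM",
    ".:.:.:.:.:.:.:.",
])
values = sample_format_values(line)
print(values["MC"])  # .
```

For this record every FORMAT value, including GT and MC, is the missing-value marker, so the MC string handed to the parser is just `.`.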

dnil commented 4 days ago

I found the file on hasta, @northwestwitch!

dnil commented 4 days ago

One could argue that TRGT/Stranger should be patched, since the count is really "0", but "." is a valid missing value in a VCF anyway, so let's fix it here in Scout.
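
A minimal sketch of the kind of fix discussed above, assuming a single-motif locus (like this GCC repeat) where MC holds one integer count per allele, separated by commas. The function `parse_mc_alleles` is hypothetical and is not the patch that was merged into Scout; it only illustrates treating `.` as missing instead of passing it to `int()`.

```python
from typing import Optional

MISSING = "."  # VCF missing-value marker


def parse_mc_alleles(mc_field: str) -> list[Optional[int]]:
    """Parse a single-motif MC value into per-allele repeat counts.

    Assumes comma-separated integer counts, one per allele.
    A missing allele (".") becomes None instead of raising ValueError.
    """
    counts: list[Optional[int]] = []
    for allele in mc_field.split(","):
        if allele == MISSING:
            counts.append(None)  # keep "missing" distinct from a real count of 0
        else:
            counts.append(int(allele))
    return counts


print(parse_mc_alleles("12,15"))  # [12, 15]
print(parse_mc_alleles("."))      # [None]
```

For the TMEM185A record above, where MC is `.`, this returns `[None]` rather than crashing. Returning `None` instead of `0` keeps a missing call distinguishable from a genuine zero count; a caller that prefers dnil's reading ("the count is really 0") could map `None` to `0` explicitly.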