dgomezpere / msm_tfm

Development of an application to visualize, annotate and prioritize somatic variants in cancer

[vcf_pipeline%qc] Create VT report data parsers #30

Closed by dgomezpere 2 years ago

dgomezpere commented 2 years ago

VT decompose

Report example (input)

$ less vt_decompose_report.txt

decompose v0.5

options:     input VCF file        /opt/msm_tfm/test_data/20200908_GISTomics_chr22_variants.vcf.gz
         [s] smart decomposition   true (experimental)
         [o] output VCF file       -

stats: no. variants                 : 17639
       no. biallelic variants       : 17639
       no. multiallelic variants    : 0

       no. additional biallelics    : 0
       total no. of biallelics      : 17639

Time elapsed: 0.80s

Wanted data structure (output in JSON format)

- version: str = "vt_decompose_v0.5"
- options: dict
    - input_vcf: str = "/opt/msm_tfm/test_data/20200908_GISTomics_chr22_variants.vcf.gz"
    - output_vcf: str = "-"
    - smart_decomposition: str = "true (experimental)"
- stats: dict
    - n_variants: int = 17639
    - n_biallelic_variants: int = 17639
    - n_multiallelic_variants: int = 0
    - n_additional_biallelic_variants: int = 0
    - total_biallelic_variants: int = 17639
- time_elapsed: float = 0.80 (seconds)
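
A minimal sketch of how the decompose report above could be parsed into the wanted structure. The function name `parse_vt_decompose_report` and the two label-to-key tables are hypothetical. The regular expressions assume the v0.5 layout shown in the example, where option labels and values are separated by two or more spaces. Storing the elapsed time as float seconds is also an assumption.

```python
import json
import re

# Hypothetical label-to-key tables, built from the example report and the wanted structure.
OPTION_KEYS = {
    "input VCF file": "input_vcf",
    "smart decomposition": "smart_decomposition",
    "output VCF file": "output_vcf",
}
STAT_KEYS = {
    "no. variants": "n_variants",
    "no. biallelic variants": "n_biallelic_variants",
    "no. multiallelic variants": "n_multiallelic_variants",
    "no. additional biallelics": "n_additional_biallelic_variants",
    "total no. of biallelics": "total_biallelic_variants",
}


def parse_vt_decompose_report(text: str) -> dict:
    """Parse the text of a vt decompose report into a JSON-ready dict."""
    report = {"version": None, "options": {}, "stats": {}, "time_elapsed": None}
    for raw in text.splitlines():
        line = raw.strip()
        # Header line, e.g. "decompose v0.5".
        m = re.fullmatch(r"decompose (v\S+)", line)
        if m:
            report["version"] = f"vt_decompose_{m.group(1)}"
            continue
        # Footer line, e.g. "Time elapsed: 0.80s" (kept as float seconds, an assumption).
        m = re.fullmatch(r"Time elapsed:\s*([\d.]+)s", line)
        if m:
            report["time_elapsed"] = float(m.group(1))
            continue
        # Stats lines: "<label> : <integer>", optionally prefixed by "stats:".
        m = re.fullmatch(r"(?:stats:\s*)?(.+?)\s*:\s*(\d+)", line)
        if m and m.group(1) in STAT_KEYS:
            report["stats"][STAT_KEYS[m.group(1)]] = int(m.group(2))
            continue
        # Option lines: "[flag] <label>  <value>", optionally prefixed by "options:".
        m = re.fullmatch(r"(?:options:\s*)?(?:\[\w\]\s+)?(.+?)\s{2,}(.+)", line)
        if m and m.group(1) in OPTION_KEYS:
            report["options"][OPTION_KEYS[m.group(1)]] = m.group(2)
    return report


# The example report from this issue, used as input.
SAMPLE_REPORT = """\
decompose v0.5

options:     input VCF file        /opt/msm_tfm/test_data/20200908_GISTomics_chr22_variants.vcf.gz
         [s] smart decomposition   true (experimental)
         [o] output VCF file       -

stats: no. variants                 : 17639
       no. biallelic variants       : 17639
       no. multiallelic variants    : 0

       no. additional biallelics    : 0
       total no. of biallelics      : 17639

Time elapsed: 0.80s
"""

parsed = parse_vt_decompose_report(SAMPLE_REPORT)
print(json.dumps(parsed, indent=2))
```

Unrecognized lines are skipped silently, so a layout change in a later vt version would show up as missing keys instead of an exception. A stricter parser might raise instead.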

VT normalize

Report example (input)

$ less vt_normalize_report.txt

normalize v0.5

options:     input VCF file                                  vcf/20200908_GISTomics_chr22_variants.decomp.vcf.gz
         [o] output VCF file                                 -
         [w] sorting window size                             100000
         [m] no fail on masked reference inconsistency       false
         [n] no fail on reference inconsistency              true
         [q] quiet                                           false
         [d] debug                                           false
         [r] reference FASTA file                            /opt/msm_tfm/references/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa

[E::idx_find_and_load] Could not retrieve index file for 'vcf/20200908_GISTomics_chr22_variants.decomp.vcf.gz'

stats: biallelic
          no. left trimmed                      : 0
          no. right trimmed                     : 0
          no. left and right trimmed            : 0
          no. right trimmed and left aligned    : 0
          no. left aligned                      : 0

       total no. biallelic normalized           : 0

       multiallelic
          no. left trimmed                      : 0
          no. right trimmed                     : 0
          no. left and right trimmed            : 0
          no. right trimmed and left aligned    : 0
          no. left aligned                      : 0

       total no. multiallelic normalized        : 0

       total no. variants normalized            : 0
       total no. variants observed              : 17639
       total no. reference observed             : 0

Time elapsed: 0.93s

Wanted data structure (output in JSON format)

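- version: str = "vt_normalize_v0.5"
- options: dict
    - input_vcf: str = "vcf/20200908_GISTomics_chr22_variants.decomp.vcf.gz"
    - output_vcf: str = "-"
    - sorting_window_size: int = 100000
    - no_fail_on_masked_reference_inconsistency: bool = false
    - no_fail_on_reference_inconsistency: bool = true
    - quiet: bool = false
    - debug: bool = false
    - reference_fasta: str = "/opt/msm_tfm/references/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa"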
- stats: dict
    - n_biallelic_left_trimmed: int = 0
    - n_biallelic_right_trimmed: int = 0
    - n_biallelic_left_right_trimmed: int = 0
    - n_biallelic_right_trimmed_left_aligned: int = 0
    - n_biallelic_left_aligned: int = 0
    - total_biallelic_normalized: int = 0
    - n_multiallelic_left_trimmed: int = 0
    - n_multiallelic_right_trimmed: int = 0
    - n_multiallelic_left_right_trimmed: int = 0
    - n_multiallelic_right_trimmed_left_aligned: int = 0
    - n_multiallelic_left_aligned: int = 0
    - total_multiallelic_normalized: int = 0
    - total_variants_normalized: int = 0
    - total_variants_observed: int = 17639
    - total_reference_observed: int = 0
- time_elapsed: float = 0.93 (seconds)
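
The normalize stats block repeats the same five counters under a `biallelic` and a `multiallelic` heading, so a parser has to track which heading it is under. The sketch below shows one way to do that for the stats part only. The function name `parse_vt_normalize_stats` and the key tables are hypothetical, and `right_trimmed_left_aligned` is an assumed key name for the "right trimmed and left aligned" counter. Lines that are not counters, such as the `[E::idx_find_and_load]` warning, are ignored.

```python
import re

# Hypothetical key fragments for the counters repeated under each heading.
PER_CLASS_KEYS = {
    "no. left trimmed": "left_trimmed",
    "no. right trimmed": "right_trimmed",
    "no. left and right trimmed": "left_right_trimmed",
    "no. right trimmed and left aligned": "right_trimmed_left_aligned",
    "no. left aligned": "left_aligned",
}
# Hypothetical keys for the totals, which do not depend on the current heading.
TOTAL_KEYS = {
    "total no. biallelic normalized": "total_biallelic_normalized",
    "total no. multiallelic normalized": "total_multiallelic_normalized",
    "total no. variants normalized": "total_variants_normalized",
    "total no. variants observed": "total_variants_observed",
    "total no. reference observed": "total_reference_observed",
}


def parse_vt_normalize_stats(text: str) -> dict:
    """Collect the stats counters of a vt normalize report into a flat dict."""
    stats = {}
    section = None  # "biallelic" or "multiallelic" once a heading has been seen
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("stats:"):
            line = line[len("stats:"):].strip()
        if line in ("biallelic", "multiallelic"):
            section = line
            continue
        # Counter lines look like "<label> : <integer>"; anything else is skipped.
        m = re.fullmatch(r"(.+?)\s*:\s*(\d+)", line)
        if not m:
            continue
        label, value = m.group(1), int(m.group(2))
        if label in TOTAL_KEYS:
            stats[TOTAL_KEYS[label]] = value
        elif label in PER_CLASS_KEYS and section is not None:
            stats[f"n_{section}_{PER_CLASS_KEYS[label]}"] = value
    return stats


# Excerpt of the example normalize report from this issue.
SAMPLE_STATS = """\
[E::idx_find_and_load] Could not retrieve index file for 'vcf/20200908_GISTomics_chr22_variants.decomp.vcf.gz'

stats: biallelic
          no. left trimmed                      : 0
          no. right trimmed                     : 0
          no. left and right trimmed            : 0
          no. right trimmed and left aligned    : 0
          no. left aligned                      : 0

       total no. biallelic normalized           : 0

       multiallelic
          no. left trimmed                      : 0
          no. right trimmed                     : 0
          no. left and right trimmed            : 0
          no. right trimmed and left aligned    : 0
          no. left aligned                      : 0

       total no. multiallelic normalized        : 0

       total no. variants normalized            : 0
       total no. variants observed              : 17639
       total no. reference observed             : 0

Time elapsed: 0.93s
"""

normalize_stats = parse_vt_normalize_stats(SAMPLE_STATS)
print(normalize_stats["total_variants_observed"])
```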

segarmond commented 2 years ago

Check the input and output file paths. Consider using the report outside the dispatcher function to name the output file.