Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation
23 stars 20 forks source link

Different output on same command and same input file, different library version #54

Closed BCArg closed 2 years ago

BCArg commented 2 years ago

I am running a variant calling pipeline to detect solid tumours and haematological alterations. The variant calling is done with Mutect2, whereas I am using an R script to perform some filtering on the VCF produced by Mutect

I noticed, however, that the output of the header command (VariantAnnotation library, R programming language) is not the same considering two versions (1.24.5 vs. 1.32.0).

When using v 1.32.0 I get

> header(input)
class: VCFHeader 
samples(1): Panel-STHT-T210986-PJR
meta(6): MutectVersion fileformat ... GATKCommandLine contig
fixed(1): FILTER
info(16): CONTQ DP ... STR TLOD
geno(13): GT AD ... SAAF SAPP

Whereas when using v 1.24.5 I get fixed(0): instead of fixed(1): FILTER, as shown above:

> header(input)
class: VCFHeader 
samples(1): Panel-STHT-T210986-PJR
meta(3): META GATKCommandLine contig
fixed(0):
info(16): CONTQ DP ... STR TLOD
geno(13): GT AD ... SAAF SAPP

I also noticed that if I continue with the filtering that the R script does (eventually writing a vcf with writeVcf), with VariantAnnotation v 1.32.0 I get an extra line in the vcf header: ##FILTER=All filters passed, which is upsetting another downstream program (the error message that I get is .vcf: Invalid FILTER line). And it appears to me that this ##FILTER=All filters passed line is actually coming from the extra line of the header command.

The easiest solution for me would be to downgrade from 1.32.0 to 1.24.5: However, I would like to understand where this extra line comes from

Session information (when using v1.32.0 of VariantAnnotation is shown below:

R version 3.6.1 (2019-07-05)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 8 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=de_BE.UTF-8           LC_COLLATE=en_GB.UTF-8       
 [5] LC_MONETARY=de_BE.UTF-8       LC_MESSAGES=en_GB.UTF-8      
 [7] LC_PAPER=de_BE.UTF-8          LC_NAME=de_BE.UTF-8          
 [9] LC_ADDRESS=de_BE.UTF-8        LC_TELEPHONE=de_BE.UTF-8     
[11] LC_MEASUREMENT=de_BE.UTF-8    LC_IDENTIFICATION=de_BE.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] httr_1.4.1                  stringr_1.4.0              
 [3] readr_1.3.1                 xlsx_0.6.3                 
 [5] VariantAnnotation_1.32.0    Rsamtools_2.2.3            
 [7] Biostrings_2.54.0           XVector_0.26.0             
 [9] SummarizedExperiment_1.16.1 DelayedArray_0.12.2        
[11] BiocParallel_1.20.1         matrixStats_0.55.0         
[13] Biobase_2.46.0              GenomicRanges_1.38.0       
[15] GenomeInfoDb_1.22.0         IRanges_2.20.2             
[17] S4Vectors_0.24.3            BiocGenerics_0.32.0        
[19] optparse_1.6.4             

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2               lattice_0.20-38          prettyunits_1.0.2       
 [4] xlsxjars_0.6.1           assertthat_0.2.1         zeallot_0.1.0           
 [7] digest_0.6.15            BiocFileCache_1.10.2     R6_2.2.2                
[10] backports_1.1.4          RSQLite_2.1.1            pillar_1.4.2            
[13] zlibbioc_1.32.0          rlang_0.4.0              GenomicFeatures_1.38.2  
[16] progress_1.2.2           curl_4.2                 blob_1.1.1              
[19] Matrix_1.2-17            RCurl_1.98-1.1           bit_1.1-14              
[22] biomaRt_2.42.0           compiler_3.6.1           rtracklayer_1.46.0      
[25] pkgconfig_2.0.3          askpass_1.1              openssl_1.4.1           
[28] tidyselect_0.2.5         tibble_2.1.3             GenomeInfoDbData_1.2.2  
[31] XML_3.98-1.11            crayon_1.3.4             dplyr_0.8.3             
[34] dbplyr_1.4.2             GenomicAlignments_1.22.1 bitops_1.0-6            
[37] rappdirs_0.3.1           grid_3.6.1               DBI_1.0.0               
[40] magrittr_1.5             stringi_1.4.3            getopt_1.20.3           
[43] vctrs_0.2.0              tools_3.6.1              bit64_0.9-7             
[46] BSgenome_1.54.0          glue_1.3.1               purrr_0.3.2             
[49] hms_0.5.1                AnnotationDbi_1.48.0     memoise_1.1.0           
[52] rJava_0.9-11
vjcitn commented 2 years ago

Thanks for this note. It is challenging to respond to events occurring in R 3.6. -- Is there any chance you could define the problem using the current release of Bioconductor? I understand that this concerns a change in behavior across versions of VariantAnnotation but it would be helpful if you would use the current release to define the concern with contemporary software.

vjcitn commented 2 years ago

If you can make the vcf file available it would also be helpful.

BCArg commented 2 years ago

unfortunately I cannot share the vcf. But we managed to find a workaround for this, so I am closing the issue