Open LTLA opened 6 months ago
@hpages are you familiar with expand()?
I have some changes that address this, am testing them now.
Addressed in git.bioconductor.org devel branch with e3103a
also in github ... presumably this change should be ported back to RELEASE_3_18? @mtmorgan ?
It seems reasonable that per-allele
##INFO
fields should be converted fromNumber=A
toNumber=1
whenexpand()
is called, given that the 1:many relationship between rows and allelic values is now flattened into a 1:1 mapping.This has practical consequences, too, as we can see:
We can ignore mismatches 1-4, as these are inconsequential to most end-users (albeit annoying to developers, see #78). We can also ignore mismatch 8, which is discussed in #78. The interesting discrepancies are that of 5-7, where
AF
becomes aNumericList
after a roundtrip through the VCF file. This is because its##info
is still registeringNumber=A
but should really beNumber=1
to match the fact that it's already been flattened by theexpand()
call to generatefirst
.Session information
``` R Under development (unstable) (2023-11-29 r85646) Platform: aarch64-apple-darwin22.5.0 Running under: macOS Ventura 13.6.1 Matrix products: default BLAS: /Users/luna/Software/R/trunk/lib/libRblas.dylib LAPACK: /Users/luna/Software/R/trunk/lib/libRlapack.dylib; LAPACK version 3.11.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/Los_Angeles tzcode source: internal attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] VariantAnnotation_1.49.2 Rsamtools_2.19.2 [3] Biostrings_2.71.1 XVector_0.43.0 [5] SummarizedExperiment_1.33.1 Biobase_2.63.0 [7] GenomicRanges_1.55.1 GenomeInfoDb_1.39.5 [9] IRanges_2.37.0 S4Vectors_0.41.3 [11] MatrixGenerics_1.15.0 matrixStats_1.2.0 [13] BiocGenerics_0.49.1 loaded via a namespace (and not attached): [1] KEGGREST_1.43.0 rjson_0.2.21 lattice_0.22-5 [4] vctrs_0.6.5 tools_4.4.0 bitops_1.0-7 [7] generics_0.1.3 curl_5.2.0 parallel_4.4.0 [10] tibble_3.2.1 fansi_1.0.6 AnnotationDbi_1.65.2 [13] RSQLite_2.3.4 blob_1.2.4 pkgconfig_2.0.3 [16] Matrix_1.6-4 BSgenome_1.71.1 dbplyr_2.4.0 [19] lifecycle_1.0.4 GenomeInfoDbData_1.2.11 compiler_4.4.0 [22] stringr_1.5.1 progress_1.2.3 codetools_0.2-19 [25] yaml_2.3.8 RCurl_1.98-1.13 pillar_1.9.0 [28] crayon_1.5.2 BiocParallel_1.37.0 DelayedArray_0.29.0 [31] cachem_1.0.8 abind_1.4-5 tidyselect_1.2.0 [34] digest_0.6.33 stringi_1.8.3 restfulr_0.0.15 [37] dplyr_1.1.4 biomaRt_2.59.0 fastmap_1.1.1 [40] grid_4.4.0 cli_3.6.2 SparseArray_1.3.2 [43] magrittr_2.0.3 S4Arrays_1.3.1 GenomicFeatures_1.55.1 [46] XML_3.99-0.16 utf8_1.2.4 rappdirs_0.3.3 [49] filelock_1.0.3 prettyunits_1.2.0 bit64_4.0.5 [52] httr_1.4.7 bit_4.0.5 png_0.1-8 [55] hms_1.1.3 memoise_2.0.1 BiocIO_1.13.0 [58] BiocFileCache_2.11.1 rtracklayer_1.63.0 rlang_1.1.2 [61] glue_1.6.2 DBI_1.2.0 xml2_1.3.6 [64] R6_2.5.1 GenomicAlignments_1.39.0 zlibbioc_1.49.0 ```