Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation

`expand()` and non-standard rowRanges columns #85

Open jayoung opened 1 month ago

jayoung commented 1 month ago

hi there,

Is there a way to retain non-standard rowRanges columns when I use `expand()` on a CollapsedVCF object? Some demo code is below.

If not, please can you consider this an enhancement request? thanks!

all the best,

Janet

library(VariantAnnotation)

vcf <- VCF(rowRanges = GRanges("chr1", IRanges(1:4*3, width=c(1, 2, 1, 1))))
alt(vcf) <- DNAStringSetList("A", "TT", c("G", "A"), c("TT", "C"))
ref(vcf) <- DNAStringSet(c("G", "AA", "T", "G"))

## add some non-standard columns to rowRanges
mcols(rowRanges(vcf))$SNP_name <- paste0("SNP_", seq_len(length(vcf)))
mcols(rowRanges(vcf))$num_alts <- elementNROWS(alt(vcf))

## take a look
rowRanges(vcf)

GRanges object with 4 ranges and 6 metadata columns:
      seqnames    ranges strand |    SNP_name  num_alts            REF                ALT      QUAL
         <Rle> <IRanges>  <Rle> | <character> <integer> <DNAStringSet> <DNAStringSetList> <numeric>
  [1]     chr1         3      * |       SNP_1         1              G                  A        NA
  [2]     chr1       6-7      * |       SNP_2         1             AA                 TT        NA
  [3]     chr1         9      * |       SNP_3         2              T                G,A        NA
  [4]     chr1        12      * |       SNP_4         2              G               TT,C        NA
           FILTER
      <character>
  [1]        <NA>
  [2]        <NA>
  [3]        <NA>
  [4]        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

## those extra columns are lost when we expand
vcfLong <- expand(vcf)
rowRanges(vcfLong)

GRanges object with 6 ranges and 4 metadata columns:
      seqnames    ranges strand |            REF            ALT      QUAL      FILTER
         <Rle> <IRanges>  <Rle> | <DNAStringSet> <DNAStringSet> <numeric> <character>
  [1]     chr1         3      * |              G              A        NA        <NA>
  [2]     chr1       6-7      * |             AA             TT        NA        <NA>
  [3]     chr1         9      * |              T              G        NA        <NA>
  [4]     chr1         9      * |              T              A        NA        <NA>
  [5]     chr1        12      * |              G             TT        NA        <NA>
  [6]     chr1        12      * |              G              C        NA        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
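
Until `expand()` handles this itself, one possible workaround is to copy the lost columns across by hand. The sketch below is only that: `restoreExtraCols()` is a hypothetical helper, not part of VariantAnnotation, and it assumes that `expand()` repeats each record once per ALT allele and keeps the original record order (which matches the output above). It stops with an error if the row counts do not line up, for example if some records have no ALT allele.

```r
library(VariantAnnotation)

## Hypothetical helper (not part of VariantAnnotation).
## Returns rowRanges(vcfLong) with any metadata columns that were present in
## rowRanges(vcf) but are missing after expand() copied back in.
restoreExtraCols <- function(vcf, vcfLong) {
    ## Assumption: expand() repeats each record once per ALT allele, in order
    idx <- rep(seq_len(nrow(vcf)), elementNROWS(alt(vcf)))
    stopifnot(length(idx) == nrow(vcfLong))

    rrOld <- rowRanges(vcf)
    rrNew <- rowRanges(vcfLong)

    ## Columns that exist before expand() but not afterwards
    lost <- setdiff(names(mcols(rrOld)), names(mcols(rrNew)))

    ## Repeat the lost columns to match the expanded rows and put them first
    mcols(rrNew) <- cbind(mcols(rrOld)[idx, lost, drop = FALSE], mcols(rrNew))
    rrNew
}
```

With the demo objects above, `restoreExtraCols(vcf, vcfLong)` should give a 6-row GRanges that has `SNP_name` and `num_alts` again. Writing it back with `rowRanges(vcfLong) <- restoreExtraCols(vcf, vcfLong)` should work the same way the `mcols(rowRanges(vcf))$... <-` assignments in the demo do, but I have only reasoned through that step, not checked it against the package source.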
jayoung commented 1 month ago

here's my sessionInfo():

R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] VariantAnnotation_1.50.0    Rsamtools_2.20.0            Biostrings_2.72.1          
 [4] XVector_0.44.0              SummarizedExperiment_1.34.0 Biobase_2.64.0             
 [7] GenomicRanges_1.56.1        GenomeInfoDb_1.40.1         IRanges_2.38.1             
[10] S4Vectors_0.42.1            MatrixGenerics_1.16.0       matrixStats_1.3.0          
[13] BiocGenerics_0.50.0        

loaded via a namespace (and not attached):
 [1] SparseArray_1.4.8        bitops_1.0-7             RSQLite_2.3.7            lattice_0.22-6          
 [5] grid_4.4.1               fastmap_1.2.0            blob_1.2.4               jsonlite_1.8.8          
 [9] Matrix_1.7-0             AnnotationDbi_1.66.0     restfulr_0.0.15          DBI_1.2.3               
[13] httr_1.4.7               BSgenome_1.72.0          UCSC.utils_1.0.0         XML_3.99-0.17           
[17] codetools_0.2-20         abind_1.4-5              cli_3.6.3                rlang_1.1.4             
[21] crayon_1.5.3             bit64_4.0.5              yaml_2.3.9               cachem_1.1.0            
[25] DelayedArray_0.30.1      GenomicFeatures_1.56.0   S4Arrays_1.4.1           tools_4.4.1             
[29] parallel_4.4.1           BiocParallel_1.38.0      memoise_2.0.1            GenomeInfoDbData_1.2.12 
[33] curl_5.2.1               vctrs_0.6.5              R6_2.5.1                 png_0.1-8               
[37] BiocIO_1.14.0            rtracklayer_1.64.0       zlibbioc_1.50.0          KEGGREST_1.44.1         
[41] bit_4.0.5                GenomicAlignments_1.40.0 rjson_0.2.21             compiler_4.4.1          
[45] RCurl_1.98-1.16