Closed by LTLA.
I would just add an `NSBS,ANY` method in S4Vectors that accepts any array-like subscript `i` with a single dimension and replaces it with `as.vector(i)`. This would also handle 1D subscripts of other types, such as a 1D SparseArray. Would that work?
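A minimal sketch of the proposed fallback, written in plain R with the methods package so it runs without Bioconductor. It is not the code that landed in S4Vectors: `normalizeSubscript()` is a hypothetical stand-in for `NSBS()` (the real generic takes more arguments), and `ToyColumn1D` is a hypothetical in-memory class standing in for a 1D file-backed array such as a 1D DelayedArray or SparseArray.

```r
library(methods)

# Hypothetical generic standing in for S4Vectors::NSBS(); the real NSBS()
# also receives the object being subsetted and several other arguments.
setGeneric("normalizeSubscript",
           function(i) standardGeneric("normalizeSubscript"))

# Fallback for ANY subscript: ordinary vectors (no dim) pass through,
# array-like objects with exactly one dimension are collapsed with
# as.vector(), and higher-dimensional subscripts are rejected.
setMethod("normalizeSubscript", "ANY", function(i) {
    d <- dim(i)
    if (is.null(d))
        return(i)
    if (length(d) != 1L)
        stop("subscript must be a vector or a 1-dimensional array-like object")
    as.vector(i)
})

# Hypothetical 1D array-like class (stand-in for a 1D DelayedArray).
setClass("ToyColumn1D", representation(values = "ANY"))
setMethod("dim", "ToyColumn1D", function(x) length(x@values))
setMethod("as.vector", "ToyColumn1D",
          function(x, mode = "any") as.vector(x@values, mode))

# Usage: a 1D logical filter selects elements 1 and 3.
keep <- new("ToyColumn1D", values = c(TRUE, FALSE, TRUE, FALSE))
letters[1:4][normalizeSubscript(keep)]  # "a" "c"
```

One appeal of dispatching on `ANY` and checking `dim()` is that the method needs no knowledge of the DelayedArray or SparseArray classes: any object that reports a single dimension and supports `as.vector()` is accepted.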
Great that you're working on a ParquetDataFrame container!
> Would that work?
Yes, I think that would be a good solution.
Added to S4Vectors 0.39.3: https://github.com/Bioconductor/S4Vectors/commit/15349ef40f141b16df6daf3e38f3782ef54eb60c
While writing https://github.com/LTLA/ParquetDataFrame, it occurred to me that it would be nice to use a 1-dimensional `DelayedArray` (containing integer indices or logical filters) for subsetting BioC data structures:

Being able to do this would be convenient, as my `ParquetDataFrame` returns 1-dimensional file-backed `DelayedArray`s representing the columnar data. So, if an `NSBS` method were available, it would allow users to do something like:

... without having to remember to call `as.vector(keep)` before it goes into `NSBS` via the `DataFrame`'s `[`.

(Technically, it seems most appropriate to define an `NSBS` method for a hypothetical `DelayedVector` class that can be used as a subscripting vector, rather than pretending to have a vector via a 1-dimensional array. Certainly, if a `DelayedVector` were available, my `ParquetColumnVector` would just derive from it. I could probably make a `SQLColumnVector` as well.)

Session information
```
R version 4.3.1 Patched (2023-08-28 r85047)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/R-4-3-branch/lib/libRblas.so
LAPACK: /home/luna/Software/R/R-4-3-branch/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] DelayedArray_0.27.10        SparseArray_1.1.12
 [3] S4Arrays_1.1.6              abind_1.4-5
 [5] Matrix_1.6-1.1              SummarizedExperiment_1.31.1
 [7] Biobase_2.61.0              GenomicRanges_1.53.1
 [9] GenomeInfoDb_1.37.6         MatrixGenerics_1.13.1
[11] matrixStats_1.0.0           IRanges_2.35.2
[13] S4Vectors_0.39.2            BiocGenerics_0.47.0

loaded via a namespace (and not attached):
 [1] zlibbioc_1.47.0         lattice_0.21-9          GenomeInfoDbData_1.2.10
 [4] XVector_0.41.1          RCurl_1.98-1.12         bitops_1.0-7
 [7] grid_4.3.1              compiler_4.3.1          tools_4.3.1
[10] crayon_1.5.2
```