Bioconductor / DelayedArray

A unified framework for working transparently with on-disk and in-memory array-like datasets
https://bioconductor.org/packages/DelayedArray
24 stars 9 forks source link

More informative error than `the supplied seed must support extract_array()` #112

Open LTLA opened 6 months ago

LTLA commented 6 months ago

A recent change in the DelayedArray stack (don't know exactly where) is causing ScaledMatrix tests to break on Bioc-devel with the rather unhelpful message:

Error in `validObject(.Object)`: invalid class "ScaledMatrix" object: 
    the supplied seed must support extract_array()
Backtrace:
     ▆
  1. └─ScaledMatrix (local) spawn_extra_scenarios(100, 50) at test-mult.R:52:5
  2.   └─ScaledMatrix:::spawn_scenarios_basic(...) at test-mult.R:38:5
  3.     └─ScaledMatrix:::scale_and_center(y, ref, it) at tests/testthat/setup.R:38:13
  4.       └─ScaledMatrix::ScaledMatrix(y, center = center, scale = scale) at tests/testthat/setup.R:20:5
  5.         ├─DelayedArray::DelayedArray(...)
  6.         └─ScaledMatrix::DelayedArray(...)
  7.           └─DelayedArray::new_DelayedArray(seed, Class = "ScaledMatrix")
  8.             └─S4Vectors::new2(Class, seed = seed)
  9.               └─methods::new(...)
 10.                 ├─methods::initialize(value, ...)
 11.                 └─methods::initialize(value, ...)
 12.                   └─methods::validObject(.Object)

This doesn't make any sense because ScaledMatrix (via its seed) does, in fact, implement extract_array(). I assume there's a try being done somewhere to test whether extract_array() works, which is catching and hiding the real error message.

Session information ``` R Under development (unstable) (2023-11-10 r85507) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 20.04.6 LTS Matrix products: default BLAS: /home/luna/Software/R/trunk/lib/libRblas.so LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so; LAPACK version 3.11.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: America/Los_Angeles tzcode source: system (glibc) attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DelayedArray_0.29.7 SparseArray_1.3.4 S4Arrays_1.3.4 [4] abind_1.4-5 IRanges_2.37.1 S4Vectors_0.41.3 [7] MatrixGenerics_1.15.0 matrixStats_1.2.0 BiocGenerics_0.49.1 [10] Matrix_1.6-5 ScaledMatrix_1.11.1 testthat_3.2.1 loaded via a namespace (and not attached): [1] vctrs_0.6.5 crayon_1.5.2 cli_3.6.2 rlang_1.1.3 [5] glue_1.7.0 fansi_1.0.6 brio_1.1.4 grid_4.4.0 [9] lifecycle_1.0.4 compiler_4.4.0 waldo_0.5.2 XVector_0.43.1 [13] rstudioapi_0.15.0 lattice_0.22-5 R6_2.5.1 utf8_1.2.4 [17] pillar_1.9.0 magrittr_2.0.3 tools_4.4.0 withr_3.0.0 [21] zlibbioc_1.49.0 desc_1.4.3 ```
LTLA commented 6 months ago

FWIW I eventually debugged it by modifying the DelayedArray source to eliminate the try() altogether in DelayedOp-class.R. Just slapping res onto the message wasn't enough, unfortunately, as I needed the traceback to figure out where it was coming from.

hpages commented 6 months ago

This is a true positive.

The unit tests in ScaledMatrix construct a ScaledMatrixSeed object for which extract_array() doesn't work (non-compliant seed). To reproduce:

library(ScaledMatrix)

## Define the CrippledMatrix class and all the methods defined
## in ScaledMatrix/tests/testthat/test-mult.R by copy-pasting
## lines 9-59 from the file.

## Then:

cm <- new("CrippledMatrix", x=matrix(runif(50), ncol=5))
seed <- ScaledMatrixSeed(cm)  # non-compliant seed!

SM <- DelayedArray(seed)
# Error in validObject(.Object) : invalid class “ScaledMatrix” object: 
#     the supplied seed must support extract_array()

This is a validity check that I added a few weeks ago in devel.

Trying to call extract_array() directly on this seed indeed fails:

extract_array(seed, list(1:3, 1:2))
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector

In release, SM <- DelayedArray(seed) works but the object is broken e.g. as.matrix() doesn't work on it:

as.matrix(SM)
# Error in h(simpleError(msg, call)) : 
#   error in evaluating the argument 'x' in selecting a method for function 'drop': no method for coercing this S4 class to a vector

or it can't be displayed:

show(SM)
# Error in h(simpleError(msg, call)) : 
#   error in evaluating the argument 'x' in selecting a method for function 'type': no method for coercing this S4 class to a vector

etc...

Purpose of this new validity check is to detect broken DelayedArray objects as early as possible i.e. at construction time.

Suggestions welcome to make the error message more informative.

H.

Session information ``` R Under development (unstable) (2024-01-02 r85764) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 23.10 Matrix products: default BLAS: /home/hpages/R/R-4.4.r85764/lib/libRblas.so LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C time zone: America/Los_Angeles tzcode source: system (glibc) attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ScaledMatrix_1.11.0 loaded via a namespace (and not attached): [1] zlibbioc_1.49.0 SparseArray_1.3.4 Matrix_1.6-5 [4] lattice_0.22-5 MatrixGenerics_1.15.0 matrixStats_1.2.0 [7] abind_1.4-5 S4Arrays_1.3.5 XVector_0.43.1 [10] BiocGenerics_0.49.1 stats4_4.4.0 IRanges_2.37.1 [13] grid_4.4.0 DelayedArray_0.29.9 compiler_4.4.0 [16] tools_4.4.0 S4Vectors_0.41.3 crayon_1.5.2 ```
LTLA commented 6 months ago

Yes, I'm fine with the motivation behind the check. It's just that the error message itself is not very informative.

It's unfortunate that we need to return a string in the validity method, because that loses the traceback.

I guess we could just not wrap it in a try and just expose the traceback to the user. Currently the user doesn't even know who to ask to fix the problem, as the failing matrix could be arbitrarily deep in a tree of delayed operations.