Closed LTLA closed 5 years ago
This seems to be the fault of all.equal.envRefClass()
in base, which ignores the .xData
slot of the two ShallowSimpleListAssays
objects being compared. I presume that this is because .xData
typically refers to external data resources that cannot be directly compared - but if so, why is ShallowSimpleListAssays
implemented like that? It doesn't really have "external resources" in the same sense.
The reason all.equal.envRefClass()
ignores @.xData
is because the methods package uses that slot as the environment for storing the actual fields of the object. The fields are checked separately.
The problem is that ShallowSimpleListAssays defines its own method on [[()
for getting elements of the list, but the methods package assumes that [[()
accesses fields on the object by name. I'm not sure whether this restriction is documented anywhere, but if you think about it, it would be best to avoid inconsistency with the behavior of $()
. ShallowSimpleListAssays probably should be composed of a reference class object, rather than being one itself.
Do we still need ShallowSimpleListAssays since R is shallow copy by default? I guess this is for compatibility with serialized instances.
I implemented the [[
method for ShallowSimpleListAssays objects. I've no idea if the fact that implementing a [[
method for a ref class breaks basic functionalities like all.equal()
is documented somewhere. It didn't even occur to me that such a restriction could exist (I'm naive and still haven't learned what the methods package is capable of after all these years).
Anyway, looks like reference classes as implemented in the methods package are a bit problematic (https://stat.ethz.ch/pipermail/bioc-devel/2019-May/015112.html, too problematic for my taste) so maybe the recommendation should be to avoid them. Especially since they are often used where they shouldn't (e.g. TxDb objects). In the case of the assays slot of an SE object they were introduced a long time ago by the father of the SE class and IIRC this was a trick to avoid triggering copies of the assays slot when the user does things like setting the rownames or colnames on the SE object. It's probably not needed anymore now that R is "shallow copy by default" but yeah there are hundreds (if not thousands) serialized instances around so we need to be careful when changing this.
The good news is that the assays slot only needs to be an "Assays" object and Assays is a virtual class with no slot:
showClass("Assays")
# Virtual Class "Assays" [package "SummarizedExperiment"]
#
# No Slots, prototype of class "SimpleListAssays"
#
# Known Subclasses: "SimpleListAssays", "ShallowSimpleListAssays", "AssaysInEnv"
ShallowSimpleListAssays is only 1 concrete subclass among others and is the one currently used by the SummarizedExperiment()
constructor to wrap the assays passed to it. Should be easy to change this to SimpleListAssays which a direct SimpleList extension. Serialized instances that use ShallowSimpleListAssays should keep working.
I'll work on this, including replacing the [[
methods for the various Assays subclasses with getListElement
methods. Will also start thinking about an all.equal()
method for Assays derivatives that doesn't return false positives or false negatives. Sounds like a must! It will be expensive when the assay data is on disk but hey, the end user is asking for a deep comparison so should be ready for that.
SummarizedExperiment 1.15.1 should eliminate Aaron's unpleasant little surprise (see commit 1c9719593f1a5fcb3ebe1c8cff2029c6039445e0).
I've opened 2 separate issues (#21 and #22) for the other things discussed here.
Closing this issue (feel free to re-open if I missed something).
H.
Thanks Herve... now, let's see how many of my (now correctly working) unit tests break!
This was a rather unpleasant little surprise:
I assume that this is due to some voodoo with environments inside
assays
. But this is worrying as it puts into question all of my unit tests across packages that work onSummarizedExperiment
s or their subclasses.It seems that
all.equal
is a S3 generic, so perhaps a special method could be defined for the relevant inner class of the assays field (probablyShallowSimpleListAssays
) to achieve the correct behaviour?Session information
``` R Under development (unstable) (2018-11-02 r75535) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.5 LTS Matrix products: default BLAS: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRblas.so LAPACK: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRlapack.so locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] SummarizedExperiment_1.13.0 DelayedArray_0.9.0 [3] BiocParallel_1.17.0 matrixStats_0.54.0 [5] Biobase_2.43.0 GenomicRanges_1.35.0 [7] GenomeInfoDb_1.19.0 IRanges_2.17.0 [9] S4Vectors_0.21.0 BiocGenerics_0.29.1 loaded via a namespace (and not attached): [1] lattice_0.20-38 bitops_1.0-6 grid_3.6.0 [4] zlibbioc_1.29.0 XVector_0.23.0 Matrix_1.2-15 [7] tools_3.6.0 RCurl_1.95-4.11 compiler_3.6.0 [10] GenomeInfoDbData_1.2.0 ```