Bioconductor / VariantAnnotation

Annotation of Genetic Variants
https://bioconductor.org/packages/VariantAnnotation
23 stars 20 forks source link

extractROWS() on a CollapsedVCF instance returns an invalid object. #39

Closed mtmorgan closed 4 years ago

mtmorgan commented 4 years ago

@hpages VariantAnnotation fails in devel https://master.bioconductor.org/checkResults/3.11/bioc-LATEST/VariantAnnotation/malbec2-buildsrc.html because extractROWS() returns an invalid CollapsedVCF object. A simple example is

fl <- system.file("extdata", "ex2.vcf",  package="VariantAnnotation")
vcf <- readVcf(fl, genome="hg19")
validObject(extractROWS(vcf, 1:2))

leading to

Error in validObject(extractROWS(vcf, 1:2)) :
  invalid class "CollapsedVCF" object: 1: 'fixed(object)' and 'rowRanges(object) must have the same number of rows
invalid class "CollapsedVCF" object: 2: 'info' must have the same number of rows as 'rowRanges'

It seems like the solution is to add

diff --git a/R/methods-VCF-class.R b/R/methods-VCF-class.R
index b1305c2..ae8e4ce 100644
--- a/R/methods-VCF-class.R
+++ b/R/methods-VCF-class.R
@@ -296,6 +296,10 @@ setMethod("vcfFields", "VCF",
 ### Subsetting and combining
 ###

+setMethod(vertical_slot_names, "VCF", function(x) {
+    c("fixed", "info", callNextMethod())
+})
+
 setMethod("[", c("VCF", "ANY", "ANY"),
     function(x, i, j, ..., drop=TRUE)
 {

Does that mean the [ and [<- methods immediately following this can be removed?

LiNk-NY commented 4 years ago

FWIW, the package TVTB is experiencing issues related to this for ExpandedVCF based on what Kevin Rue-Albrecht has responded with:

there is no extractROWS method defined for VCF objects.

showMethods("extractROWS", classes = "ExpandedVCF")
Function: extractROWS (package S4Vectors)
x="ExpandedVCF", i="logical"
(inherited from: x="Vector", i="ANY")
x="ExpandedVCF", i="numeric"
(inherited from: x="Vector", i="ANY")

@hpages

mtmorgan commented 4 years ago

Use selectMethod() to find the method that is dispatched to

> selectMethod(extractROWS, "ExpandedVCF")
Method Definition:

function (x, i)
{
    x <- updateObject(x, check = FALSE)
    i <- normalizeSingleBracketSubscript(i, x, as.NSBS = TRUE)
    x_vslotnames <- vertical_slot_names(x)
    ans_vslots <- lapply(setNames(x_vslotnames, x_vslotnames),
        function(slotname) extractROWS(slot(x, slotname), i))
    do.call(BiocGenerics:::replaceSlots, c(list(x), ans_vslots,
        list(check = FALSE)))
}
<bytecode: 0x7fad35d7aa68>
<environment: namespace:S4Vectors>

Signatures:
        x             i
target  "ExpandedVCF" "ANY"
defined "Vector"      "ANY"

ExpandedVCF inherits from VCF, so the fix above would also work for ExpandedVCF...

The 'magic' is in the call to vertical_slot_names(), which is also a generic and implements the 'vertical slots' that need to be subset by i.

hpages commented 4 years ago

Seems related to my recent changes to SummarizedExperiment, sorry guys. I made these changes in preparation for a clean solution to the c() issue. See https://github.com/Bioconductor/SummarizedExperiment/issues/27 and https://github.com/Bioconductor/GenomicFiles/issues/4

@mtmorgan Thanks for taking care of this! I'm not familiar with VCF objects but yes, it seems that the fixed and info slots should be registered as vertical slots. Don't remove the [ and [<- methods for now. Even though we shouldn't need them in theory, they are doing strange things and I think we still need them for now. For the same reason I didn't remove the [ and [<- methods for SummarizedExperiment objects yet. Modernizing the subsetting code for SummarizedExperiment objects will be a slow process that will require several iterations.

kevinrue commented 4 years ago

@kevinrue tagging myself to follow the conversation for TVTB

mtmorgan commented 4 years ago

@hpages https://github.com/Bioconductor/SummarizedExperiment/blob/81d41a110c1115df6ad5b144c40e6d171ab13c2f/R/SummarizedExperiment-class.R#L84 changes

x = SimpleList(setNames(nm = list()))  # named simple list
assays(SummarizedExperiment(assays = x))      # NULL

Previously this was an endomorphism. Is this intentional?