drisso / SingleCellExperiment

Clone of the Bioconductor repository for the SingleCellExperiment package, see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.
63 stars 17 forks source link

`colPairs(x, asSparse = TRUE)` fails when there are no metadata columns #73

Closed lazappi closed 3 months ago

lazappi commented 4 months ago

If colPairs() contains a SelfHits object with no metadata columns then colPairs(x, asSparse = TRUE) fails with a cryptic error. Example based on the documentation:

library(SingleCellExperiment)

example(SingleCellExperiment, echo=FALSE)
hits <- SelfHits(
    sample(ncol(sce), 10),
    sample(ncol(sce), 10),
    nnode=ncol(sce)
)

colPair(sce, "regulators") <- hits
# Fails with a cryptic error
colPairs(sce, asSparse = TRUE)
#> Error in if (ncol(m)) { : argument is of length zero

hits_with_values <- hits
mcols(hits_with_values)$value <- runif(10)
colPair(sce, "regulators") <- hits_with_values
# Works without an error
colPairs(sce, asSparse = TRUE)

I suppose this makes sense as it is unclear what to fill the matrix with but it would be helpful to have a more informative error. Or maybe assume the hits are an unweighted graph and fill the matrix with 1s with a warning?

Related, if there is more than one metadata column, then the first one is used. Maybe it would be good to give a warning in this case?

LTLA commented 3 months ago

Or maybe assume the hits are an unweighted graph and fill the matrix with 1s with a warning?

Seems reasonable to me. Also could create an lgCMatrix.

Related, if there is more than one metadata column, then the first one is used. Maybe it would be good to give a warning in this case?

May or may not be reasonable. assay() and other SE-related getters assume first argument without any noise. I guess we could have an argument to choose between metadata columns.

Can you put together a little PR for each of these suggestions?

lazappi commented 3 months ago

Creating a PR for the first one. I'm not sure if there is a nice way to pass arguments for the second one, but maybe it's fine.

LTLA commented 3 months ago

I'll take the PR since it's already been put together, but longer term, I am wondering whether it would be better to (i) deprecate the asSparse= and (ii) export the .hits2mat function (and its counterpart for converting a matrix to a Hits object). This would simplify the interface; the SCE only cares about Hits going in and out, and exactly how this is converted to/from a sparse matrix is up to the user/application outside of the SCE.

I don't ever use the colPairs and rowPairs functionality so I have no relevant experience/opinion on this matter, but happy to take thoughts from people who do use it.

lazappi commented 3 months ago

I have only used them in the context of converting to/from AnnData which has matching slots but I think SpatialExperiment also uses them (or something similar) which is how the issue came up.

LTLA commented 3 months ago

Alright, this should roll out in 1.25.1. Will leave the more general question of as.sparse= deprecation for another time.