Closed lmweber closed 3 years ago
I did some more tests and seems this happens whenever a new column is added to colData
. So when a new column is added to colData
, an additional duplicate copy of the spatialData
columns are automatically added at the end.
I think the colData(spe, spatialData = FALSE)
idea would most likely solve it.
From a user perspective, you could then still use colData(spe, spatialData = TRUE, spatialCoords = TRUE)
to pipe to ggplot
for plotting.
Thanks Lukas,
we surely need to talk about this!
Ok as discussed today, I tried to add a colData
show method with these additional arguments. However this isn't working yet - I think due to NAMESPACE issues and the way it interacts with the SummarizedExperiment
colData
show method.
@drighelli could you please have a look at what I have done here, and let me know if you can see what I have done wrong? I have also tried removing @importFrom SummarizedExperiment colData
in other functions for the NAMESPACE.
The code is in this commit: https://github.com/drighelli/SpatialExperiment/commit/64dd16573f5c47a88108b569bbf5d10c95b1bbce
Here is also a minimal example showing the duplicated columns issue:
library(SpatialExperiment)
example(read10xVisium)
colData(spe)
colData(spe)$testing <- rep(1, ncol(spe))
colData(spe) ## shows duplicated spatialData columns
Thanks Lukas,
I don't know why, but I'm getting a weird error when creating an SpE with your code, but I found the problem in our master branch.
This is the error: it comes up when I try to create a SpE also if I start from the master branch and add your code again...
(spe <- read10xVisium(samples, sample_ids, type = "sparse", data = "raw", images = "lowres", load = FALSE)) Error: C stack usage 7969600 is too close to the limit
The problem is due to this assignment, so I think that we need to patch this.
We store the spatialData
and cbind
them to the previously stored colData
...
ok the error can be solved by using the x@colData
instead of colData(x)
, if you call colData(x) inside the colData
method, it becomes a recursive functions and it goes in stack overflow.
This is due to the fact that the colData
method is invoked during the construction of the object.
The problem now is to avoid problems, I'm getting errors when trying to retrieve the spatialData
because it doesn't store them into the colData
I wonder if it should be callNextMethod()
rather than x@colData
...
To elaborate a bit: callNextMethod()
inside the colData()
method definition should call the colData
method for SingleCellExperiment
, which obviously does not return the spatialData.
the problem is that the spatialData
are stored inside the colData
, so the callNextMethod
will return thex@colData
that contains the spatialData
Anyway, I tried and it doesn't work :)
But what about calling callNextMethod
instead of colData
here:
https://github.com/drighelli/SpatialExperiment/blob/64dd16573f5c47a88108b569bbf5d10c95b1bbce/R/SpatialExperiment-colData.R#L115
I'm just guessing, but since you mentioned that recursion is the problem...
Thanks Davide, I made it locally, it solves the recursion problem, but still doesn't solve the colData duplication...
Great, thanks for finding the recursion error!
I will think some more about whether this solution can be used to fix the duplicated columns issue (e.g. by adding spatialData = TRUE
or FALSE
wherever colData()
is called in other functions), or whether we need some other solution.
Hmm I just tested this, and it doesn't seem to work. When I do the following, I still get duplicated columns from spatialData
:
example(read10xVisium)
colData(spe)
colData(spe)$testing <- 1
colData(spe)
Output before adding new column:
> colData(spe)
DataFrame with 99 rows and 4 columns
sample_id in_tissue array_row array_col
<character> <logical> <integer> <integer>
AAACAACGAATAGTTC-1 section1 FALSE 0 16
AAACAAGTATCTCCCA-1 section1 TRUE 50 102
AAACAATCTACTAGCA-1 section1 TRUE 3 43
AAACACCAATAACTGC-1 section1 TRUE 59 19
AAACAGAGCGACTCCT-1 section1 TRUE 14 94
... ... ... ... ...
Output after adding new column:
> colData(spe)
DataFrame with 99 rows and 8 columns
sample_id in_tissue array_row array_col testing in_tissue array_row array_col
<character> <logical> <integer> <integer> <numeric> <logical> <integer> <integer>
AAACAACGAATAGTTC-1 section1 FALSE 0 16 1 FALSE 0 16
AAACAAGTATCTCCCA-1 section1 TRUE 50 102 1 TRUE 50 102
AAACAATCTACTAGCA-1 section1 TRUE 3 43 1 TRUE 3 43
AAACACCAATAACTGC-1 section1 TRUE 59 19 1 TRUE 59 19
AAACAGAGCGACTCCT-1 section1 TRUE 14 94 1 TRUE 14 94
... ... ... ... ... ... ... ... ...
However when I change the new block of code to check spatialDataNames
instead of spatialCoordsNames
I think it works.
Will test it some more and push some additional commits into this branch.
I think this commit fixes it: 62014b389dc03a23168de33bed5f7a30d1c0606f
I added a unit test too: 80436b886ad56dda15a6695016b00bed043a24e3
We are currently getting duplicated columns in
colData
when adding new columns incolData
using external tools such asscater::addPerCellQC
.This is because
colData
includes all thespatialData
columns at the end by default. So these are then added a second time when new columns have been added withscater::addPerCellQC
or other tools.My hack to remove them right now is to run the following afterwards:
which removes all the
spatialData
columns fromcolData
, and then they are automatically re-added again one final time at the end.I will think about how best to address this, and we can discuss next week.
One idea could be to have an argument
colData(spe, spatialData = FALSE)
by default, so thespatialData
columns are only returned insidecolData
if the user wants them. If they are off by default, then I think they won't be duplicated.We could also have two arguments
colData(spe, spatialData = FALSE, spatialCoords = FALSE)
for completeness.