Closed sjspielman closed 11 months ago
I reserve the right to have more thoughts later, but I think we are on the right track.
🎉 I'll take it!
I am totally stuck on a failing test that I cannot figure out after https://github.com/AlexsLemonade/scpcaTools/pull/251/commits/bc6f82e1a2ec036f2c4c17d2fbb096fb9fda831c -
── Error (test-merge_sce_list.R:563:3): merging SCEs where 1 altExp is missing works as expected, with altexps ──
Error in `value[[3L]](cond)`: failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
failed to rbind column 'altExps' across DataFrame objects: failed to
rbind column 'adt' across DataFrame objects: subscript contains invalid
I've tracked down the specific error back to this part of the combine
code -
[tryCatch](https://rdrr.io/r/base/conditions.html)({
int_cd <- [do.call](https://rdrr.io/r/base/do.call.html)([rbind](https://rdrr.io/r/base/cbind.html), [lapply](https://rdrr.io/r/base/lapply.html)([args](https://rdrr.io/r/base/args.html), [int_colData](https://rdrr.io/bioc/SingleCellExperiment/man/internals.html)))
}, error=[function](https://rdrr.io/r/base/function.html)(err) {
[stop](https://rdrr.io/r/base/stop.html)(
"failed to combine 'int_colData' in 'cbind(<", [class](https://rdrr.io/r/base/class.html)([args](https://rdrr.io/r/base/args.html)[[](https://rdrr.io/r/base/Extract.html)[1]]), ">)':\n ",
[conditionMessage](https://rdrr.io/r/base/conditions.html)(err))
})
But I can't figure out why this error is being thrown. I am able to run rbind()
directly on the colData
slots, and there aren't any columns here called "adt"
or "altExps"
... Time for me to take a little break from this code and circle back fresh. Any help spelunking to track down why this is being thrown is appreciated too.
EDIT 😭 make it make sense...
(sce_list[[4]]
is the one created with NA
s)
> do.call(cbind, list(sce_list[[1]], sce_list[[4]]))
class: SingleCellExperiment
dim: 12 16
metadata(8): library_id sample_id ... library_metadata sample_metadata
assays(2): counts logcounts
rownames(12): GENE0001 GENE0002 ... GENE0011 GENE0012
rowData names(3): gene_names sce1-other_column sce4-other_column
colnames(16): sce1-TCGACTTTTTTT sce1-GTTATTGAGCCG ... sce4-GTTATGGTCACT sce4-GGGAACCCTCCA
colData names(7): sum detected ... batch cell_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(1): adt
> do.call(cbind, list(sce_list[[1]], sce_list[[2]], sce_list[[4]]))
Error in value[[3L]](cond) :
failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
failed to rbind column 'altExps' across DataFrame objects: failed to rbind column 'adt' across DataFrame objects:
subscript contains invalid names
Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break.
Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right.
Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break. Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right.
💯 all around. I tried more and the "edit" I added in that comment above blew my mind, so I do need to actually take that break today...!
Noting https://github.com/AlexsLemonade/scpcaTools/pull/251/commits/e4a7399ea4c2d686e13b0fc02e3b8371c008b8ff is passing locally!
Closes #150
We're getting close I think! Noting that I've set this to close out #150, since I think we should open further issues to tackle potential (likely?) next steps (e.g. #250 and/or doing more things with altExp rowdata/coldata/metadata).
First, some good news: It appears that we do not need to create dummy rowData or colData for any new altExps that get created, which is excellent. Notably, I've reached a point where I seriously need to refresh my eyeballs from this code, but I am 100% sure we would benefit from additional tests. Please let me know what test low-hanging-fruit I've missed! I know we still need a test for SCEs with multiple altexps, but wanted to get this out for review now since, as mentioned, my eyes need a break before writing more things!
check_altexps()
to check for altExp compatibility and die otherwise. This function also returns a list which contains the expected features and assays that need to go into any new altExps that get created, and gets used down the line. Seemed prudent to just save this information while I was checking altExps anyways.TODO
in there for a future-PR next step of subsetting assays rather than dying if any are missing.prepare_altexp_for_merge()
does the altExp preparation, and it calls two other functions -build_na_matrix()
(called as needed for any missing altExps)prepare_sce_for_merge()
. I added an argument to this functionis_altexp
with a default ofFALSE
. When we process altExps, this gets set toTRUE
. This argument controls whether sce colnames should get a name prefix added; we don't want to do this if altExp is true since we'd end up with duplicate name prefixes in the end. I made this argument more general in case there are other aspects of this function we want to skip for altExps.Edit - one additional note for review. You probably want to use
split
(notunified
) view, since git's detection of diffs is a yet another 🎢 .