AlexsLemonade / scpcaTools

BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

Update altexp merging #251

Closed sjspielman closed 11 months ago

sjspielman commented 11 months ago

Closes #150

We're getting close I think! Noting that I've set this to close out #150, since I think we should open further issues to tackle potential (likely?) next steps (e.g. #250 and/or doing more things with altExp rowdata/coldata/metadata).

First, some good news: It appears that we do not need to create dummy rowData or colData for any new altExps that get created, which is excellent. Notably, I've reached a point where I seriously need to refresh my eyeballs from this code, but I am 100% sure we would benefit from additional tests. Please let me know what test low-hanging-fruit I've missed! I know we still need a test for SCEs with multiple altexps, but wanted to get this out for review now since, as mentioned, my eyes need a break before writing more things!

Edit - one additional note for review. You probably want to use split (not unified) view, since git's detection of diffs is a yet another 🎢 .

sjspielman commented 11 months ago

I reserve the right to have more thoughts later, but I think we are on the right track.

🎉 I'll take it!

sjspielman commented 11 months ago

I am totally stuck on a failing test that I cannot figure out after https://github.com/AlexsLemonade/scpcaTools/pull/251/commits/bc6f82e1a2ec036f2c4c17d2fbb096fb9fda831c -

── Error (test-merge_sce_list.R:563:3): merging SCEs where 1 altExp is missing works as expected, with altexps ──
Error in `value[[3L]](cond)`: failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
  failed to rbind column 'altExps' across DataFrame objects: failed to
  rbind column 'adt' across DataFrame objects: subscript contains invalid

I've tracked down the specific error back to this part of the combine code -

[tryCatch](https://rdrr.io/r/base/conditions.html)({
        int_cd <- [do.call](https://rdrr.io/r/base/do.call.html)([rbind](https://rdrr.io/r/base/cbind.html), [lapply](https://rdrr.io/r/base/lapply.html)([args](https://rdrr.io/r/base/args.html), [int_colData](https://rdrr.io/bioc/SingleCellExperiment/man/internals.html)))
    }, error=[function](https://rdrr.io/r/base/function.html)(err) {
        [stop](https://rdrr.io/r/base/stop.html)(
            "failed to combine 'int_colData' in 'cbind(<", [class](https://rdrr.io/r/base/class.html)([args](https://rdrr.io/r/base/args.html)[[](https://rdrr.io/r/base/Extract.html)[1]]), ">)':\n  ",
            [conditionMessage](https://rdrr.io/r/base/conditions.html)(err))
    })

But I can't figure out why this error is being thrown. I am able to run rbind() directly on the colData slots, and there aren't any columns here called "adt" or "altExps"... Time for me to take a little break from this code and circle back fresh. Any help spelunking to track down why this is being thrown is appreciated too.

EDIT 😭 make it make sense... (sce_list[[4]] is the one created with NAs)

> do.call(cbind, list(sce_list[[1]], sce_list[[4]]))
class: SingleCellExperiment 
dim: 12 16 
metadata(8): library_id sample_id ... library_metadata sample_metadata
assays(2): counts logcounts
rownames(12): GENE0001 GENE0002 ... GENE0011 GENE0012
rowData names(3): gene_names sce1-other_column sce4-other_column
colnames(16): sce1-TCGACTTTTTTT sce1-GTTATTGAGCCG ... sce4-GTTATGGTCACT sce4-GGGAACCCTCCA
colData names(7): sum detected ... batch cell_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(1): adt
> do.call(cbind, list(sce_list[[1]], sce_list[[2]], sce_list[[4]]))
Error in value[[3L]](cond) : 
  failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
  failed to rbind column 'altExps' across DataFrame objects: failed to rbind column 'adt' across DataFrame objects:
  subscript contains invalid names
jashapiro commented 11 months ago

Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break.

Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right.

sjspielman commented 11 months ago

Obviously we are missing something... But that is why we have the tests! I can look at it tomorrow if you need a break. Since this is one of the main use cases we expect to see in real data (some libraries missing altExps), this is important to get right.

💯 all around. I tried more and the "edit" I added in that comment above blew my mind, so I do need to actually take that break today...!

sjspielman commented 11 months ago

Tests added here! https://github.com/AlexsLemonade/scpcaTools/pull/251/commits/2f12c6cc7f652410d44178bb0ad99d7119821498

sjspielman commented 11 months ago

Noting https://github.com/AlexsLemonade/scpcaTools/pull/251/commits/e4a7399ea4c2d686e13b0fc02e3b8371c008b8ff is passing locally!