broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Memory-efficient posterior generation #263

Closed by sjfleming 1 year ago

sjfleming commented 1 year ago

It has become apparent that something during the posterior generation process in v0.3.0 is gobbling up way too much memory, far more than previous versions did. See #251 and #248.

Conceptually, in v2:

Conceptually, in v3:

This refactor allows us to do a whole lot more. But it also involves computing and saving the full posterior, which was not attempted in v2. While that is perfectly doable (these posterior h5 files are usually less than 2 GB), it seems it needed to be done a bit more carefully.

I think extending python lists kept objects alive in memory (by creating references to them) that I did not intend to keep.
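To illustrate the kind of thing I suspect (a minimal sketch in plain Python, not the actual CellBender code; `Chunk` and `is_alive` are hypothetical names made up for this example): an object that has been put into a list stays in memory after the local name is deleted, because the list still holds a reference to it.

```python
import gc
import weakref


class Chunk:
    """Hypothetical stand-in for a large per-minibatch object."""

    def __init__(self, n):
        self.payload = [0] * n


def is_alive(ref):
    """Return True if the weakly referenced object has not been freed."""
    gc.collect()
    return ref() is not None


kept = []
chunk = Chunk(1000)
ref = weakref.ref(chunk)  # a weak reference does not keep the object alive

kept.extend([chunk])  # the list now holds its own reference to the object
del chunk             # dropping the local name is not enough to free it

alive_while_listed = is_alive(ref)  # still referenced by `kept`
kept.clear()
alive_after_clear = is_alive(ref)   # no references remain
```

The same thing happens at a much larger scale if every minibatch adds objects to a long-lived list and those objects, in turn, reference bigger buffers.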

Adopting another strategy: keep a python list of (sparsified info as) torch tensors. Append tensors to the lists each minibatch. Concatenate them once and for all at the end. All these tensors are cloned from the originals, detached, and kept in cpu memory.
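A rough sketch of that strategy, assuming dense 2-D minibatches and at least one minibatch (this is not the code in the branch; `sparsify` and `collect_sparse` are hypothetical helpers written only for illustration):

```python
import torch


def sparsify(dense):
    """Return (rows, cols, values) for the nonzero entries of a 2-D tensor."""
    rows, cols = torch.nonzero(dense, as_tuple=True)
    return rows, cols, dense[rows, cols]


def collect_sparse(minibatches):
    """Accumulate sparse entries per minibatch and concatenate once at the end."""
    row_list, col_list, val_list = [], [], []
    row_offset = 0
    for batch in minibatches:
        rows, cols, vals = sparsify(batch)
        # Store detached CPU copies so the lists do not keep the original
        # minibatch tensors (or any autograd graph) alive.
        row_list.append((rows + row_offset).detach().clone().cpu())
        col_list.append(cols.detach().clone().cpu())
        val_list.append(vals.detach().clone().cpu())
        row_offset += batch.shape[0]
    # A single concatenation per list, after the loop has finished.
    return torch.cat(row_list), torch.cat(col_list), torch.cat(val_list)


batches = [
    torch.tensor([[0.0, 1.0], [2.0, 0.0]], requires_grad=True),
    torch.tensor([[3.0, 0.0]]),
]
rows, cols, vals = collect_sparse(batches)
```

Concatenating once at the end avoids reallocating a growing tensor on every minibatch, and the stored copies are independent of whatever the loop body created.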

Closes #248 Closes #251

jg9zk commented 1 year ago

This branch stopped memory from being used up during the for loop, but my job was killed due to OOM sometime after. I'm running with 140 GB, which should be plenty.

sjfleming commented 1 year ago

@jg9zk thanks for reporting. Any chance you could post the last few lines of the log file?

jg9zk commented 1 year ago

cellbender:remove-background: Working on chunk (377/383)
cellbender:remove-background: Working on chunk (378/383)
cellbender:remove-background: Working on chunk (379/383)
cellbender:remove-background: Working on chunk (380/383)
cellbender:remove-background: Working on chunk (381/383)
cellbender:remove-background: Working on chunk (382/383)
cellbender:remove-background: Working on chunk (383/383)
Killed

jg9zk commented 1 year ago

The OOM seems to occur at either line 549 or 550 of posterior.py in commit 6fd8c23 (noise_offset_dict creation).

sjfleming commented 1 year ago

I was able to reproduce that same behavior @jg9zk

jg9zk commented 1 year ago

I tried commit 7fd0ac and it completed! However, it looks like counts are being added to the count matrix instead of removed, but I'll open a separate issue about that.