bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells

write_matrix_dir only writes one layer #28

Closed aCompanionUnobtrusive closed 1 year ago

aCompanionUnobtrusive commented 1 year ago

Hello, thanks for making such a useful package!

I am having an issue with writing a merged v5 Seurat object to disk...

Using the following code:

write_matrix_dir(mat = seur1[["RNA"]]$counts, dir = countsdir)

gives me the error:

In LayerData.Assay5(object = x, layer = i) :
multiple layers are identified by counts.sampleID01 counts.sampleID02 counts.sampleID03 counts.sampleID04 counts.sampleID05
only the first layer is used

And then only the first layer is written to disk, but I want all of the layers written... Do you know how I should do this?

bnprks commented 1 year ago

Hi, happy to help where I can. A large part of this seems to be a Seurat question, though, since the error message you've posted comes from Seurat, not BPCells, so I won't be able to help much with that message itself.

My guess is that the error comes from evaluating seur1[["RNA"]]$counts, before the BPCells function write_matrix_dir is ever called, since BPCells has no knowledge of LayerData.Assay5 etc. -- those are Seurat functions and types and will never appear in a BPCells error message.

If you have objects of type "IterableMatrix", those come from BPCells and I can provide direct feedback on what operations are/aren't supported. When printing out one of these objects, you should see information about the matrix printed, starting with a line that looks like 3 x 4 IterableMatrix object with class ....
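As a quick sanity check (variable names here are hypothetical), you can inspect the class of the object you're about to pass to write_matrix_dir:

```r
mat <- seur1[["RNA"]]$counts

# BPCells matrices inherit from "IterableMatrix"; a Seurat-native layer
# will instead be something like "dgCMatrix"
class(mat)

# Printing a BPCells object shows a summary line like
# "3 x 4 IterableMatrix object with class ..."
mat
```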

Now for a long explanation I wrote that in retrospect is probably overkill for what you need to know: 🙃

Long explanation (collapsed to not take up too much scroll space, but hopefully containing useful information):

For the broader question of how to handle matrices with multiple layers, I can tell you more about how BPCells works generally, which might help provide some intuition about how to approach the problem overall.

1. I've written a few short articles on the BPCells documentation site; in particular I'd recommend [How BPCells works](https://bnprks.github.io/BPCells/articles/web-only/how-it-works.html) and [Programming philosophy](https://bnprks.github.io/BPCells/articles/web-only/programming-philosophy.html) for more detail on the points below.
2. Generally, BPCells keeps only one permanent copy of the data: the raw counts saved on disk.
    - If you have multiple samples, BPCells can concatenate the input matrices on the fly.
    - If you have multiple layers of normalized data, BPCells can re-calculate all of them on the fly, with every layer drawing from the same underlying raw-counts files on disk.
    - While it is possible to save a normalized matrix to disk with BPCells, this is often not advisable: it takes extra disk space (sometimes a lot extra) and is usually not much faster than re-calculating the normalized values on the fly from the raw counts.
    - There are occasional exceptions where BPCells will write a matrix that is normalized rather than raw counts, but this is usually a temporary copy made for performance reasons, and you generally don't need to worry about it as an end user.
3. Given that BPCells re-calculates everything on the fly, your question still stands: how can you save the necessary normalization information so that you can load everything again later?
    - The easy option:
        - Save your raw counts matrices to disk using BPCells.
        - Perform any normalizations, etc. on the disk-backed BPCells objects.
        - Use `saveRDS` to save the normalized matrix.
        - In a later R session, use `readRDS` to load the normalized matrices back.
        - I'm not familiar with all the internals of Seurat, but I would suspect that if you start by passing in BPCells objects, then `saveRDS` and `readRDS` will work as desired.
    - The catch:
        - All this only works if you save and load objects on the same filesystem, and you don't move or delete the original BPCells raw-counts matrices.
        - Portable loading/saving of normalized matrix objects is on the roadmap for BPCells, but it's probably at least a few months out.
        - This means that if you want to copy a project for someone else, it will probably require re-running your analysis from raw counts, as there's no great way to transfer the normalized matrix objects if the raw counts they load from also move.
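A rough sketch of the "easy option" in R (file paths and the `raw_counts` variable are hypothetical; `write_matrix_dir` and `saveRDS`/`readRDS` are as described above, and the normalization shown is just one plausible example):

```r
library(BPCells)

# One-time: save raw counts to disk -- this is the single permanent copy
mat_disk <- write_matrix_dir(mat = raw_counts, dir = "counts_dir")

# Normalizations on a BPCells matrix are lazy: they are re-calculated
# on the fly from the raw counts on disk, not stored as a second copy
mat_norm <- multiply_cols(mat_disk, 1 / colSums(mat_disk))
mat_norm <- log1p(mat_norm * 10000)

# Save just the lightweight normalized object; it keeps a reference
# to counts_dir rather than duplicating the data
saveRDS(mat_norm, "mat_norm.rds")

# In a later R session (same filesystem, counts_dir not moved/deleted):
mat_norm <- readRDS("mat_norm.rds")
```

Note the catch above: the `.rds` file only stores the recipe plus a pointer to `counts_dir`, so it breaks if the raw-counts directory moves.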
aCompanionUnobtrusive commented 1 year ago

Thank you for your fast and detailed reply!

Your long explanation was super useful, and thanks for the tip to save just the raw counts.

What worked for me was to run JoinLayers on my seurat object first, and then it wrote the counts without issue :)
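For anyone landing here later, a sketch of that fix (assuming Seurat v5's `JoinLayers` and the same `countsdir` variable as above):

```r
library(Seurat)
library(BPCells)

# Collapse the per-sample count layers into a single layer first
seur1[["RNA"]] <- JoinLayers(seur1[["RNA"]])

# Now there is only one counts layer, so the full merged matrix is written
write_matrix_dir(mat = seur1[["RNA"]]$counts, dir = countsdir)
```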