Closed wgmao closed 1 year ago
One clarification question -- I see in the first line under "Queued Operations" it says it's loading compressed matrix from directory "/the path that hosts the data". This should be an actual path to files on the filesystem, e.g. "/home/wgmao/datafiles/matrix_folder". Have you edited out the actual path for privacy, or is that the object you were given?
The command `as.matrix()` normally should work, but if the underlying data folder doesn't exist, then it will error like you've seen.
The way BPCells objects work in R is that the R object stores any queued operations in successive layers (e.g. subsetting/normalization), and rather than store the actual matrix data in R, it just stores the path of files on disk that hold the actual matrix data.
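To make the layering concrete, here is a minimal sketch with a toy matrix. The data, the temporary directory, and the variable names are hypothetical stand-ins, not anything from the object discussed in this thread.

```r
library(BPCells)
library(Matrix)

# Hypothetical toy counts: 3 genes x 4 cells
counts <- sparseMatrix(
  i = c(1, 2, 3, 1, 2),
  j = c(1, 1, 2, 3, 4),
  x = c(5, 1, 2, 7, 3),
  dims = c(3, 4)
)

# Write the matrix to a directory on disk (a temp dir stands in for a real path)
mat_dir <- tempfile("toy_matrix_dir")
on_disk <- write_matrix_dir(as(counts, "IterableMatrix"), mat_dir)

# Subsetting is queued on top of the on-disk input rather than computed now
lazy <- on_disk[1:2, ]

# Printing the object lists the queued operations, including the directory path
print(lazy)

# The data is only read from mat_dir when the matrix is materialized
result <- as(lazy, "dgCMatrix")
```

The R object `lazy` stays small because it holds the path and the queued subset, while the values live in the files under `mat_dir`.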
In terms of sharing, there are two easy options, and a third more complicated but flexible option:

1. Share the underlying matrix directory, open it with `open_matrix_dir()`, and re-run any processing from raw data. This is the easiest and most reliable option.
2. Save the R object with `saveRDS()`, then another user can open the object and access the data with `readRDS()`, provided that the underlying matrix directory has not moved and is accessible by both users.
3. Update the matrix inputs stored inside the object so they point at the data's new location, for example after calling `saveRDS()` on a Seurat object containing BPCells matrices.

This third option will also help if you want to discard embedded operations and access the original raw data.
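The two easy options can be sketched as follows. This is an illustration on a toy matrix, with temporary paths standing in for a location both users can reach; all names here are hypothetical.

```r
library(BPCells)
library(Matrix)

# Hypothetical toy data and paths
counts <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(4, 5, 6), dims = c(3, 3))
shared_dir <- tempfile("shared_matrix_dir")  # stand-in for a shared directory
rds_path <- tempfile(fileext = ".rds")

# The sender writes the raw matrix to the shared directory
write_matrix_dir(as(counts, "IterableMatrix"), shared_dir)

# Easy option 1: the recipient reopens the directory and redoes the processing
reopened <- open_matrix_dir(shared_dir)
processed <- reopened[1:2, ]

# Easy option 2: save the R object itself; the file holds the queued
# operations and the path, not the matrix values
saveRDS(processed, rds_path)
restored <- readRDS(rds_path)

# This only works while shared_dir still exists at the same path
restored_values <- as(restored, "dgCMatrix")
```

If `shared_dir` is moved or deleted between `saveRDS()` and `readRDS()`, the last step is where the failure would show up.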
There are of course some exceptions to this rule that the underlying data must be shared in addition to the R object -- for example, BPCells also allows saving matrices fully in-memory via `write_matrix_memory()`. But for your case I think the data you need is supposed to be shared via a directory that you don't have.
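As a sketch of that exception, the snippet below copies a toy on-disk matrix into memory and then removes the directory. The matrix and paths are hypothetical, and this only helps when the directory is still readable at the time of the copy.

```r
library(BPCells)
library(Matrix)

# Hypothetical toy data written to a temporary matrix directory
counts <- sparseMatrix(i = c(1, 2, 2), j = c(1, 2, 3), x = c(3, 8, 1), dims = c(2, 3))
mat_dir <- tempfile("memory_demo_dir")
on_disk <- write_matrix_dir(as(counts, "IterableMatrix"), mat_dir)

# Copy the values into an in-memory BPCells matrix while the files are readable
in_memory <- write_matrix_memory(on_disk, compress = FALSE)

# The in-memory copy no longer depends on the directory
unlink(mat_dir, recursive = TRUE)
recovered <- as(in_memory, "dgCMatrix")
```

An object built this way can be shared with `saveRDS()` alone, at the cost of holding the full matrix in RAM.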
Thank you for the prompt response! Yes, the actual path includes personal information. I replaced it with this pseudo path. Your suggestions make a lot of sense to me. I have two follow-up questions:
1. Why is the `IterableMatrix` object about 200 MB in memory if the actual data is on the disk?
2. You mentioned `all_matrix_inputs`. I tried to modify the path embedded in the queued operations but failed. Could you provide a minimal example on how to use this function to modify the path? Thanks!

Here is an example using `all_matrix_inputs()` -- notice how it allows assignment via `<-` as well as querying (though my example doesn't show querying in this case). For clarity in the example I've shown a case where the underlying data is not identical, though for your case you'd probably want `dir2` to be a copy of `dir1`.
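A minimal sketch along those lines, using two toy matrices with different values so the swap is visible. The matrices and the temporary directories `dir1` and `dir2` are hypothetical.

```r
library(BPCells)
library(Matrix)

# Two hypothetical toy matrices with the same shape but different values
m1 <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(1, 1, 1), dims = c(3, 3))
m2 <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(9, 9, 9), dims = c(3, 3))

dir1 <- tempfile("dir1")
dir2 <- tempfile("dir2")
disk1 <- write_matrix_dir(as(m1, "IterableMatrix"), dir1)
write_matrix_dir(as(m2, "IterableMatrix"), dir2)

# Queue an operation on top of the matrix stored in dir1
x <- disk1[1:2, ]

# Replace the base-level input with the matrix stored in dir2;
# the queued subsetting is kept and now applies to the new input
all_matrix_inputs(x) <- list(open_matrix_dir(dir2))

swapped <- as(x, "dgCMatrix")
```

Calling `all_matrix_inputs(x)` without assignment returns the list of base-level input matrices, which is a way to get at the raw data underneath any queued operations.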
Thank you so much for your detailed response! I have to say you are one of the most responsive developers I have met so far! I really appreciate it.
I am new to BPCells. Thank you very much for developing and maintaining this awesome package! My collaborator shared a large collection of multiome data as a Seurat object (Assay5), and I would like to extract the RNA count matrix from the object. The count matrix `obj@assays$RNA@layers$counts` comes with the following description.

I tried two commands, `temp %>% write_matrix_memory(compress = F)` and `as.matrix(obj@assays$RNA@layers$counts)`, that lead to the same error: `Missing directory: /the path that hosts the data`. `obj@assays$RNA@layers$counts` is about 153MB in memory, which makes me think the matrix is already available in memory. If so, is there a way to modify the embedded operations and extract the count matrix?