Dario-Rocha opened this issue 1 year ago
Hi Dario, glad it's been working well for you! You can just run as(bpcells_mat, "dgCMatrix") to convert to an R sparse matrix. I'll add something to the docs so this is a bit more obvious.
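A minimal sketch of that conversion (the directory path and object names below are placeholders, not from this thread):

library(BPCells)

# Open an on-disk BPCells matrix and convert it to a standard in-memory
# dgCMatrix sparse matrix
bpcells_mat <- open_matrix_dir("path/to/bpcells_matrix_dir")
sparse_mat <- as(bpcells_mat, "dgCMatrix")

Note that the dgCMatrix copy is held fully in memory, so this is only practical when the matrix (or subset) being converted fits in RAM.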
Thank you for your reply; however, I can't manage to do this because I get the following error:
Error: Error opening file: matrix_bp_soup/index_starts
Maybe this is a consequence of the expression matrix in question being stored in a v5 Seurat object that has been processed, saved as an RDS file, and loaded multiple times. Do you think this is a problem caused by Seurat processing?
To start with troubleshooting, some basic things to check: does the file matrix_bp_soup/index_starts exist on disk? Do you have read permissions to it?

It is possible this has to do with Seurat processing, as they are overloading the saveRDS function for Seurat projects. My rough understanding is that for BPCells objects this can result in relocating the original data files to a new directory and updating the BPCells R objects to point to the new file paths. If for some reason the R object has gotten out of sync with where the files are stored, this could cause errors. (Though it is possible to manually patch up broken paths with an experimental BPCells function -- let me know if you need more info on this.)
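For those first checks, something along these lines in R would confirm the file is present and readable (the path comes from the error message above):

# Check that the file referenced in the error exists and is readable
file.exists("matrix_bp_soup/index_starts")
file.access("matrix_bp_soup/index_starts", mode = 4) == 0  # TRUE if readable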
Could you let me know what the results are of the troubleshooting steps I listed above?
After updating BPCells, the error now indicates that too many matrices are open. Indeed, this Seurat object was created from a list of 92 BPCells matrices. I have read permissions to the /index_starts files of the matrices, even though they are not stored in the working directory of the script. The first line of the queued operations isn't really pointing to a location. It may have something to do with the layers of the Seurat object each being a combination of two BPCells matrices; that's why it indicates 46 matrix objects while in fact there are 92 samples.
Queued Operations:
- Concatenate columns of 46 matrix objects with classes: RenameDims, RenameDims ... RenameDims
- Select rows: 1, 2 ... 36601 and cols: 1, 2 ... 925242
I am running macOS Ventura 13.3.1.
Hi Dario, I see your issue in the Seurat repo -- for the BPCells-specific part of this discussion I think we can keep things here.
Thanks for checking up on those details. I was not aware that Macs also sometimes had a max open files issue, but at least this source claims the default limit is 256, which is too low to handle 92 matrices at once with BPCells right now.
There are two directions you could go for solutions:

- Combine matrices on disk: you can concatenate matrices with the rbind or cbind functions, then save the result out to disk. This can combine multiple files into one, e.g. you could combine 92 matrices -> 8 (and optionally further combine down to 1) as a workaround to the open file limits (see the sketch below).
- Raise the open file limit: running ulimit -n 1024 should work, and you could add that to your .zshrc or .bashrc file so you don't have to type it in every time before running R. This stackexchange answer seems to have some more complicated suggestions for a permanent increase in the maximum file limit.

It's a bit tricky for BPCells to decrease the number of open files in these cases, so one of those two workarounds is likely your best option in the near term. Let me know if one of those works for you.
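A minimal sketch of the first option, assuming the individual matrices were originally written with write_matrix_dir (the directory names here are placeholders):

library(BPCells)

# Open a few on-disk matrices (placeholder directory names)
mat_dirs <- c("sample_01_mat", "sample_02_mat", "sample_03_mat")
mats <- lapply(mat_dirs, open_matrix_dir)

# Concatenate columns (cells) and write the result back out as a single
# on-disk matrix, so far fewer files need to stay open afterwards
combined <- do.call(cbind, mats)
combined <- write_matrix_dir(combined, dir = "combined_mat", overwrite = TRUE)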
Hello there, although this is not the exact same issue, I've decided to reply here because it's basically the same thing, just with a different error. I have a Seurat v5 object saved as a .qs file. I was tasked with extracting and exporting the counts matrix, so I tried creating a new on-disk BPCells matrix and also converting the BPCells matrix to a dgCMatrix; in both cases the error is the same:
no slot of name "threads" for this object of class "ColBindMatrices"
library(qs)      # for qread()
library(Seurat)  # for JoinLayers()
library(BPCells) # for write_matrix_dir()

# Load the saved Seurat v5 object and extract the RNA assay
temp_lo <- qread('complete_v01_1_rpcacd25lo_seuratv5_anon.qs')
temp_lo <- temp_lo[["RNA"]]
temp_lo <- JoinLayers(temp_lo)
dim(temp_lo)
temp_lo

# Pull out the counts layer (a BPCells matrix) and try both export routes
temp_lo <- temp_lo$counts
write_matrix_dir(mat = temp_lo, dir = 'file_path', overwrite = TRUE)
temp_lo <- as(temp_lo, "dgCMatrix")

Error in iter_function(iterators, x@threads) : no slot of name "threads" for this object of class "ColBindMatrices"
The cause here appears to be that the file you're loading was created on an earlier version of BPCells, before the threads slot was added to the ColBindMatrices class. Therefore, once you load the object from disk it looks like it is missing the slot.
In this case, assuming that class(temp_lo$counts) is ColBindMatrices, I think it will suffice to run temp_lo$counts@threads <- 1L. I believe this will print out a warning message, but after that things should work okay. If the top-level layer of temp_lo$counts is not a ColBindMatrices object, you may need to dig in a couple of layers of @matrix slots (e.g. temp_lo$counts@matrix@matrix@threads <- 1L).
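Spelled out as code, that patch might look like the sketch below. It assumes temp_lo still holds the RNA assay (i.e. before the layer is extracted in the snippet above), and the nested-slot path in the else branch is only illustrative:

# Fill in the slot that newer BPCells versions expect on old serialized objects
if (inherits(temp_lo$counts, "ColBindMatrices")) {
  temp_lo$counts@threads <- 1L  # may print a warning about the outdated object
} else {
  # Otherwise, dig into the nested @matrix wrappers to find the
  # ColBindMatrices object, e.g.:
  # class(temp_lo$counts@matrix)
  # temp_lo$counts@matrix@matrix@threads <- 1L
}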
Given that there have been similar issues from a different update in #79, it seems I should look into making a helper function to update BPCells objects from older versions to the latest version.
I've been using this package for some weeks now and it's amazing how it allows processing big datasets in R. After doing some work, I need to export a trimmed version of the dataset in the usual sparse matrix format, for compatibility reasons, but I am failing to find a way to do so. Would you be so kind as to point me in the right direction?