Closed JLemvig closed 1 year ago
Hi Jonas,
Glad to have you trying out the project! BPCells supports merging fragment objects on-the-fly using the c() function, which you can use to merge multiple samples into a single object.
In general, my recommended import path is to start with 10x-compatible fragment files, then use open_fragments_10x() and write_fragments_dir() to import sample-by-sample (which you can do in parallel with parallel::mclapply). Then, use do.call(c, fragment_object_list), optionally followed by another write_fragments_dir(). It's a bit faster to read directly from a single file rather than merging multiple samples on-the-fly, which is why writing the merged fragments object to disk is often worthwhile. Either way, all the same downstream operations are supported.
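A minimal sketch of that import path, under some assumptions: `fragment_paths` (a named character vector mapping sample names to 10x fragment files), `out_root`, and the helper names `import_sample` and `import_and_merge` are hypothetical and only here for illustration. The BPCells calls are the ones named above, plus open_fragments_dir() to re-open what was written.

```r
# Sketch only: `fragment_paths` (named character vector of 10x fragment files)
# and `out_root` are hypothetical inputs, not part of BPCells.
import_sample <- function(sample_name, fragment_path, out_root) {
  out_dir <- file.path(out_root, sample_name)
  # Read the 10x fragment file and write it in BPCells' on-disk format.
  BPCells::write_fragments_dir(
    BPCells::open_fragments_10x(fragment_path),
    out_dir
  )
  out_dir  # return only the path, which is cheap to pass between processes
}

import_and_merge <- function(fragment_paths, out_root, cores = 2L) {
  # Import sample-by-sample in parallel. mclapply relies on forking,
  # so mc.cores > 1 is not supported on Windows.
  sample_dirs <- parallel::mclapply(
    names(fragment_paths),
    function(s) import_sample(s, fragment_paths[[s]], out_root),
    mc.cores = cores
  )
  # Re-open each per-sample directory in the parent process.
  fragment_object_list <- lapply(sample_dirs, BPCells::open_fragments_dir)
  # Merge on-the-fly, then optionally persist the merged object to disk.
  merged <- do.call(c, fragment_object_list)
  BPCells::write_fragments_dir(merged, file.path(out_root, "merged"))
}
```

Calling `import_and_merge(fragment_paths, "bpcells_fragments")` would then leave one directory per sample plus a `merged` directory. The final write is optional; skipping it and using `merged` directly should support the same downstream operations, just with somewhat slower reads.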
You can also use GRanges or data frame objects to construct fragments on a sample-by-sample basis, substituting convert_to_fragments() for open_fragments_10x(). Eventually, BPCells will likely have direct import support from ArchR projects, but that's a few months out from being written (unless you're interested in making a contribution sooner).
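As a rough illustration of the data frame route, here is a sketch with made-up data. The toy `fragment_df` and the helper name `df_to_fragments_dir` are hypothetical; the column names `chr`, `start`, `end`, and `cell_id` follow what convert_to_fragments() documents for data frame input. The coordinate convention is an assumption to verify: check the `zero_based_coords` argument against how your fragments were extracted.

```r
# Illustrative data only: a tiny, already-sorted data frame with the columns
# that convert_to_fragments() expects for data frame input.
fragment_df <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(100L, 250L, 40L),
  end = c(180L, 400L, 95L),
  cell_id = c("sampleA#cell1", "sampleA#cell2", "sampleA#cell1")
)

# Hypothetical helper: convert one sample's data frame and write it to disk.
df_to_fragments_dir <- function(df, out_dir) {
  # Fail early if a required column is missing.
  stopifnot(all(c("chr", "start", "end", "cell_id") %in% names(df)))
  frags <- BPCells::convert_to_fragments(df)
  BPCells::write_fragments_dir(frags, out_dir)
}
```

For an ArchR project, the idea would be to build one such data frame (or GRanges) per sample, convert and write each one separately, and only then merge, so that no single R vector has to hold the whole project.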
I did recently do a merge on a 500k cell dataset where the memory usage spiked at ~18GB. I consider this a bug and in general the memory usage should stay more in the 2-3GB range, but just letting you know in case it occurs on your end. Doing the merge in a few separate rounds of merge + save might circumvent this issue.
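The "separate rounds of merge + save" idea could look roughly like the sketch below. The helper names `chunk_indices` and `merge_in_rounds`, the `chunk_size` default, and the output directory names are all hypothetical, and whether this actually avoids the memory spike is untested.

```r
# Split the indices 1..n into consecutive chunks of at most `chunk_size`.
chunk_indices <- function(n, chunk_size) {
  split(seq_len(n), ceiling(seq_len(n) / chunk_size))
}

# Hypothetical helper: merge a list of fragment objects in two rounds,
# saving each intermediate merge to disk before the final merge.
merge_in_rounds <- function(fragment_object_list, out_root, chunk_size = 10L) {
  chunks <- chunk_indices(length(fragment_object_list), chunk_size)
  # Round 1: merge each chunk of samples and write it to its own directory.
  intermediates <- lapply(seq_along(chunks), function(i) {
    merged_chunk <- do.call(c, fragment_object_list[chunks[[i]]])
    BPCells::write_fragments_dir(
      merged_chunk,
      file.path(out_root, sprintf("round1_chunk%02d", i))
    )
  })
  # Round 2: merge the saved intermediates into the final object.
  BPCells::write_fragments_dir(
    do.call(c, intermediates),
    file.path(out_root, "merged")
  )
}
```

With 100 samples and `chunk_size = 10`, this would write ten intermediate objects and then merge those ten, so no single merge touches more than ten inputs at once.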
-Ben
Closing this for now, but feel free to reply or re-open the issue if you have further questions in this area.
Hi @bnprks,
I am excited to try BPCells on my current ATACseq project but I am having issues migrating my data from ArchR.
My current project contains 500,000 cells across 100 samples. Extracting the fragments of a single sample and converting them to BPCells fragments works as expected, but I run into trouble when I scale this to the full project: combining the samples exceeds R's vector size limits.
I can see from the Seurat v5 tutorial on BPCells that they create individual matrices and merge them when creating the Seurat object, but I am uncertain whether this would work with ATACseq.
What would be the best approach to converting my ArchR project to BPCells?
Best regards, Jonas