Hey there,
Thank you for using cyCombine!
It looks like the `kohonen::som` clustering method has a hard time allocating memory for such a big dataset, unfortunately.
I am working on a bit of an overhaul of cyCombine that should improve memory performance significantly. But it is nowhere near complete.
I have made a minor update in the `dev` branch you can try to install. This allows you to use the `mode` argument from `kohonen::som` - in hopes the `batch` mode is more memory efficient.
After installing the development version of cyCombine, try running `batch_correct` with `mode = "batch"` to see if that solves the issue.
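Something along these lines should work (an untested sketch - apart from `mode`, the argument names follow the released `batch_correct()` interface, so adapt them to your own data):

```r
# Install the dev branch and try the batch mode of kohonen::som
remotes::install_github("biosurf/cyCombine", ref = "dev")
library(cyCombine)

corrected <- batch_correct(
  df = uncorrected,   # tibble of cells x markers with a 'batch' column
  markers = markers,  # character vector of marker names to correct
  seed = 473,
  mode = "batch"      # dev-branch option forwarded to kohonen::som()
)
```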
Otherwise, you will have to find an alternative clustering method to `kohonen::som`. `FlowSOM`, for example, works directly on a flowSet, which might be more efficient (it requires converting your dataframe to a matrix/flowFrame first - be mindful of matrix orientation and included markers).
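Roughly like this (a sketch, not tested on your data - marker/column names are assumptions):

```r
library(flowCore)
library(FlowSOM)

# Build a flowFrame from the expression matrix (cells in rows, markers in columns)
expr_mat <- as.matrix(uncorrected[, markers])
ff <- flowCore::flowFrame(expr_mat)

fsom <- FlowSOM::FlowSOM(
  ff,
  colsToUse = markers,  # cluster on the protein markers only
  xdim = 8, ydim = 8,   # 8x8 SOM grid, matching cyCombine's default
  nClus = 10,
  seed = 473
)
labels <- FlowSOM::GetClusters(fsom)  # one cluster label per cell
```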
You could normalize your uncorrected set with `cyCombine::normalize()`, cluster with FlowSOM (or another algorithm), and then run `batch_correct` on the unnormalized data with `label` set to the clustering labels for each cell.
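Put together, that route could look roughly like this (a sketch; `labels` is assumed to come from running the FlowSOM step above on the normalized data):

```r
# Normalize only for the clustering step
normed <- cyCombine::normalize(uncorrected, markers = markers, norm_method = "scale")
# ... cluster `normed` (e.g. with FlowSOM as sketched above) to obtain `labels` ...

corrected <- cyCombine::batch_correct(
  df = uncorrected,  # note: the unnormalized data, as described above
  label = labels,    # precomputed labels skip the internal SOM step
  markers = markers
)
```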
Better yet, you can then split the data into each cluster and run `batch_correct` on these individually, setting `label` to the cluster number. This will significantly improve memory usage, the only bottleneck being the clustering step.
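A hedged sketch of that per-cluster variant, using dplyr/purrr purely for illustration:

```r
library(dplyr)
library(purrr)

uncorrected$label <- labels

corrected <- uncorrected %>%
  group_split(label) %>%
  map_dfr(~ cyCombine::batch_correct(
    df = .x,
    label = .x$label,  # constant within each split, so no SOM is run
    markers = markers
  ))
```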
These are the principles behind the future overhaul, but it will take me some time to finish the implementation.
Please let me know if either of the two approaches solves the memory challenge!
Best regards, Søren
Thanks for your answer!
I installed the `dev` branch of cyCombine and tried running `batch_correct` with `mode = "batch"`. However, I got the following error:
`Error in match.args(mode) : could not find function "match.args"`
I wondered if you meant to use the function `match.arg()` instead of `match.args()`. Or maybe I forgot to install some dependencies.
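For reference, `match.arg()` is the base R validator, used like this (the choices shown are the modes documented for `kohonen::som()`):

```r
# Base R's match.arg() (no trailing "s") validates an argument against
# a set of allowed values
mode <- match.arg(mode, choices = c("online", "batch", "pbatch"))
```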
Whoops, I was a bit quick in the implementation. That should be fixed now - thank you for pointing it out :)
Best regards, Søren
I am using R version 4.3.3 on a x86_64-pc-linux-gnu (64-bit) system with 2TB RAM.
My dataset consists of 311 files with a total of 55,188,271 cells and 40 markers.
When I attempt to correct for the batch effect in all cells of my dataset, my R session crashes, specifically when running the function `batch_correct()`. All the previous steps worked nicely. I followed the pipeline described in https://biosurf.org/cyCombine_CyTOF_1panel.html#Checking_for_batch_effects
Here is a screenshot of the error:
It seems to be a memory problem, but I don't know why, as I have 2 TB of RAM.
Also, I conducted a test by downsampling to 10,000 cells per file (3,110,000 cells in total), and the pipeline worked perfectly.
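The downsampling was done along these lines (a sketch; the `sample` column identifying the source file is an assumption about my data layout):

```r
library(dplyr)
set.seed(473)
downsampled <- uncorrected %>%
  group_by(sample) %>%          # one group per input file (311 files)
  slice_sample(n = 10000) %>%   # keep at most 10,000 cells per file
  ungroup()
```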
I have updated all my packages (including the necessary dependencies), and I also tried running it in R instead of RStudio.
I am looking forward to your response :)