ggloor / ALDEx_bioc

ALDEx_bioc is the working directory for updating Bioconductor

another question about running on HPC #34

Closed: sformel closed this issue 3 years ago

sformel commented 3 years ago

I saw the recent conversation in issue #32, but since it was closed I thought it would be better to open a new one for this question.

I'm trying to run ALDEx on an HPC to get effect sizes from an RNA-seq study, but I'm running out of memory even with 10 nodes (dual 10-core 2.8 GHz Intel Xeon E5-2680 v2 CPUs, 64 GB of RAM). My data are isoform counts, with about 1.6 million features and only three replicates per treatment. I recognize that's a lot of features and very few replicates, but...it is what it is. Based on the advice in the vignette I chose to use 1000 Monte-Carlo instances.

I might be asking too much here, but I'm not sure which parts of aldex.clr and aldex.effect run in parallel, so I'm hoping you can point me to the right parameters to adjust. I'm working on a SLURM-based system.
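For reference, here is roughly what I'm running (just a sketch; the file name is a placeholder and the argument names, including useMC, are what I've pieced together from the help pages, so please correct me if I've misread them):

```r
library(ALDEx2)

# hypothetical file name; the real table is ~1.6 million isoforms (rows) x 6 samples (columns)
reads <- read.table("isoform_counts.tsv", header = TRUE, row.names = 1)

# two treatments, three replicates each
conds <- c(rep("trtA", 3), rep("trtB", 3))

# 1000 Monte-Carlo Dirichlet instances of the CLR-transformed data
x.clr <- aldex.clr(reads, conds, mc.samples = 1000, denom = "all",
                   verbose = TRUE, useMC = TRUE)

# per-feature effect sizes and difference summaries
x.effect <- aldex.effect(x.clr, verbose = TRUE, useMC = TRUE)
```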

Do you have any suggestions about adjusting my process or setting SLURM parameters like --cpus-per-task or --ntasks-per-node? Thanks in advance!

ggloor commented 3 years ago

I would suggest you try running single-core rather than multicore; this may help with the memory issue.
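Something like the following, i.e. leaving useMC at its default of FALSE in both functions (a sketch; check the argument against your installed version):

```r
# single-core run: useMC = FALSE (the default) in both aldex.clr and aldex.effect
x.clr    <- aldex.clr(reads, conds, mc.samples = 1000, useMC = FALSE)
x.effect <- aldex.effect(x.clr, useMC = FALSE)
```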

Are both functions running out of memory or only aldex.effect?

In testing, I am finding that multicore is not appreciably faster than single-core because of the overhead, and I'm sure the overhead on your dataset is huge.

I have not tested to see which functions in aldex.effect are using the most memory. Something to dig into.

greg

sformel commented 3 years ago

Thanks for getting back to me so quickly! I'll try running it single-core and see what happens. I think it's just aldex.effect, but I'll check that as well.

I just thought of an additional piece of information that might be helpful: I can run my complete dataset with 128 MC samples. I've browsed the ALDEx2 literature, but I couldn't find the data behind the mc.samples thresholds recommended in the vignette (forgive me if it's obvious!). How much wiggle room is there on the recommendation of 1000 for effect size?

ggloor commented 3 years ago

It's complicated. The more MC samples, the greater the precision when estimating sampling error: precision scales roughly with the square root of the number of MC samples, so every 4X increase gives you about twice the precision. In test datasets, I see the results stabilize after about 512 MC samples.

This is especially a problem with small sample sizes. The real issue is interpreting features at the margin of significance.

One way around this would be to run the analysis 8X and average the results; see the sketch below. You have lots of cores available, so this could be done trivially.
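A rough sketch of what I mean (this is not a built-in feature, just repeated runs with a modest mc.samples value whose effect columns are averaged afterwards):

```r
# repeat the analysis with a modest number of MC instances and
# average the per-feature effect sizes across the runs
n.runs <- 8
effects <- sapply(seq_len(n.runs), function(i) {
  set.seed(i)                     # a different set of Dirichlet instances each run
  x.clr <- aldex.clr(reads, conds, mc.samples = 128)
  aldex.effect(x.clr)$effect
})
# 'effects' is a features x n.runs matrix; average across runs
mean.effect <- rowMeans(effects)
```

On SLURM, each of the runs could also be submitted as a separate task (e.g. a job array), with only the small per-run effect vectors collected for averaging.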

sformel commented 3 years ago

Ok, that makes sense. Thanks for taking the time to explain it. I'll give an update later this week on how running single-core instead of multicore works out.

sformel commented 3 years ago

To give a little update, it looks like aldex.effect is what's running out of memory; the aldex.clr function runs fine. I also tried a reduced feature table (1.1 million genes instead of 1.6 million isoforms) and still ran out of memory on a single 256 GB node. It also might be helpful to know that I'm running R and ALDEx2 out of a Conda environment because of the limitations of our HPC.

When I run on a single core, the error message is slightly different. I'm not sure if it gives any insight into where it's hanging up:

Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 't': cannot allocate vector of size 19.2 Gb Calls: aldex.effect -> t -> apply -> array -> .handleSimpleError -> h
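For what it's worth, a back-of-the-envelope estimate (assuming dense double-precision matrices, with aldex.clr keeping one features x mc.samples matrix per sample and aldex.effect working across all of them) lands in the same order of magnitude as that failed 19.2 Gb allocation:

```r
features   <- 1.6e6    # isoforms
mc.samples <- 1000
samples    <- 6
bytes      <- 8        # double precision

# one dense CLR matrix (features x MC instances) for a single sample
features * mc.samples * bytes / 1024^3            # ~11.9 GiB

# all six samples' worth of MC instances held at once
samples * features * mc.samples * bytes / 1024^3  # ~71.5 GiB
```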

If you happen to have any silver bullets, I will gladly take them :-) but mostly I just wanted to post this so other users could see the parameters I was using. I'm going to keep working on it; if you don't mind keeping this open, I'll report back if I make any progress.

sformel commented 3 years ago

So I think I'm going to use the averaging option as a way forward; thanks again for your help! Here are a few more crude benchmarks in case they help anyone in the future:

To run 1000 MC samples on a single node (see above for description of the hardware):