MFlowCode / MFC

Exascale simulation of multiphase/physics fluid dynamics
https://mflowcode.github.io
MIT License

Added memory to the Delta Mako file for batch runs #364

Closed · mrodrig6 closed this 3 months ago

mrodrig6 commented 4 months ago

Description

Please include a summary of the changes and the related issue(s) if they exist. Please also include relevant motivation and context.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

Scope

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce them. Please also list any relevant details for your test configuration.

Test Configuration:

Checklist

If your changes touch any source files (anything in src/)

To make sure the code is performing as expected on GPU devices, I have:

sbryngelson commented 4 months ago

@mrodrig6, what exactly does this do? (Please fill out the PR form.) How do you know it works? What doesn't work in the current version? @wilfonba, you ran some big jobs on Delta; did you find a problem?

wilfonba commented 4 months ago

I ran 2.25 billion cells in batch mode on 32 nodes (128 A100s) with the current Mako file. The only thing I had to change was setting 4 CPU cores per task. I suppose I should have shared with Spencer before now that I needed to do this.

sbryngelson commented 4 months ago

> I ran 2.25 billion cells in batch mode on 32 nodes (128 A100s) with the current Mako file. The only thing I had to change was setting 4 CPU cores per task. I suppose I should have shared with Spencer before now that I needed to do this.

So how did you go about changing this?

wilfonba commented 4 months ago

I changed it manually in the autogenerated batch script. Not exactly the optimal way to do it.

sbryngelson commented 4 months ago

> I changed it manually in the autogenerated batch script. Not exactly the optimal way to do it.

I just meant: what is the literal SLURM script line?

wilfonba commented 4 months ago

`#SBATCH --cpus-per-task=4`. I think this could also be fixed the way Mauro suggested, by requesting more memory per core.
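
For context, here is a minimal sketch of how that directive sits in a generated batch header for a single 4-GPU Delta node; the partition, account, and time values are placeholders, not the actual template output:

```bash
#!/bin/bash
#SBATCH --partition=gpuA100x4     # placeholder partition name
#SBATCH --account=<allocation>    # placeholder allocation
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --ntasks-per-node=4       # one MPI rank per GPU
#SBATCH --cpus-per-task=4         # the manual change discussed here
#SBATCH --time=01:00:00
```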

sbryngelson commented 4 months ago

I'm still unclear on what the right thing to do here is. Someone will need to explain to me what's going on in this PR and pros/cons (to the extent they exist).

wilfonba commented 3 months ago

Replacing `--mem=208G` with `--mem=0` will always request all of the memory available on a node. This will cause longer wait times for interactive jobs using fewer than 4 GPUs, because the scheduler will wait for a node whose memory is entirely free. It might also charge for all 4 GPUs; I'm not sure how Delta handles this case.
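
For reference, the two directives being compared (the 208G value is the one in this PR; the behavior described above is standard SLURM semantics):

```bash
# Explicit per-node memory request, as proposed in this PR:
#SBATCH --mem=208G

# Alternative: request all memory available on the allocated node.
# The scheduler then waits for a node whose memory is fully free.
#SBATCH --mem=0
```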

sbryngelson commented 3 months ago

I don't see why we are specifying memory at all. SLURM should handle this on its own if you specify all of the other requested resources (CPU cores and GPU devices). I would be quite shocked if that were not the case.

wilfonba commented 3 months ago

Based on @haochey's Slack message, Delta's SLURM setup defaults to 1 GB per core. The fact that I had to specify 4 CPU cores per GPU for the airfoil simulation to avoid out-of-memory errors suggests that Delta is doing something odd with memory per core. However, 4 CPU cores per GPU didn't map to 1 GB per core for that case, so I'm not entirely sure what is going on.

sbryngelson commented 3 months ago

Can we specify memory per core, then?

wilfonba commented 3 months ago

`--mem-per-cpu` specifies memory per CPU core in SLURM. It's still odd that this is a problem at all; it isn't on any other machine. I'm guessing it has to do with UIUC setting a default that encourages full utilization of shared nodes, i.e., someone with an AI workload doing a lot of CPU-to-GPU memory transfers (more CPU memory needed than GPU memory) would need more memory per core than someone whose workload rarely touches CPU memory. This seems like an edge case, unless Delta goes all the way to putting multiple jobs on the same GPU when they don't request the full memory. I'm not entirely sure.
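
A minimal sketch of that per-core alternative, assuming an illustrative 2 GB per core (the right value for Delta would need to be checked):

```bash
# Hypothetical per-core memory request instead of a fixed per-node amount.
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G   # illustrative: 4 cores x 2 GB = 8 GB of host memory per rank
```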

sbryngelson commented 3 months ago

I'm going to merge this since it seems necessary to run any meaningful jobs and I don't know what the alternative is.