Nkehoe-QUB closed 3 months ago
Maybe there is not enough memory for MPI. Make sure that your diagnostic size is reasonable.
When I run on a single node (128 cores, 128GB RAM) it starts okay, but if I try 2 nodes (256 cores, 256GB RAM) it quits at this step:
Running diags at time t = 0
-------------------------------------------------------------------------------
I don't think it's a memory issue, as it's only initialising the diagnostic.
It works on 1 node because MPI communication is not needed there. On multiple nodes the communication fails, likely due to limited memory for MPI.
The diag is initializing and writing. There is definitely a communication
Okay, I'll try increasing the amount of memory and see if that helps.
Okay, you were correct. I increased the memory and it seems to be running okay now! I appreciate the help!
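As a rough way to check whether a diagnostic size is "reasonable", as suggested above, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bin counts are hypothetical, the diagnostic is assumed to be stored as a dense array of 8-byte values, and one MPI rank per core is assumed. None of these values come from the simulation in this thread.

```python
from math import prod


def diag_buffer_bytes(bins_per_axis, bytes_per_value=8):
    """Size in bytes of one dense histogram with the given bins per axis.

    Assumes a dense array of fixed-width values (8 bytes = double precision).
    """
    return prod(bins_per_axis) * bytes_per_value


def per_rank_memory_gib(node_ram_gib, ranks_per_node):
    """Average RAM available to each MPI rank on one node, in GiB."""
    return node_ram_gib / ranks_per_node


if __name__ == "__main__":
    # Hypothetical diagnostic with three axes; replace with your own bin counts.
    bins = [1000, 1000, 200]
    size_gib = diag_buffer_bytes(bins) / 2**30

    # Node figures quoted in the thread: 128 GB RAM, 128 cores.
    # Assumption: one MPI rank per core.
    budget_gib = per_rank_memory_gib(128, 128)

    print(f"diagnostic buffer: {size_gib:.2f} GiB")
    print(f"average memory per rank: {budget_gib:.2f} GiB")
    if size_gib > budget_gib:
        print("A single copy of the buffer exceeds the per-rank average.")
```

If a single copy of the buffer is already comparable to the memory available per rank, any communication step that needs a temporary copy may run out of memory, so reducing the number of bins or requesting more memory per rank are the two obvious options.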
Description
I'm trying to add a screen diagnostic to my simulation, but I get the following error output: TempDist.txt
It seems to fail only when I try to use multiple nodes. Is this the issue, and is there a workaround? It's not viable to run the simulation on a single node.
Parameters
g++ --version
mpic++ --version
mpic++ -show
h5cc --version
python --version