GeoscienceAustralia / anuga_core

AnuGA for the simulation of the shallow water equation
https://anuga.anu.edu.au

sequential_distribute: Subdivide mesh :Distribute issue #218

Open Girishchandra-Yendargaye opened 4 years ago

Girishchandra-Yendargaye commented 4 years ago

Distribution is failing with the following output:

    sequential_distribute: Subdivide mesh
    Error! ***Memory allocation failed for TRINODALMETIS: nind.

Why does distribute require additional memory?

stoiver commented 4 years ago

@Girishchandra-Yendargaye The initial creation of the distributed domain is memory heavy. A sequential domain is created on processor 0, and then we run the structure through METIS to partition it (again just on processor 0). This essentially doubles the memory use. The partitions are then communicated to the other processors.

If you are going to run a number of simulations with the same underlying domain, you can do the partitioning once on a large-memory machine or compute node, and then reuse that distributed structure for all subsequent simulations.

ggiannako commented 4 years ago

Apologies for the intervention (and for not using the users-list), but I have a follow-up question: After generating and partitioning the mesh into N structures using one processor, how can we save the individual structures and read them later using N processors?

stoiver commented 4 years ago

@Girishchandra-Yendargaye Have a look at the code in the folder anuga/simulation and in anuga/parallel/parallel_api.py. There is code in there that uses sequential_distribute to create the distribution without necessarily running the evolve.
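The "partition once, reuse later" workflow described above can be illustrated with a minimal, generic sketch. It uses only Python's standard `pickle` module and is not ANUGA's implementation: the helper names (`split_round_robin`, `dump_partitions`, `load_partition`), the round-robin split, and the file naming scheme are all hypothetical, chosen only to show one file being written per partition by a single process and each rank later reading back only its own file.

```python
import os
import pickle


def split_round_robin(items, numprocs):
    """Hypothetical stand-in for a real mesh partitioner such as METIS."""
    return [items[rank::numprocs] for rank in range(numprocs)]


def partition_path(partition_dir, numprocs, rank):
    """Hypothetical naming scheme: one pickle file per (numprocs, rank)."""
    return os.path.join(partition_dir, "domain_P%d_%d.pickle" % (numprocs, rank))


def dump_partitions(partitions, partition_dir):
    """Run once, on one process: write every partition to its own file."""
    os.makedirs(partition_dir, exist_ok=True)
    numprocs = len(partitions)
    paths = []
    for rank, part in enumerate(partitions):
        path = partition_path(partition_dir, numprocs, rank)
        with open(path, "wb") as handle:
            pickle.dump(part, handle, protocol=pickle.HIGHEST_PROTOCOL)
        paths.append(path)
    return paths


def load_partition(partition_dir, numprocs, rank):
    """Run on each of the N processes: read back only this rank's file."""
    with open(partition_path(partition_dir, numprocs, rank), "rb") as handle:
        return pickle.load(handle)
```

In a later parallel run, each process would call `load_partition` with its own rank (for example the rank reported by MPI), so the expensive partitioning step never has to be repeated and no single process needs to hold the full structure again.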

Girishchandra-Yendargaye commented 4 years ago

@stoiver Thank you, sir, for the reply. I am running only the pickling part and it is still failing. Below is the code:

    domain = create_domain_from_file(meshfile_name)
    print "domain created!"
    domain.set_quantity('elevation', numeric=5.0,  # triangle_elevation
                        use_cache=cache, verbose=True, alpha=0.1, location='vertices')
    print "quantities Set "
    domain.set_store(True)
    print "Domain Store set!"
    sequential_distribute_dump(domain, 2000, verbose=True)
    print "process completed!"

The output is:

    domain created!
    quantities Set
    Domain name set!
    Domain Store set!
    sequential_distribute: Subdivide mesh
    Error! ***Memory allocation failed for TRINODALMETIS: nind. Requested size: -1302189028

Can you please suggest something?

Girishchandra-Yendargaye commented 4 years ago

@stoiver One more issue during sequential_distribute:

    39742 Segmentation fault

Even when memory is available, it fails.
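A note on the reported "Requested size: -1302189028": a negative allocation size is often a sign that a size computation overflowed a signed 32-bit integer, which would fail regardless of how much memory is free. This is only one possible reading and is not confirmed anywhere in this thread. The sketch below assumes exactly that (a 32-bit signed wrap-around) and lists the sizes that would wrap to the reported value; the function names are hypothetical, and whether the number counts bytes or elements is not known.

```python
def as_int32(n):
    """Interpret the low 32 bits of n as a signed 32-bit integer."""
    return ((n + 2**31) % 2**32) - 2**31


def candidate_true_sizes(reported, wraps=2):
    """Sizes that would wrap to `reported` after 1..wraps overflows (assumption)."""
    return [reported + k * 2**32 for k in range(1, wraps + 1)]


reported = -1302189028  # value copied from the error message in this thread
candidates = candidate_true_sizes(reported)

for size in candidates:
    # Each candidate maps back to the reported value under 32-bit wrap-around.
    print(size, "->", as_int32(size))
```

Under this assumption, the smallest candidate is 2992778268, which is above the 2147483647 limit of a signed 32-bit integer.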