Open Girishchandra-Yendargaye opened 4 years ago
@Girishchandra-Yendargaye The initial creation of the distributed domain is memory heavy. A sequential domain is created on processor 0, and we then run the structure through METIS to partition it (again just on processor 0). This roughly doubles the memory needed on processor 0. The partitions are then communicated to the other processors.
If you are going to run a number of simulations with the same underlying domain, you can do the partitioning once on a large-memory machine or compute node and then reuse that distributed structure for all subsequent simulations.
Apologies for the intervention (and for not using the users-list), but I have a follow-up question: After generating and partitioning the mesh into N structures using one processor, how can we save the individual structures and read them later using N processors?
@Girishchandra-Yendargaye Have a look at the code in the folder anuga/simulation and in anuga/parallel/parallel_api.py. There is code there that uses sequential_distribute to create the distribution without necessarily running the evolve.
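The general pattern (partition once on one processor, write one file per processor, then have each processor read back only its own file) can be sketched as follows. This is a generic illustration and not ANUGA's implementation: `dump_partitions`, `load_partition`, and the file naming are hypothetical, a round-robin split stands in for the METIS partition, and a plain list of triangle ids stands in for a real sub-domain. In ANUGA itself the dump step is done by sequential_distribute_dump, and the sequential_distribute code should contain the matching load routine.

```python
import os
import pickle
import tempfile


def dump_partitions(items, numprocs, partition_dir):
    """Split items into numprocs pieces and pickle one file per rank.

    Hypothetical helper: a round-robin split stands in for a real
    mesh partitioner.
    """
    for rank in range(numprocs):
        piece = items[rank::numprocs]
        name = "domain_P%d_%d.pickle" % (numprocs, rank)
        with open(os.path.join(partition_dir, name), "wb") as f:
            pickle.dump(piece, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_partition(rank, numprocs, partition_dir):
    """Read back only the piece that belongs to this rank."""
    name = "domain_P%d_%d.pickle" % (numprocs, rank)
    with open(os.path.join(partition_dir, name), "rb") as f:
        return pickle.load(f)


# Step 1: run once, on a single large-memory process.
partition_dir = tempfile.mkdtemp()
triangles = list(range(10))          # stand-in for the sequential domain
dump_partitions(triangles, 4, partition_dir)

# Step 2: in a later parallel run each rank would call load_partition
# with its own rank; here the four ranks are simulated in a loop.
pieces = [load_partition(rank, 4, partition_dir) for rank in range(4)]
```

In a real parallel run the second step would be executed by every process with its own rank, so no single process has to hold the whole domain again.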
@stoiver Thank you for the reply. I am running only the pickling part and it is still failing. The code is below:

    domain = create_domain_from_file(meshfile_name)
    print "domain created!"
    domain.set_quantity('elevation', numeric=5.0,  # triangle_elevation
                        use_cache=cache, verbose=True, alpha=0.1, location='vertices')
    print "quantities Set "
    domain.set_store(True)
    print "Domain Store set!"
    sequential_distribute_dump(domain, 2000, verbose=True)
    print "process completed!"

Output:

    domain created!
    quantities Set
    Domain name set!
    Domain Store set!
    sequential_distribute: Subdivide mesh
    Error! ***Memory allocation failed for TRINODALMETIS: nind. Requested size: -1302189028

Can you please suggest something?
@stoiver One more issue during sequential_distribute: 39742 Segmentation fault
It fails even when memory is available.
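One possible reading of the negative "Requested size" in the error reported earlier in this thread is that a size computation wrapped around a signed 32-bit integer, which would explain a failure that does not depend on how much memory is free. This is an assumption and not a confirmed diagnosis. The arithmetic below only shows which positive value would wrap to the reported number; `wrap_to_int32` is a hypothetical helper written for this illustration.

```python
def wrap_to_int32(n):
    """Return n as it would read if stored in a signed 32-bit integer."""
    return ((n + 2 ** 31) % 2 ** 32) - 2 ** 31


reported = -1302189028            # size printed in the METIS error message
candidate = reported + 2 ** 32    # smallest positive value that wraps to it

print(candidate)                          # 2992778268
print(wrap_to_int32(candidate) == reported)  # True
```

If this reading is right, the request was for roughly 3 GB worth of bytes or elements, which is past the 2**31 - 1 limit of a signed 32-bit size.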
Distribution is failing:

    sequential_distribute: Subdivide mesh
    Error! ***Memory allocation failed for TRINODALMETIS: nind.

Why does distribute require additional memory?