GeodynamicWorldBuilder / WorldBuilder

World Builder: An initial conditions generator for geodynamic modeling
GNU Lesser General Public License v2.1

Add the ability to load in files for use in a way compatible with and without mpi programs #211

Open MFraters opened 4 years ago

MFraters commented 4 years ago

Using ASCII files or tomography data files can be useful for setting up models. It can also be useful for adding topography data, for example (#132).

Loading files and reading them is in itself not hard. One of the problems is that, if naively implemented, MPI programs using the World Builder will 1. load the file from disk on every MPI process and 2. store one instance of the data in memory per MPI process. This can become problematic for very large or very many files.

Issue 1 can be prevented by adding the option for the World Builder to compile with MPI and receive the MPI communicator, load the file on a single processor, and then distribute it to the rest of the MPI processes. This is the most important issue, since it would prevent a lot of I/O use. @gassmoeller advised looking at the utilities function read_and_distribute_file_content in ASPECT (https://github.com/geodynamics/aspect/blob/master/source/utilities.cc#L959). The MPI communicator can be passed in, but it may also be possible to ask for a default MPI communicator directly, since it is a global static variable. This will have to be investigated further.

I am not sure whether issue 2 can be solved with MPI, since MPI uses a distributed memory model rather than a shared memory model, but maybe there is a trick. Theoretically it should be possible to load the data once per node and share a pointer, but I am not sure that would be worth the effort.

In summary, I think it should be relatively straightforward to implement this in a way that prevents issue 1, while issue 2 might be a lot harder.

Todo list:

gassmoeller commented 4 years ago

Theoretically it should be possible to load the data once per node and share a pointer, but I am not sure that would be worth the effort.

This goes into hybrid distributed/shared-memory parallelization and usually only becomes an issue with very large datasets; I would advise against going there unless you have a specific need for it. Another option is to look into data partitioning (only store what is needed on the current process), but that has its own set of problems (how do you know which part of the domain is needed on this process?). Most tomography models are small enough to simply store on every process.

MFraters commented 4 years ago

Thanks for the comment, that makes sense.