SPECFEM / specfem3d

SPECFEM3D_Cartesian simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra (structured or not).
GNU General Public License v3.0

merge all the mesh data files into a single mesh file per MPI slice #20

Closed mpbl closed 8 years ago

mpbl commented 10 years ago

Merge all the mesh data files into a single mesh file per MPI slice. On large machines many people (including Dominik Goeddeke and me, as well as people at the UK supercomputing center) have found that generating more than 40 small files per slice in GLOBE puts a very heavy load on Lustre or GPFS shared file systems for no reason.

the only reason why we have many files per slice is historical; we should definitely change that

Add ADIOS support for that

I have asked Jo to reduce that to a single file per process

we will need to do two things:

This should be done in both SPECFEM3D and SPECFEM3D_GLOBE

this is critical, because currently in GLOBE we write about 42 files per MPI slice, thus when running on 1000 cores we create 42,000 files for no reason. Let us reduce this to 1000.
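To see how many files a given run actually produces per slice, a small script along these lines might help. This is only a sketch: the `proc<6-digit rank>_<name>.bin` naming pattern and the function name `files_per_slice` are assumptions made for illustration, so adjust the pattern to whatever your output directory really contains.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed naming pattern (hypothetical): proc<6-digit rank>_<name>.bin
SLICE_FILE = re.compile(r"^proc(\d{6})_.+\.bin$")


def files_per_slice(directory):
    """Return a Counter mapping MPI rank -> number of matching files."""
    counts = Counter()
    for entry in Path(directory).iterdir():
        match = SLICE_FILE.match(entry.name)
        if match and entry.is_file():
            counts[int(match.group(1))] += 1
    return counts
```

Summing the values of the returned counter gives the total number of files the shared file system has to handle; with one merged file per slice, every count would be 1.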

IMPORTANT: see if this has an impact on processing scripts, if we decide to also merge the files output by the solver (for instance the several different types of sensitivity kernel files) into a single file. If so, some kernel processing scripts and tools will need to be adapted accordingly.

Feedback from Qinya: I can't agree more. We have had experience running big specfem jobs on our National Supercomputer Consortium (related to kernels for noise measurements, which generate an even greater number of files). It crippled the entire file system, and the system administrators became really unfriendly to us after that ...

On the other hand, I know a lot of visualization programs (ParaView, for instance) actually read the individual binary files and combine them to form bigger visualization domains. So if we rewrite this, we need to write it in a form that makes it easy and fast for an external program to access individual parts of it (such as x, y, z, ibool, etc.). Maybe direct-access C binary files?
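One way to picture the "direct-access" idea is a single file per slice that starts with a small index table (array name, type, byte offset, length), so that an external tool can seek straight to x, y, z or ibool without reading the rest. The sketch below is purely illustrative and is not the SPECFEM or ADIOS format: the `MESH0001` signature, the entry layout and the function names are all made up for the example, and the array payload is written in the machine's native byte order.

```python
import struct
from array import array

MAGIC = b"MESH0001"                # hypothetical 8-byte file signature
ENTRY = struct.Struct("<16s1sQQ")  # name, array typecode, byte offset, item count


def write_slice(path, arrays):
    """Write all named arrays of one slice into a single indexed file."""
    offset = len(MAGIC) + 4 + ENTRY.size * len(arrays)  # payload starts after the index
    index = []
    for name, data in arrays.items():
        index.append((name.encode("ascii"), data.typecode.encode("ascii"),
                      offset, len(data)))
        offset += len(data) * data.itemsize
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(struct.pack("<I", len(arrays)))
        for entry in index:
            f.write(ENTRY.pack(*entry))
        for data in arrays.values():
            data.tofile(f)  # payload in native byte order


def read_array(path, wanted):
    """Read one named array by seeking directly to its offset."""
    with open(path, "rb") as f:
        if f.read(len(MAGIC)) != MAGIC:
            raise ValueError("not a merged slice file")
        (count,) = struct.unpack("<I", f.read(4))
        for _ in range(count):
            name, code, offset, n = ENTRY.unpack(f.read(ENTRY.size))
            if name.rstrip(b"\0").decode("ascii") == wanted:
                f.seek(offset)
                out = array(code.decode("ascii"))
                out.fromfile(f, n)
                return out
    raise KeyError(wanted)
```

For example, after `write_slice(path, {"x": array("f", [0.0, 1.0]), "ibool": array("i", [1, 2, 3])})`, a visualization script could call `read_array(path, "ibool")` and only the index plus that one array would be read from disk. A real format would also need to record byte order and array shapes.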

(todo_list_please_dont_remove.txt: suggestion 11)

mpbl commented 10 years ago

ADIOS has been implemented for:

The question is then to know if we want this feature to be available without ADIOS support.

komatits commented 10 years ago

Having that option also without ADIOS would help, unless we permanently switch to ADIOS only (in which case we should get rid of the non-ADIOS source code, also remove the ADIOS flags from the Par_file and from the code, and provide the source code of ADIOS as a sub-directory in our source code). I am not sure which option is best.

mpbl commented 10 years ago

@komatits It depends on what your view of specfem3d* is: lightweight, highly parallel, or both. I guess we are favoring the last option, since we allow specfem3d to be compiled without MPI.

The added value of ADIOS is for large runs, but it still works well for small-scale problems.

There is the usual problem of maintaining two versions of a routine. I remember that we had trouble when some variables were renamed in the specfem3d_par module but were not updated accordingly in the ADIOS routines.

I am not a big fan of providing the source code of ADIOS as a sub-directory in our source code. I would rather indicate that ADIOS installation is a prerequisite for specfem compilation.

@pnorbert Do you have any suggestion on what would be the best solution?

komatits commented 10 years ago

@mpbl @pnorbert

Thanks!

Regarding "lightweight, highly parallel, or both", I would say both for sure (because of the large user base). If there is a way of using ADIOS in lightweight mode (and probably include the source code in the distribution, so that users do not need to download packages from different sites), and if we are sure that ADIOS runs fine on all platforms, then switching to ADIOS only could be an option. Otherwise as you say I think we will need to go with two different versions of each I/O routine.

In a way, now that we have a large set of benchmarks, if we include at least one that uses ADIOS, then it is easy to check (with buildbot) that both versions keep working fine.

In reply to Matthieu Lefebvre's comment above, sent 11/04/2014 17:31.

Dimitri Komatitsch CNRS Research Director (DR CNRS), Laboratory of Mechanics and Acoustics, UPR 7051, Marseille, France http://komatitsch.free.fr

QuLogic commented 10 years ago

"with buildbot"

Is it still running? I am unable to access it currently.