geodynamics / axisem

AxiSEM is a parallel spectral-element method to solve 3D wave propagation in a sphere with axisymmetric or spherically symmetric visco-elastic, acoustic, anisotropic structures.

parallelize field_transform #38

Open martinvandriel opened 9 years ago

martinvandriel commented 9 years ago

Once we go to very large databases (~10 TB), field_transform becomes a serious bottleneck. While the computation can be done in a few hours, the serial field transform takes multiple days to weeks (serial read/write rates on parallel file systems are really bad, about 30 MB/s on the CSCS machines).

The workload should be limited, because this boils down to a single loop with very few lines:

https://github.com/geodynamics/axisem/blob/master/SOLVER/UTILS/field_transform.F90#L851-L909
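For illustration, a minimal sketch of how that loop could be split over MPI ranks (this is not the actual field_transform code; `nsnap`, `read_snapshot` and `write_snapshot` are placeholders for the real snapshot count and the existing netCDF read/write calls):

```fortran
! Sketch only: distribute the snapshot loop round-robin over MPI ranks so
! that each rank reads and rewrites only every nproc-th snapshot.
program field_transform_parallel_sketch
  use mpi
  implicit none
  integer :: ierr, myrank, nproc, isnap
  integer, parameter :: nsnap = 1000        ! placeholder for the real snapshot count

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  do isnap = myrank + 1, nsnap, nproc
     ! call read_snapshot(isnap)     ! hypothetical: read this snapshot from the solver output
     ! call write_snapshot(isnap)    ! hypothetical: write it rechunked to the database
  end do

  call MPI_Finalize(ierr)
end program field_transform_parallel_sketch
```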

martinvandriel commented 9 years ago

Parallelization might not work together with compression.

martinvandriel commented 9 years ago

https://www.hdfgroup.org/hdf5-quest.html#p5comp
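The linked FAQ describes the limitation: parallel HDF5 could not write compressed (deflated) datasets at the time. As a rough sketch of what an uncompressed parallel setup could look like, assuming a netCDF library built with parallel (HDF5/MPI-IO) support; the file, dimension and variable names are made up for illustration:

```fortran
! Sketch only: create a shared output file for parallel, uncompressed writes.
! Each rank can then write its own hyperslab with nf90_put_var. Note the
! absence of nf90_def_var_deflate: compressed writes were not possible in
! parallel mode with the HDF5 versions of that time.
subroutine create_parallel_output(comm, ncid, varid)
  use mpi
  use netcdf
  implicit none
  integer, intent(in)  :: comm
  integer, intent(out) :: ncid, varid
  integer :: dimid_gll, dimid_time, ierr

  ierr = nf90_create('ordered_output.nc4', ior(NF90_NETCDF4, NF90_MPIIO), &
                     ncid, comm=comm, info=MPI_INFO_NULL)
  ierr = nf90_def_dim(ncid, 'gllpoints', 1000000, dimid_gll)   ! placeholder size
  ierr = nf90_def_dim(ncid, 'snapshots', NF90_UNLIMITED, dimid_time)
  ierr = nf90_def_var(ncid, 'displacement', NF90_FLOAT, &
                      [dimid_gll, dimid_time], varid)
  ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)
  ierr = nf90_enddef(ncid)
end subroutine create_parallel_output
```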

sstaehler commented 9 years ago

Right, that was one of the reasons not to try parallel NetCDF some years ago. Switching compression off is probably not a problem for kernel applications, where the wave fields are never moved, but it is a problem for all applications where databases are sent to IRIS or elsewhere.

martinvandriel commented 9 years ago

The only way I can think of to avoid this: use the old round-robin IO and write with the correct chunking, already compressed, in the SOLVER :/
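A minimal sketch of such a round-robin dump, assuming the output file already exists and was created with the desired chunking and deflate settings (e.g. via nf90_def_var_chunking and nf90_def_var_deflate); dump_my_elements is a hypothetical wrapper around the actual nf90_put_var calls:

```fortran
! Sketch only: ranks take turns writing to a single (serially written,
! compressed) netCDF file by passing a token along the rank order, so that
! only one rank has the file open at a time.
subroutine round_robin_dump(filename, myrank, nproc)
  use mpi
  use netcdf
  implicit none
  character(len=*), intent(in) :: filename
  integer, intent(in)          :: myrank, nproc
  integer :: ncid, ierr, token, status(MPI_STATUS_SIZE)

  token = 0
  ! wait for the token from the previous rank; rank 0 starts right away
  if (myrank > 0) &
     call MPI_Recv(token, 1, MPI_INTEGER, myrank - 1, 0, MPI_COMM_WORLD, status, ierr)

  ierr = nf90_open(filename, NF90_WRITE, ncid)
  ! call dump_my_elements(ncid, myrank)   ! hypothetical: write this rank's elements
  ierr = nf90_close(ncid)

  ! hand the token to the next rank
  if (myrank < nproc - 1) &
     call MPI_Send(token, 1, MPI_INTEGER, myrank + 1, 0, MPI_COMM_WORLD, ierr)
end subroutine round_robin_dump
```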

sstaehler commented 9 years ago

But didn't we test that writing with the correct chunking in the SOLVER is a bazillion times slower?

martinvandriel commented 9 years ago

Yes, but there might be room to optimize it: only include those processors that actually have to write, use threading, and reduce the number of dumps by buffering as many time steps as possible in memory.
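A minimal sketch of the buffering idea, with placeholder sizes and a hypothetical flush_buffer routine standing in for the actual netCDF write:

```fortran
! Sketch only: collect nbuf snapshots in memory and dump them in one go,
! reducing the number of (slow) write calls by a factor of nbuf.
module dump_buffer_sketch
  implicit none
  integer, parameter :: npoints = 100000   ! placeholder: points per processor
  integer, parameter :: nbuf    = 64       ! time steps buffered between dumps
  real, allocatable  :: buffer(:,:)
  integer            :: nfilled = 0
contains
  subroutine buffer_snapshot(field)
    real, intent(in) :: field(npoints)
    if (.not. allocated(buffer)) allocate(buffer(npoints, nbuf))
    nfilled = nfilled + 1
    buffer(:, nfilled) = field
    if (nfilled == nbuf) then
       ! call flush_buffer(buffer, nfilled)   ! hypothetical: one large netCDF write
       nfilled = 0
    end if
  end subroutine buffer_snapshot
end module dump_buffer_sketch
```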

I have been waiting for a week now for field_transform on a 10 TB database, which I computed in a few hours, and it's only 30% done.

sstaehler commented 9 years ago

Well, that is annoying. When trying to increase the dump buffer, keep in mind the low memory on most HPC machines. But I'm curious...

martinvandriel commented 9 years ago

I guess we would need to control which part of the mesh goes where on the cluster: if each node has only one processor that holds the crust, it might fit larger time chunks.

martinvandriel commented 9 years ago

So here we go: system maintenance, and field_transform was killed. We should at least have a restart capability; it should be really easy to implement.
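A minimal sketch of such a restart capability: record the index of the last completed snapshot in a small checkpoint file and, on startup, skip everything up to that index. The file name and unit number are arbitrary choices for illustration:

```fortran
! Sketch only: read/write a tiny checkpoint file so that a killed
! field_transform run can resume where it left off.
subroutine read_checkpoint(last_done)
  implicit none
  integer, intent(out) :: last_done
  logical :: exists
  last_done = 0
  inquire(file='field_transform.restart', exist=exists)
  if (exists) then
     open(unit=99, file='field_transform.restart', status='old', action='read')
     read(99, *) last_done
     close(99)
  end if
end subroutine read_checkpoint

subroutine write_checkpoint(isnap)
  implicit none
  integer, intent(in) :: isnap
  open(unit=99, file='field_transform.restart', status='replace', action='write')
  write(99, *) isnap
  close(99)
end subroutine write_checkpoint
```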