Large transformation times for databases

aakash10gupta commented 3 years ago

I am trying to transform the axisem-generated databases for faster access for iterative inversion processes, as suggested by the developers. My axisem jobs are on our cluster, and I will need the database on the cluster for rapid access of millions of different sources (hypocenter + moment tensor). I have run both global-scale domains as well as localized (in depth and laterally) domains. See table below showing details of some test run times. For me, the transformation of the database is the bottleneck for our science applications.

The issue I face is that the transformation of the generated database takes many days, whereas the job to generate the database takes hours (depending on # cores). I believe the transformation of the database is a 1 core job and hence slow (and will depend on the machine). Please let me know if you have advice. Partly I also wanted to share these tests with others.

martinvandriel commented 3 years ago

Hi Aakash,

the database transform is essentially transposing a giant matrix and involves no computations, everything is IO limited and it is difficult to speed up. However, you can also use the raw database as output by axisem directly, at the cost of slower access for the seismogram extraction from file. This may be in large parts compensated by the built in buffer in instaseis, that avoids file access all together if the source location has been sampled before and this may well be the case in your application (compare figure 16 in [1], source inversion is not IO bound in that test). Depending on you machine, you can increase cache hit likelihood further by increasing the buffer size using the 'buffer_size_in_mb' option which defaults to only 100MB (https://instaseis.net/instaseis.html#open-db-function).

An experimental alternative additional tweak may be to write the database directly in a different chunking. There are two possible ways to achieve that:

the CHUNK_TIME_TRACES and NETCDF_DUMP_BUFFER options as explained in the inparam_advanced. However, this is not well tested, so please verify the results independently.
using serial netcdf instead of parallel by setting USE_PAR_NETCDF = false in make_axisem.macros. While this is a bit of an historic implementation of non-blocking IO using pthreads that we later replaced with the parallel version, it also leads to different chunking that is beneficial for IO along the time axis.

Hope this helps, Martin

[1] Driel, Martin van, Lion Krischer, Simon Christian Stähler, Kasra Hosseini, and Tarje Nissen-Meyer. 2015. “Instaseis: Instant Global Seismograms Based on a Broadband Waveform Database.” Solid Earth 6 (2): 701–17. https://doi.org/10.5194/se-6-701-2015.

aakash10gupta commented 3 years ago

Hi Martin,

Thank you for your suggestions. I tried some of them.

I have been using the serial netcdf option by default as we could never set up the parallel netcdf option on our cluster.

Turning chunking to ‘true’ does generate a database at comparable time which is quick to access even without transformation. This seems to be the way ahead. I had a few questions though -

Does chunking = ‘true’ generate the matrix directly in the transposed form?
Is this database as fast as a merged database?
Does repacking a database to a merged database again transpose the matrix? If so, can this cause any errors when the initial database is generated using chunking = ‘true’?
Why is chunking set to ‘false’ by default?

martinvandriel commented 3 years ago

Hi Aakash

I have been using the serial netcdf option by default as we could never set up the parallel netcdf option on our cluster.

Ok, this may become a bottleneck if you use very many cores (say more than a few hundred or so), where the round robin IO is not very efficient. That is why we implemented collective IO.

* Does chunking = ‘true’ generate the matrix directly in the transposed form?

I am not sure it is exactly the same as it is generated through different codes and likely different chunking, but probably very similar.

* Is this database as fast as a merged database?

I have not tried, but I assume the merged version reduces number of file accesses and improves access pattern within the file, so I could imagine a small factor (somewhere between 2-5 I would guess).

* Does repacking a database to a merged database again transpose the matrix? If so, can this cause any errors when the initial database is generated using chunking = ‘true’?

This should work fine, but I have not actually tried. You should be able to inspect the file layout using standard netcdf or hdf5 tools like ncdump or h5dump to verify.

* Why is chunking set to ‘false’ by default?

Because it potentially slows down file IO in the parallel run, which is crucial when running on very many cores, while a slower transpose in the serial part is then more acceptable.

cheers, Martin

aakash10gupta commented 3 years ago

Thanks a lot Martin for your comments and suggestions.

geodynamics / axisem

Large transformation times for databases #68