PaNOSC-ViNYL / SimEx

Start-to-end photon experiment simulation platform
https://simex.readthedocs.io/
GNU General Public License v3.0

How is the workflow for in-situ scattering in simulations and the Simex Platform #8

Open bussmann opened 8 years ago

bussmann commented 8 years ago

How will the Simex Platform deal with large-scale in-situ scattering simulations that require a sizeable HPC System? Will the workflow pass over values via Python in-situ or via file I/O?

CFGrote commented 8 years ago

Currently, interfaces are based on file I/O. This is a suggestion and can be complemented by in-memory transfer of data if needed.

CFGrote commented 8 years ago

By the same token, distribution of calculation tasks over processes is currently left to the specialized calculators. Once common patterns are identified, these can be incorporated into the abstract base classes.
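A minimal sketch of how such a base class could offer a file-based interface while leaving both the computation and any parallelization to the concrete calculator. All names here (`BaseCalculator`, `run`, `run_from_file`, `Doubler`) are hypothetical and not taken from the SimEx code base, and JSON stands in for HDF5 only to keep the example free of dependencies.

```python
from abc import ABC, abstractmethod
import json


class BaseCalculator(ABC):
    """Hypothetical base class; illustrative only, not the SimEx API."""

    @abstractmethod
    def run(self, data):
        """Compute on in-memory data; how work is distributed is up to the subclass."""

    def run_from_file(self, input_path, output_path):
        """File-based interface built on top of the in-memory one."""
        with open(input_path) as f:
            data = json.load(f)       # stand-in for reading an HDF5 file
        result = self.run(data)
        with open(output_path, "w") as f:
            json.dump(result, f)      # stand-in for writing an HDF5 file


class Doubler(BaseCalculator):
    """Toy calculator that doubles every input value."""

    def run(self, data):
        return [2 * x for x in data]
```

With this layout a caller can either chain calculators through files (`run_from_file`) or hand data over directly (`run`), so in-memory transfer complements the file interface instead of replacing it.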

bussmann commented 8 years ago

Leaving the distribution of calculation tasks to the calculators makes in-memory transfer harder by design: the domain decomposition of the Simex Platform will not necessarily be the same as that of the calculators, which requires many-to-many communication (worst case: one-to-all) between the Simex Platform and the calculator. Also be aware that the choice of file format (HDF5) limits I/O performance (parallel HDF5 gets surprisingly slow pretty quickly!).
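To make the communication pattern concrete, here is a rough sketch that assumes a simple 1D block decomposition on both sides (an assumption for illustration; real decompositions are usually multi-dimensional). The function names `block_ranges` and `exchange_partners` are made up for this example. For each sending rank it lists the receiving ranks whose block overlaps its own.

```python
def block_ranges(n_items, n_ranks):
    """Split range(n_items) into n_ranks contiguous, nearly equal blocks."""
    base, extra = divmod(n_items, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks get one additional item.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def exchange_partners(n_items, n_senders, n_receivers):
    """Map each sender rank to the receiver ranks whose block overlaps its own."""
    senders = block_ranges(n_items, n_senders)
    receivers = block_ranges(n_items, n_receivers)
    partners = {}
    for s, (s_start, s_stop) in enumerate(senders):
        partners[s] = [
            r
            for r, (r_start, r_stop) in enumerate(receivers)
            if max(s_start, r_start) < min(s_stop, r_stop)  # non-empty overlap
        ]
    return partners


print(exchange_partners(12, 3, 3))  # matching decompositions: {0: [0], 1: [1], 2: [2]}
print(exchange_partners(12, 2, 3))  # mismatched: {0: [0, 1], 1: [1, 2]}
print(exchange_partners(12, 1, 4))  # worst case, one-to-all: {0: [0, 1, 2, 3]}
```

Only when both sides use the same decomposition does every rank talk to exactly one partner; any mismatch fans the exchange out across several ranks.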

Let us assume a photon is represented by a 32-byte data structure (which is tight, depending on how much accuracy will finally be needed and what kind of information a photon will carry). Let us further assume we take 10^10 photons for sampling (which is about a hundredth of the expected photon number at XFEL).

Then we will end up with 32 x 10^10 bytes, which is about 300 GByte. Our parallel file system at HZDR can read at most 4 GByte/s when using all nodes of the cluster. This means we will need over a minute using all nodes just to read in the photons. With fewer nodes we will end up with much less I/O bandwidth, and HDF5 is not as good as people think in parallel I/O performance.
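The estimate can be reproduced in a few lines. The constants are the figures quoted above; the variable names are illustrative, and 1 GByte is taken as 10^9 bytes, which is an assumption about the intended unit.

```python
BYTES_PER_PHOTON = 32     # assumed size of one photon record
N_PHOTONS = 10**10        # number of sampled photons
READ_BANDWIDTH = 4e9      # bytes per second, peak read rate using all nodes

total_bytes = BYTES_PER_PHOTON * N_PHOTONS
total_gb = total_bytes / 1e9       # decimal gigabytes
total_gib = total_bytes / 2**30    # binary gibibytes
read_seconds = total_bytes / READ_BANDWIDTH

print(f"{total_gb:.0f} GB ({total_gib:.0f} GiB), {read_seconds:.0f} s to read")
```

This gives 320 GB (roughly 298 GiB) and 80 s of pure read time at peak bandwidth, consistent with "about 300 GByte" and "over a minute".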

We should thus discuss if we require in-memory I/O for the platform, as this would greatly increase bandwidth.

ax3l commented 8 years ago

Currently, interfaces are based on file I/O. This is a suggestion and can be complemented by in-memory transfer of data if needed.

Let me just list a few HDF5 references here so we have it all together:

notes:

This might help for a first in-memory test bed.

CFGrote commented 8 years ago

I'm very much in favour of in-memory data transfer, as it will also overcome some limitations in simS2E. However, for the mentioned "demonstration version" of simex_platform I'll go with serial HDF5; it's enough for what I need right now.

thanks for the feedback,

carsten

Carsten Fortmann-Grote Scientist for Scientific Simulations WP-84 (Scientific Instrument SPB)


On 17.11.2015 11:19, Axel Huebl wrote:

Currently, interfaces are based on file I/O. This is a suggestion
and can be complemented by in-memory transfer of data if needed.

Let me just list a few HDF5 references here so we have it all together:

notes:

  • need to cross check if in-memory ("core" driver) and thread safe work together (probably yes)
