McStasMcXtrace / McCode

The home of the McStas (neutrons) and McXtrace (x-rays) Monte-Carlo ray-tracing instrument simulation codes.
https://github.com/McStasMcXtrace/McCode/wiki
GNU General Public License v3.0

MCPL files and MPI #882

Open huegle opened 5 years ago

huegle commented 5 years ago

When running an instrument that uses an MCPL file as a source with MPI, McStas simply runs the exact same MCPL input file in its entirety on each core and adds the results together. This poses several problems:

/////////////////////////////////////////////////////////////

This bug report is based on the following exchange on the mcstas-users list:

Dear Thomas, The McStas use of MCPL input in MPI settings is implemented in a relatively trivial way: every particle in the file is processed by every MPI process. I therefore suspect that we could be looking at an I/O limitation from the filesystem - 1 vs. 10 vs. 64 processes wanting to read the same 24 GB each from disk ‘in parallel’.

On 7 Nov 2019, at 17.34, Huegle, Thomas hueglet@ornl.gov wrote:

Dear all, I am trying out using MCPL files as input into simulations right now:

COMPONENT sourceMCPL = MCPL_input( filename="/data/source.mcpl" ) AT (0, 0, 0) RELATIVE Origin

When I run the simulation using simply “mcrun mcpltester.instr”, it takes about half an hour (the MCPL file in question is 24GB). However, when I try to use mpi (“mcrun -c --mpi=10 mcpltester.instr”), it takes something closer to 50 minutes. The problem seems to get worse (predicted run time for mpi=64: ~17 hours). Is there a way around it? Some special way of compiling perhaps? Thank you very much! Thomas

willend commented 5 years ago

Hi @huegle,

You write that

> When running an instrument that uses an MCPL file as a source with MPI, McStas simply runs the exact same MCPL input file in its entirety on each core and adds the results together. This poses several problems:
>
> any statistics/error bars are biased/wrong

I would argue that saying the statistics/error bars are biased/wrong is too strong. :) For multiple reasons:

  1. The overall intensity (weight) is preserved when duplicating the file: each event weight is renormalised by the MPI node count.
  2. The MC-choice input parameters E_smear, pos_smear and dir_smear of MCPL_input are taken into account on every file access other than the very first one by node 0.
  3. Repeating the same event multiple times will lead to one of two possible outcomes: a) no Monte Carlo choice after the read, so the trajectories are exactly the same, i.e. a waste of CPU; b) there is a Monte Carlo choice, after which the trajectories will differ.

I think we will never implement a method where events are transferred between nodes, but an option would be that MCPL_input could optionally read independent per-node files (e.g. mpi_multiple_mcpl_files=1, with corresponding file naming: input_0.mcpl.gz, ... input_n.mcpl.gz). There will of course still be a possible dependence on OS / filesystem performance in this case, but less than in the current situation.

willend commented 5 years ago

The input_0.mcpl.gz, ... input_n.mcpl.gz naming above will, by the way, correspond well with using the options keep_mpi_unmerged=1 and merge_mpi=0 of MCPL_output.