DataMedSci / pymchelper

Python toolkit for SHIELD-HIT12A and FLUKA
http://datamedsci.github.io/pymchelper/

check if weighted average from multiple runs are ok #722

Closed. nbassler closed this issue 6 months ago

nbassler commented 6 months ago

Due to #718, some concern arose about whether the weighted average from multiple jobs with different numbers of primaries is computed correctly.

When jobs are run for a fixed time rather than a fixed number of primaries, the number of primaries will differ between jobs. Some jobs might also hang, or a user might want to merge results from two different runs with very different statistics.

grzanka commented 6 months ago

Related to https://github.com/DataMedSci/pymchelper/issues/400

grzanka commented 6 months ago

Let's assume we have 3 runs with different numbers of particles.

I ran a C-12 390 MeV/n beam on a water tank with the following detect.dat file:

Geometry Mesh            # 2-dimensional Cartesian scoring geometry in the YZ plane
    Name MyMesh_YZ
    X -5.0  5.0    1
    Y -5.0  5.0    1
    Z  16.0  16.1   1

Output
    Filename ex_yzmsh.bdo  # this will output as default binary .bdo format
    Geo MyMesh_YZ
    Quantity Dose
    Quantity DLET
    Quantity TLET
    Quantity Zone       # plot the zone map, useful for viewing the geometry and debugging it.

I ran 3 simulations with:

../../shieldhit/shieldhit --nstat=1 --seedoffset=1 --silent . 1>/dev/null
../../shieldhit/shieldhit --nstat=1 --seedoffset=2 --silent . 1>/dev/null
../../shieldhit/shieldhit --nstat=1000 --seedoffset=3 --silent . 1>/dev/null

Then I inspected the output files with:

convertmc inspect ex_yzmsh_0001.bdo | grep mean
convertmc inspect ex_yzmsh_0002.bdo | grep mean
convertmc inspect ex_yzmsh_0003.bdo | grep mean
The output was:

seed        no of primaries    DLET
1           1                  131.193
2           1                  11.0727
3           1000               134.308
averaging   1002               92.1912

That was a simple arithmetic mean:

>>> (131.193 + 11.0727 + 134.308) / 3
92.19123333333334

The weighted mean is:

>>> (1*131.193 + 1*11.0727 + 1000*134.308) / 1002
134.18190189620756
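A minimal sketch of that primary-weighted mean as a reusable Python function (the name weighted_mean is only for illustration, it is not part of pymchelper), checked against the three runs above:

def weighted_mean(values, primaries):
    # X = (sum_j x_j * I_j) / (sum_j I_j)
    return sum(x * n for x, n in zip(values, primaries)) / sum(primaries)

dlet = [131.193, 11.0727, 134.308]   # DLET reported by the three runs
nstat = [1, 1, 1000]                 # primaries per run

print(sum(dlet) / len(dlet))         # plain mean:    92.19123...
print(weighted_mean(dlet, nstat))    # weighted mean: 134.18190...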
nbassler commented 6 months ago

please keep in mind

/*
   Flags for page->postproc on how to postprocessing the data in page->data.
   Given
   - the data in page->data as x_j
   - for j instances of this simulation (first instance being j = 0)
   - which was done with I_j number of particles.
   - The resulting data will be termed X and has the units given by page->data

   If this is changed, adapt the documentation as well in sh_bdo.h and the User's guide.
 */
#define SH_POSTPROC_NONE   0   /* X = x_0                                   for GEOMAP type scorers*/
#define SH_POSTPROC_SUM    1   /* X = sum_j x_j                             COUNT, ... */
#define SH_POSTPROC_NORM   2   /* X = (sum_j x_j) / (sum_j I_j)             NORMCOUNT, ... */
#define SH_POSTPROC_AVER   3   /* X = (sum_j x_j * I_j) / (sum_j I_j)       LET, ... */
#define SH_POSTPROC_APPEND 4   /* X = [x_0, x_1, ... x_j]                   MCPL */
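For reference, a minimal Python sketch of those five postprocessing rules, assuming data_per_job holds the per-job values x_j and primaries_per_job the corresponding I_j; the function names are illustrative and this is not pymchelper's actual implementation:

def postproc_none(data_per_job, primaries_per_job):
    # X = x_0: keep only the first job's data (GEOMAP-type scorers)
    return data_per_job[0]

def postproc_sum(data_per_job, primaries_per_job):
    # X = sum_j x_j (COUNT, ...)
    return sum(data_per_job)

def postproc_norm(data_per_job, primaries_per_job):
    # X = (sum_j x_j) / (sum_j I_j) (NORMCOUNT, ...)
    return sum(data_per_job) / sum(primaries_per_job)

def postproc_aver(data_per_job, primaries_per_job):
    # X = (sum_j x_j * I_j) / (sum_j I_j) (LET-type scorers)
    weighted_sum = sum(x * n for x, n in zip(data_per_job, primaries_per_job))
    return weighted_sum / sum(primaries_per_job)

def postproc_append(data_per_job, primaries_per_job):
    # X = [x_0, x_1, ..., x_j] (MCPL phase-space dumps)
    return list(data_per_job)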
nbassler commented 6 months ago

FYI

/**
 * @brief sets the detector-page normalization flag, whether the final result should be
 * divided by the number of primaries or not.
 *
 * @details All double scorers are set to no-normalization.
 *          Additionally, a few counters and geomap-type scorers are set to no-normalization.
 *          Any other case will be normalized.
 *
 * @param[in] *p - page to check where the normalize flag will be set or unset.
 *                 p->divide must be set before calling this function.
 *
 * @returns 1
 *
 * @author Niels Bassler
 */
static int _setup_normalize(struct page *p) {

    /* double scorers should not be normalized, since
       these already have # of particles implicit in the denominator */

    if (p->divide) {
        p->postproc = SH_POSTPROC_AVER;
        return 1;
    }

    switch (p->detector_type) {

    /* scorers where just the data from the first job should be used, and no further postprocessing */
    case SH_SDET_NONE:
    case SH_SDET_MATERIAL:
    case SH_SDET_NMEDIUM:
    case SH_SDET_NZONE:
    case SH_SDET_RHO:
        p->postproc = SH_POSTPROC_NONE;
        break;

    /* scorers which are simple counters, where the data should simply be summed across all jobs */
    case SH_SDET_COUNT:
    case SH_SDET_COUNTPOINT:
        p->postproc = SH_POSTPROC_SUM;
        break;

    /* scorers where data should be averaged across all jobs, weighted with their individual primary particle weights */
    case SH_SDET_DLET:
    case SH_SDET_TLET:
    case SH_SDET_DLETG:
    case SH_SDET_TLETG:
    case SH_SDET_DZ2BETA2:
    case SH_SDET_TZ2BETA2:
    case SH_SDET_DQ:
    case SH_SDET_TQ:
    case SH_SDET_DQEFF:
    case SH_SDET_TQEFF:
    case SH_SDET_DZEFF2BETA2:
    case SH_SDET_TZEFF2BETA2:
    case SH_SDET_MOCA_YF:
    case SH_SDET_MOCA_YD:
        p->postproc = SH_POSTPROC_AVER;
        break;

    /* scorers which are sequential data, and all jobs should be concatenated to a long list */
    case SH_SDET_MCPL:
        p->postproc = SH_POSTPROC_APPEND;
        break;

    /* scorers which should be normalized "per primary particle" */
    default:
        p->postproc = SH_POSTPROC_NORM;
        break;
    }
    return 1;
}
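Tying the two comments together: DLET and TLET from the detect.dat above fall into the SH_POSTPROC_AVER branch, so the merged value for the three example runs should be the primary-weighted 134.18, not the plain mean 92.19. A rough, illustrative mapping in Python terms (the scorer names and the dictionary itself are mine, mirroring the switch above, not an actual pymchelper structure):

POSTPROC_BY_SCORER = {
    'Zone':  'NONE',    # geometry-map scorers: keep the first job's data
    'Count': 'SUM',     # plain counters: sum over jobs
    'DLET':  'AVER',    # LET-type scorers: primary-weighted average
    'TLET':  'AVER',
    'MCPL':  'APPEND',  # phase-space dumps: concatenate
}

# AVER applied to the example runs: expect ~134.18 rather than ~92.19
dlet, nstat = [131.193, 11.0727, 134.308], [1, 1, 1000]
print(sum(x * n for x, n in zip(dlet, nstat)) / sum(nstat))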
nbassler commented 6 months ago

also https://github.com/DataMedSci/pymchelper/issues/395