FloList / EMPL

The Earth Mover's Pinball Loss: Quantiles for Histogram-Valued Regression

Advice on implementing EMPL in a surrogate model #2

Open daniel-a-diaz opened 9 months ago

daniel-a-diaz commented 9 months ago

I was hoping you might give me some advice on implementing EMPL in a surrogate model I have. I am working with micromechanical simulations of gas pores produced during processing in Additive Manufacturing, and because of the simulation time I put together a surrogate model that predicts the simulation output given a voxelized 3D pore volume. The target and output are single-channel 3D stress fields, with each voxel intensity representing a stress value, usually from around -100 to 300 or so, so I am comparing the similarity of the output mask to the target stress field. Using MSE seemed to work great, giving me an R2 of 99.87%, but a closer look revealed some issues. Below is an example of a target field on the left and the mask on the right. While the network does well in capturing the general field morphology, the distribution is always off: in almost every case it overshoots the minimum and undershoots the maximum, so that it essentially squishes the distribution together.

I am using the stress field outputs to calculate stress concentration, which is dependent on the maximum stress and the average stress. At the very least I need to be getting the right max stress, but it would be preferable to match the stress distribution as closely as possible. Trying to match histograms is what led me here to EMPL. So my question is, do you think EMPL would help me out here?

If so: I found the code below, from the astrophysical example, implementing EMPL. Could you perhaps explain the parameters in more detail?

import tensorflow as tf

def empl(y_true, y_pred, tau, weights=None, smoothing=0.0):
    """
    Compute the Earth Mover's Pinball Loss (arXiv:2106.02051).
    :param y_true: label
    :param y_pred: prediction
    :param tau: quantile levels of interest
    :param weights: weight the loss differently for different samples
    :param smoothing: scalar >= 0 that determines the smoothing of the loss function around 0
    :return: Earth Mover's Pinball Loss
    """
    ecdf_true = tf.math.cumsum(y_true, axis=1)
    ecdf_pred = tf.math.cumsum(y_pred, axis=1)
    delta = ecdf_pred - ecdf_true

    # If there is an extra dimension for the channel: tau might need to be expanded
    if len(tau.shape) == 2 and len(delta.shape) == 3:
        tau = tf.expand_dims(tau, 2)

    # Non-smooth C0 loss (default)
    if smoothing == 0.0:
        mask = tf.cast(tf.greater_equal(delta, tf.zeros_like(delta)), tf.float32) - tau
        loss = mask * delta

    # Smooth loss
    else:
        loss = -tau * delta + smoothing * tf.math.softplus(delta / smoothing)

    if weights is None:
        weights = tf.ones_like(ecdf_true)

    if len(weights.shape) < len(y_true.shape):  # if bin-dimension is missing
        weights = tf.expand_dims(weights, 1)

    # avg. the weighted loss over the bins (1) and channel dimension (2)
    return tf.reduce_mean(loss * weights, [1, 2])
FloList commented 9 months ago

Hi Daniel,

Thanks for reaching out - this sounds like an interesting problem.

If I understand correctly, estimating voxelwise stress values (i.e. spatially resolving the stress field) is an essential part of your analysis pipeline, so you would probably only want to use a histogram-based loss $L_{hist}$ in addition to a voxelwise loss function such as the MSE to improve the match between the true and estimated distributions, so something like $L_{tot} = \mathrm{MSE} + \lambda L_{hist}$. This sounds like a reasonable idea to me and is probably worth trying.

The use case for the EMPL seems a bit different, though: there, the idea is to estimate a distribution of possible values of the cumulative histogram in each bin, see e.g. https://github.com/FloList/EMPL/blob/master/astrophysics_example.png, where light blue shows the true histograms and the coloured regions correspond to estimated uncertainties for the predicted histograms. In your case, the primary labels are a 3D field, and based on your description of the problem, the regression task is deterministic: your model estimates a single (mean) stress field given an input pore volume. From this stress field, you would then compute a (single) histogram and compare it to its true counterpart. This means that you wouldn't need the "pinball" part of the loss and the associated quantile levels $\tau$. The idea there is that the $\tau$ values are randomly drawn during training, so the model learns to express uncertainties on the values of the cumulative histogram in terms of quantiles; at evaluation time, you can then query the cumulative histograms at given levels such as $\tau = 0.05$, $\tau = 0.5$, and $\tau = 0.95$ to get the median prediction and the centred 90% confidence interval. In other words, $\tau$ is only used to obtain multiple predictions in each bin for the purpose of uncertainty quantification, which you probably don't need.
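To make the pinball mechanics concrete, here is a minimal sketch (my own illustration, not code from this repository) of what happens in a single training step: $\tau$ is drawn uniformly per sample, and the asymmetric mask penalises over- and under-prediction of the cumulative histogram differently, which is what lets the network learn quantiles.

```python
import tensorflow as tf

# Toy shapes: 4 samples, 8 histogram bins (values are arbitrary for illustration).
batch, n_bins = 4, 8
tau = tf.random.uniform((batch, 1))  # one random quantile level per sample per step

# Random density histograms standing in for true/predicted histograms (sum to 1).
y_true = tf.nn.softmax(tf.random.normal((batch, n_bins)), axis=1)
y_pred = tf.nn.softmax(tf.random.normal((batch, n_bins)), axis=1)

# Cumulative histograms and their bin-wise difference.
ecdf_true = tf.math.cumsum(y_true, axis=1)
ecdf_pred = tf.math.cumsum(y_pred, axis=1)
delta = ecdf_pred - ecdf_true

# Pinball (check) loss: over-prediction is weighted by (1 - tau),
# under-prediction by tau, so each sample targets a different quantile.
mask = tf.cast(delta >= 0.0, tf.float32) - tau
loss = tf.reduce_mean(mask * delta, axis=1)  # per-sample loss, always >= 0
```

In a real training loop the sampled `tau` would also be fed to the network as an extra input, so that at evaluation time the cumulative histogram can be queried at any fixed quantile level.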

To compare a single deterministic prediction to the truth, you could instead use the "standard" Earth Mover's Distance loss,

def emd_loss(p, p_hat, r=1, scope="emd_loss", do_root=True):
    """Compute the Earth Mover's Distance loss."""
    with tf.name_scope(scope):
        ecdf_p = tf.math.cumsum(p, axis=-1)
        ecdf_p_hat = tf.math.cumsum(p_hat, axis=-1)
        if r == 1:
            emd = tf.reduce_mean(tf.abs(ecdf_p - ecdf_p_hat), axis=-1)
        elif r == 2:
            emd = tf.reduce_mean((ecdf_p - ecdf_p_hat) ** 2, axis=-1)
            if do_root:
                emd = tf.sqrt(emd)
        else:
            emd = tf.reduce_mean(tf.pow(tf.abs(ecdf_p - ecdf_p_hat), r), axis=-1)
            if do_root:
                emd = tf.pow(emd, 1 / r)
        return tf.reduce_mean(emd)

Assuming you bin your value range of $[-100, 300]$ into $N_{bin}$ bins, p (p_hat) would be the true (estimated) density histogram (both with shape $N_{batch} \times N_{bin}$ and normalised to sum to 1). The emd_loss function above will then compute the cumulative histograms (using cumsum) and either compute the $\ell^1$-loss (r=1) or the $\ell^2$-loss (r=2, see e.g. https://arxiv.org/abs/1611.05916). The stress values $[-100, 300]$ do not appear explicitly in the loss function (and therefore don't need to be normalised); the cumulative histogram values will be compared bin by bin, and the final loss function is computed as an average over the bins. If you'd like to use different weights for different bins (such that values in the range e.g. $[250, 300]$ influence the loss more), a weighting of the bins could be added within tf.reduce_mean(...).
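One practical caveat when wiring this up: a hard histogram (e.g. tf.histogram_fixed_width) is not differentiable with respect to the voxel values, so gradients would not flow from the histogram loss back to the predicted field. A common workaround is a soft, kernel-based histogram. The sketch below is my own illustration of that idea (the names soft_histogram, combined_loss, and lam are mine, not from the paper or repository):

```python
import tensorflow as tf

def soft_histogram(x, n_bins=64, lo=-100.0, hi=300.0, sigma=None):
    """Differentiable density histogram via Gaussian soft-binning.
    x: (batch, ...) field of values; returns (batch, n_bins), rows sum to 1."""
    x = tf.reshape(x, [tf.shape(x)[0], -1, 1])        # (batch, voxels, 1)
    centers = tf.linspace(lo, hi, n_bins)             # (n_bins,) bin centres
    width = (hi - lo) / n_bins
    sigma = width if sigma is None else sigma         # kernel bandwidth ~ bin width
    w = tf.exp(-0.5 * ((x - centers) / sigma) ** 2)   # (batch, voxels, n_bins)
    hist = tf.reduce_mean(w, axis=1)                  # (batch, n_bins)
    return hist / tf.reduce_sum(hist, axis=-1, keepdims=True)

def combined_loss(y_true, y_pred, lam=0.1):
    """Voxelwise MSE plus lam * (l1) Earth Mover's Distance on soft histograms."""
    mse = tf.reduce_mean((y_true - y_pred) ** 2)
    p = soft_histogram(y_true)
    p_hat = soft_histogram(y_pred)
    emd = tf.reduce_mean(tf.abs(tf.math.cumsum(p, axis=-1)
                                - tf.math.cumsum(p_hat, axis=-1)))
    return mse + lam * emd
```

The weight lam would need tuning for your problem so that the histogram term is comparable in magnitude to the MSE term.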

daniel-a-diaz commented 9 months ago

This is exactly what I was looking for. Thanks for the great response.