DenseBlockClassifierFromX0s computes a summary statistic that quantifies the presence of a dense block of material from PoCA X0 predictions. It compares the mean X0 of the m lowest X0 predictions (L) against the mean of the n-m highest predictions (H), where m corresponds to the expected number of voxels in the dense material: r = 2(H-L)/(H+L).
For compatibility with binary cross-entropy (BCE), the final statistic must lie between 0 and 1. Currently this is achieved by passing r through a sigmoid function, after an optional rescaling via a user-provided offset (o) and coefficient (c): r <- (r+o)c.
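A minimal sketch of this computation, assuming a flat tensor of per-voxel X0 predictions for a single volume (the function name, signature, and tensor handling are illustrative, not the actual DenseBlockClassifierFromX0s code):

```python
import torch

def dense_block_statistic(x0_preds: torch.Tensor, m: int,
                          offset: float = 0.0, coef: float = 1.0) -> torch.Tensor:
    """Hypothetical sketch of the summary statistic for one volume.

    x0_preds: flat tensor of n predicted X0 values.
    m: expected number of voxels occupied by the dense material.
    """
    sorted_x0, _ = torch.sort(x0_preds)
    low = sorted_x0[:m].mean()       # L: mean of the m lowest X0 predictions
    high = sorted_x0[m:].mean()      # H: mean of the remaining n-m predictions
    r = 2 * (high - low) / (high + low)
    r = (r + offset) * coef          # optional user-provided rescaling
    return torch.sigmoid(r)          # s in (0, 1), usable with BCE
```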
Problem
The value of the BCE is sensitive to the position and scale of the summary statistic s = sigmoid(r), whereas the ROC AUC metric is not. Below are values of r simulated by drawing samples from a pair of well-separated Gaussians. Prior to computing s, the values undergo transformations that shift or rescale them:
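A rough sketch of such a simulation is below; the Gaussian means/widths and the shift and scale values are illustrative assumptions, not the original study:

```python
import torch
from sklearn.metrics import roc_auc_score

torch.manual_seed(0)
n = 10_000
# Well-separated r values for the two classes (illustrative parameters)
r0 = torch.normal(mean=-2.0, std=0.5, size=(n,))  # class 0: no dense block
r1 = torch.normal(mean=2.0, std=0.5, size=(n,))   # class 1: dense block present
r = torch.cat([r0, r1])
y = torch.cat([torch.zeros(n), torch.ones(n)])

bce = torch.nn.BCELoss()
for name, r_t in [("original", r), ("shifted", r + 3.0), ("rescaled", r * 0.1)]:
    s = torch.sigmoid(r_t)
    print(f"{name:9s} BCE = {bce(s, y).item():.3f} "
          f"ROC AUC = {roc_auc_score(y.numpy(), s.numpy()):.3f}")
```

Because the sigmoid is monotonic, shifting or rescaling r changes the BCE but leaves the ranking of the predictions, and hence the ROC AUC, untouched.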
In both cases, the ROC AUC (the separation between the two classes) remains the same, but the value of the BCE can change. This means that minimising the BCE doesn't necessarily lead to better classification power, e.g.:
Although the "optimised" detector produces a much worse ROC AUC, it has a smaller loss than the initial configuration simply due to its summary statistic being in a more central location.
Potential ideas
Ensuring that (for equal probability of class 0 or 1) the distribution of s is centred on 0.5 should allow the optimisation process to focus purely on separating the predictions. However, since the loss for a batch of volumes is computed on a volume-wise basis, DenseBlockClassifierFromX0s cannot compute e.g. the mean of the current PDF of r. Additionally, we want to avoid the user needing to tune offsets and coefficients manually.
One idea would be either to learn the offset as a parameter to be optimised, or to track a running average of past (pre-offset) values of r and use it to set the offset. Either a moving window or an exponential decay could be used to keep the offset up to date with the current detector configuration.
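A sketch of the exponential-decay variant (the class and attribute names are hypothetical, not an existing TomOpt API); here the tracked mean is subtracted from r before the sigmoid, i.e. it plays the role of o = -running_mean in the notation above:

```python
from typing import Optional

import torch


class RunningOffset:
    """Hypothetical sketch: exponentially-decaying average of pre-offset r values."""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.running_mean: Optional[float] = None

    def update(self, r: torch.Tensor) -> None:
        # Detach so the offset tracks the statistic without receiving gradients
        batch_mean = r.detach().mean().item()
        if self.running_mean is None:
            self.running_mean = batch_mean
        else:
            self.running_mean = (self.momentum * self.running_mean
                                 + (1 - self.momentum) * batch_mean)

    def __call__(self, r: torch.Tensor) -> torch.Tensor:
        self.update(r)
        return torch.sigmoid(r - self.running_mean)
```

The learnable-offset alternative could instead register the offset as a trainable parameter and let the same optimiser that updates the detector parameters adjust it.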