A function that will take a validation subset and model predictions on that subset, create a confusion matrix, then report an array of stats from the confusion matrix, as well as stats not generated (to my knowledge) by the confusion matrix. This would be added to the end of the `do_train` script; that way, re-generating the stats would only require running `do_train` again with `do_train=False` in the config file.
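A minimal sketch of the idea (hypothetical helper and column names, not the eventual Gym implementation), assuming a KxK confusion matrix with rows as true classes and columns as predicted classes:

```python
import numpy as np

def stats_from_confusion_matrix(cm):
    """Derive per-class and summary stats from a KxK confusion matrix
    (rows = true class, cols = predicted class). Hypothetical helper."""
    cm = cm.astype(float)
    tp = np.diag(cm)              # true positives per class
    fp = cm.sum(axis=0) - tp      # false positives per class
    fn = cm.sum(axis=1) - tp      # false negatives per class
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "OverallAccuracy": tp.sum() / cm.sum(),
        "Precision": precision,
        "Recall": recall,
        "F1Score": f1,
    }

# example: flatten a label image and a predicted label image, bin-count
# the (true, predicted) pairs into a confusion matrix, then report stats
y_true = np.random.randint(0, 3, size=(768, 768)).flatten()
y_pred = np.random.randint(0, 3, size=(768, 768)).flatten()
cm = np.zeros((3, 3))
np.add.at(cm, (y_true, y_pred), 1)
print(stats_from_confusion_matrix(cm))
```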
Lots of good ideas here: https://nowosad.github.io/post/motif-bp2/ (this link is now dead).
This is still important to look at. On multiclass problems, mean IoU is often pegged at 1.0 from the start of model training, which is not correct. Also, multiple metrics would be useful for some downstream model applications.
A new issue is that KLD appears always to be infinity for multiclass models in the evaluation step at the end of train_model.py. @CameronBodine are you seeing the same thing?
Yes, KLD on train and validation subsets is always infinity. I don't recall it being an issue when doing my binary depth model.
KLD = infinity was caused by non-normalized, non-integer model outputs. The solution is to compute the argmax, then one-hot encode:
```python
# est_label: raw softmax model output; lbl: one-hot ground-truth label;
# NCLASSES: number of classes (all defined in train_model.py)
kl = tf.keras.losses.KLDivergence()  # instantiate the metric object

# argmax collapses the class dimension: softmax scores -> integer labels
est_label = np.argmax(est_label.squeeze(), axis=-1)

# one-hot encode the integer labels back to a (nx, ny, NCLASSES) stack
nx, ny = est_label.shape
lstack = np.zeros((nx, ny, NCLASSES))
lstack[:, :, :NCLASSES] = (np.arange(NCLASSES) == est_label[..., None]).astype(int)

# compute KLD on one-hot encoded integer tensors
kld = kl(tf.expand_dims(tf.squeeze(lbl), 0), lstack).numpy()
```
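As a self-contained sanity check (dummy shapes and variable names, not Gym's), the same argmax-then-one-hot recipe gives a finite KLD because Keras clips zeros before taking logs:

```python
import numpy as np
import tensorflow as tf

NCLASSES = 4
nx, ny = 32, 32

# dummy one-hot ground truth
true_int = np.random.randint(0, NCLASSES, size=(nx, ny))
lbl = (np.arange(NCLASSES) == true_int[..., None]).astype(float)

# dummy raw model output: per-class softmax scores
scores = tf.nn.softmax(np.random.randn(nx, ny, NCLASSES), axis=-1).numpy()

# argmax to integer labels, then one-hot encode, as above
est_int = np.argmax(scores, axis=-1)
lstack = (np.arange(NCLASSES) == est_int[..., None]).astype(float)

kl = tf.keras.losses.KLDivergence()
print(kl(lbl[None, ...], lstack[None, ...]).numpy())  # finite, not inf
```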
I have implemented all of the metrics on this codebase, i.e. generating all stats from the confusion matrix. Need to update the docs to credit the above.
For example, this is a random (bad) model output. The confusion-matrix mean IoU much more accurately reflects the situation (generally bad), as does mean KLD, which is now fixed:

```
Mean of mean IoUs (validation subset)=1.000
Mean of mean IoUs, confusion matrix (validation subset)=0.130
Mean of mean frequency weighted IoUs, confusion matrix (validation subset)=0.238
Mean of mean Dice scores (validation subset)=0.873
Mean of mean KLD scores (validation subset)=1.329
```
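For context, a sketch of how the confusion-matrix IoU variants above can be computed (hypothetical helper, not the Gym code): mean IoU averages per-class IoU equally, while the frequency-weighted version weights each class by its share of the ground-truth pixels.

```python
import numpy as np

def iou_stats(cm):
    """Per-class IoU, mean IoU, and frequency-weighted IoU from a
    KxK confusion matrix (rows = true class, cols = predicted class)."""
    cm = cm.astype(float)
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp   # TP + FP + FN per class
    iou = tp / np.maximum(union, 1e-12)
    freq = cm.sum(axis=1) / cm.sum()               # ground-truth class frequency
    return iou, iou.mean(), (freq * iou).sum()

# toy 2-class example
iou, mean_iou, fw_iou = iou_stats(np.array([[50, 2], [10, 38]]))
```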
The new metrics will appear in train_model.py as part of the final validation step, computed on 10 batches of validation samples using a modified plotcomp_n_metrics. This function also creates two files of per-image model metrics; example files computed on a small number of validation samples are below (just for illustration):

noaa_spring2022_resunet_model1_model_metrics_per_sample.csv
noaa_spring2022_resunet_model1_model_metrics_per_sample_per_class.csv

These mods require a change to doodleverse_utils that will call all the model metrics from its own .py script, useful also for Zoo and other extensibility.
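A sketch of the per-sample CSV bookkeeping (stdlib only; the column names are illustrative, not the actual schema of the files above):

```python
import csv

def write_per_sample_metrics(rows, out_csv):
    """Write one row of scalar metrics per validation sample."""
    fieldnames = sorted({k for row in rows for k in row})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# usage: one dict of scalar metrics per image (illustrative values)
rows = [
    {"sample": "img_0001", "MeanIoU": 0.41, "OverallAccuracy": 0.72},
    {"sample": "img_0002", "MeanIoU": 0.38, "OverallAccuracy": 0.69},
]
write_per_sample_metrics(rows, "model_metrics_per_sample.csv")
```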
Started a new metrics branch with these changes implemented. One could use this branch with train_model.py to generate new sets of metrics for already-trained models, using do_train:false in the config file.
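For illustration only, assuming a JSON config file (the key is spelled as in this thread; the actual Gym config key and casing may differ):

```json
{
  "do_train": false
}
```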
Next is a keras implementation of the Matthews correlation coefficient. It looks straightforward:
```python
>>> # y_true and y_pred as in the tensorflow_addons docs example
>>> metric = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
>>> metric.update_state(y_true, y_pred)
>>> result = metric.result()
>>> result.numpy()
-0.33333334
```
Requires `pip install tensorflow_addons` in the conda env.
.... this was not straightforward, but a solution has been obtained.
First, this is easily tested and works on arbitrary matrices of zeros and ones:

```python
import numpy as np
import tensorflow_addons as tfa

size = (768, 768)
# np.random.randint's upper bound is exclusive, so 2 gives values in {0, 1}
y_true = np.random.randint(0, 2, size=size)
y_pred = np.random.randint(0, 2, size=size)
metric = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=2)
metric.update_state(y_true, y_pred)
metric.result()
```
However, I have not been able to determine how to adapt it to multiclass, for example:

```python
size = (768, 768)
y_true = np.random.randint(0, 3, size=size)  # three classes: {0, 1, 2}
y_pred = np.random.randint(0, 3, size=size)
metric = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=3)
metric.update_state(y_true, y_pred)
metric.result()
```
yields

```
InvalidArgumentError: in user code:

    File "/home/marda/anaconda3/envs/gym/lib/python3.10/site-packages/tensorflow_addons/metrics/matthews_correlation_coefficient.py", line 85, in update_state  *
        new_conf_mtx = tf.math.confusion_matrix(

    InvalidArgumentError: `labels` out of bound
    Condition x < y did not hold.
    First 3 elements of x: [0 0 0]
    First 1 elements of y: [3]
```
There seems to be no documentation on what arguments update_state takes. Should the values be a certain dtype or shape? Arrays or tensors?
If I one-hot encode:

```python
# one-hot encode both (768, 768) label images to shape (768, 768, 3)
y_true_1h = (np.arange(3) == y_true[..., None]).astype(int)
y_pred_1h = (np.arange(3) == y_pred[..., None]).astype(int)
metric = tfa.metrics.MatthewsCorrelationCoefficient(num_classes=3)
metric.update_state(y_true_1h, y_pred_1h)
metric.result()
```
I get the same error, which I do not understand. I can't seem to find further examples or docs, and further trials yielded nothing more of note.
Next, sklearn (not a current gym dependency) has a similar implementation that might work, e.g.

```python
from sklearn.metrics import matthews_corrcoef

matthews_corrcoef(y_true.flatten(), y_pred.flatten())
```
However, I found an implementation here that I modified into my own (without any additional dependencies), consistent with the other metrics:
```python
import tensorflow as tf

def MatthewsCorrelationCoefficient(confusionMatrix):
    """Multiclass MCC computed directly from a KxK confusion matrix.
    Note: pass an int64 or float matrix so n_samples ** 2 cannot
    overflow int32 for large images."""
    t_sum = tf.reduce_sum(confusionMatrix, axis=1)   # true counts per class
    p_sum = tf.reduce_sum(confusionMatrix, axis=0)   # predicted counts per class
    n_correct = tf.linalg.trace(confusionMatrix)     # correctly classified samples
    n_samples = tf.reduce_sum(p_sum)                 # total samples
    cov_ytyp = n_correct * n_samples - tf.tensordot(t_sum, p_sum, axes=1)
    cov_ypyp = n_samples ** 2 - tf.tensordot(p_sum, p_sum, axes=1)
    cov_ytyt = n_samples ** 2 - tf.tensordot(t_sum, t_sum, axes=1)
    cov_ytyp = tf.cast(cov_ytyp, 'float')
    cov_ytyt = tf.cast(cov_ytyt, 'float')
    cov_ypyp = tf.cast(cov_ypyp, 'float')
    mcc = cov_ytyp / tf.math.sqrt(cov_ytyt * cov_ypyp)
    if tf.math.is_nan(mcc):
        mcc = tf.constant(0, dtype='float')
    return mcc.numpy()
```
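This is the Gorodkin-style multiclass MCC computed from confusion-matrix counts. As a quick cross-check (sklearn used here only for verification, not as a gym dependency; random labels are illustrative, so both values should be near zero):

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import matthews_corrcoef  # verification only

y_true = np.random.randint(0, 3, size=(768, 768)).flatten()
y_pred = np.random.randint(0, 3, size=(768, 768)).flatten()

# int64 so n_samples ** 2 does not overflow for ~590k pixels
cm = tf.math.confusion_matrix(y_true, y_pred, num_classes=3, dtype=tf.int64)
print(MatthewsCorrelationCoefficient(cm))   # confusion-matrix version above
print(matthews_corrcoef(y_true, y_pred))    # sklearn reference value
```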
which seems to work. An example output of the new metric generating function:
```
{'OverallAccuracy': 0.6961568196614584,
 'Frequency_Weighted_Intersection_over_Union': 0.5368461005780909,
 'MeanIntersectionOverUnion': 0.3696620352309895,
 'F1Score': array([ nan, 0.67122082, 0.43898921, 0.81816669]),
 'Recall': array([0. , 0.62919626, 0.29834483, 0.79084303]),
 'Precision': array([0. , 0.71926089, 0.83049958, 0.84744599]),
 'MatthewsCorrelationCoefficient': 0.43273914}
```
Some plots of the relationship between metrics on a sample dataset: MCC tracks with mean IoU.
New doodleverse_utils version 0.0.4 is posted and contains the new model metrics. For now, this is only required for users of the new metrics branch of segmentation gym. Run

```
pip install doodleverse-utils -U
```

to upgrade from an existing activated gym environment.
Further, I have now tested the code using already-trained models on both multiclass and binary problems. Here is a plot of metrics for a binary (water/no water) model:
I believe I can close this issue, but first will do some tests with the new functions. In due course, @CameronBodine, it would be helpful if you could check out the new_metrics branch, update doodleverse_utils, and trial it on your greyscale multiclass model - ta!
First, I think adopting a similar approach to this, i.e. generating all stats from the confusion matrix, and also reporting the confusion matrix itself, makes much more sense. Also, explore the Matthews correlation coefficient (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6413-7), generalized to multiclass.
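For reference, the binary MCC and the multiclass generalization that the confusion-matrix function above computes, with s the total number of samples, c the trace of the confusion matrix (correct predictions), t_k the row sum (true count) and p_k the column sum (predicted count) for class k:

```math
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}}
\qquad
\mathrm{MCC}_{K} = \frac{c\,s - \sum_{k} t_k\,p_k}{\sqrt{\bigl(s^{2} - \sum_{k} p_k^{2}\bigr)\bigl(s^{2} - \sum_{k} t_k^{2}\bigr)}}
```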