`regression_var` estimates the diagonal elements of the covariance matrix. `regression_covar` provides the full covariance matrix, without the practitioner worrying about the off-diagonal elements. This is done through the new loss presented in the new version of the paper.
In practice, the regression is performed by estimating the LDL decomposition of the covariance matrix: I estimate a lower triangular matrix L and a diagonal matrix D. The final form with the Frobenius norm is a mathematical approximation of the loss function required to learn a full covariance matrix. I am still working on a stable version without the Frobenius norm, where the full covariance can be learned without that approximation.
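To make the connection concrete, here is a minimal sketch (not the code in the repo; the tensor names `residual`, `log_d`, `lower_tri` are illustrative) of the Gaussian negative log-likelihood under Sigma = L D L^T:

```python
import tensorflow as tf

def gaussian_nll_ldl(residual, log_d, lower_tri):
    """Negative log-likelihood of a Gaussian with covariance Sigma = L D L^T.

    residual:  [batch, 4]     regression error (target - predicted box)
    log_d:     [batch, 4]     log of the diagonal of D
    lower_tri: [batch, 4, 4]  unit-lower-triangular L (ones on the diagonal)

    Since det(Sigma) = prod(d_i) and Sigma^{-1} = L^{-T} D^{-1} L^{-1},
    the NLL reduces to a triangular solve plus a diagonal weighting, so the
    full covariance never has to be inverted explicitly.
    """
    # z = L^{-1} r via forward substitution.
    z = tf.linalg.triangular_solve(
        lower_tri, residual[..., tf.newaxis], lower=True)[..., 0]
    mahalanobis = tf.reduce_sum(tf.exp(-log_d) * tf.square(z), axis=-1)
    log_det = tf.reduce_sum(log_d, axis=-1)
    return 0.5 * (mahalanobis + log_det)
```

With `lower_tri` fixed to the identity, this collapses to the diagonal (`regression_var`) case, which matches the `exp(-log_D)` data term and the `0.5 * sum(log_D)` regularizer in the loss posted below.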
The negative penalization in `regression_covar_balanced` did not transfer well from TF 1.x to TF 2.0 in terms of results, which, to be honest, I am still exploring. I have the implementation if you are interested, and can post the loss here if you like.
@asharakeh thanks! Could you post the `regression_covar_balanced` loss here? I think others may be interested in it as well. I will take a look at v2 of your paper and report back if I have any questions. Looks like quite a lot of improvements were made over v1! :)
@patrick-llgc here is the loss. You just need to extend the if statement and it should work as intended.
```python
elif loss == 'regression_var_balanced':
    loss_weight = self.loss_weights[self.loss_names.index(loss)]

    # Get predictions
    predicted_boxes = box_utils.box_from_anchor_and_target_bnms(
        anchors, predicted_anchor_boxes)

    # Get ground truth
    target_boxes = box_utils.box_from_anchor_and_target_bnms(
        anchors, target_anchor_boxes)

    # Get the estimated inverse of the Cholesky decomposition;
    # its diagonal is treated as the log-variances log_D.
    anchorwise_covar_predictions = prediction_dict[
        constants.ANCHORS_COVAR_PREDICTIONS_KEY]
    log_D = tf.linalg.diag_part(anchorwise_covar_predictions)

    # Compute loss on positive anchors
    element_wise_reg_loss = self.huber_loss(
        target_boxes, predicted_boxes)
    covar_compute_loss_positives = tf.reduce_sum(
        tf.exp(-log_D) * element_wise_reg_loss, axis=2)
    covar_reg_loss = 0.5 * tf.reduce_sum(log_D, axis=2)
    covar_final_loss_positives = tf.reduce_sum(
        (covar_compute_loss_positives + covar_reg_loss) *
        anchor_positive_mask) / tf.maximum(1.0, num_positives)

    # Penalization term on negative anchors
    covar_compute_loss_negatives = tf.reduce_sum(
        tf.reduce_sum(tf.exp(-log_D), axis=2) *
        anchor_negative_mask) / tf.reduce_sum(anchor_negative_mask)

    # Normalize loss by number of anchors and multiply by loss weight
    normalized_reg_loss = loss_weight * \
        (covar_final_loss_positives + covar_compute_loss_negatives)

    # Update dictionary for summaries
    regression_loss = tf.reduce_sum(
        covar_final_loss_positives * anchor_positive_mask) / \
        tf.maximum(1.0, num_positives)
    regularization_loss = tf.reduce_sum(
        covar_reg_loss * anchor_positive_mask) / \
        tf.maximum(1.0, num_positives)
    loss_dict.update(
        {constants.REG_LOSS_KEY: regression_loss})
    loss_dict.update(
        {constants.COV_LOSS_KEY: regularization_loss})

    total_loss = total_loss + normalized_reg_loss
```
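For a quick sanity check of the tensor shapes this branch expects, here is a tiny self-contained snippet with dummy values (the shapes are illustrative, not the repo's actual config):

```python
import tensorflow as tf

# Dummy shapes: 2 images, 3 anchors each, 4 box coordinates per anchor.
log_D = tf.zeros([2, 3, 4])                 # predicted log-variances
element_wise_reg_loss = tf.ones([2, 3, 4])  # per-coordinate Huber residuals
anchor_positive_mask = tf.constant([[1., 0., 0.], [0., 1., 1.]])
anchor_negative_mask = 1.0 - anchor_positive_mask
num_positives = tf.reduce_sum(anchor_positive_mask)

data_term = tf.reduce_sum(tf.exp(-log_D) * element_wise_reg_loss, axis=2)
reg_term = 0.5 * tf.reduce_sum(log_D, axis=2)
positives = tf.reduce_sum(
    (data_term + reg_term) * anchor_positive_mask) / tf.maximum(1.0, num_positives)
negatives = tf.reduce_sum(
    tf.reduce_sum(tf.exp(-log_D), axis=2) * anchor_negative_mask
) / tf.reduce_sum(anchor_negative_mask)

print(positives.numpy(), negatives.numpy())  # 4.0 4.0 for these dummy values
```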
Hey @asharakeh, thanks for open-sourcing the code! I was wondering if you could point me to the resource/reference for the LDL decomposition and the Frobenius norm approximation for the covariance that you used. Thanks, Gunshi
@gunshi that was my own derivation and approximation, and one of the contributions. The full formulation did not fit into the paper, so it wasn't included. I have a short supplementary PDF outlining the derivation if you are interested; send me an email and I can forward it to you.
Hi @asharakeh, thanks for releasing your code! I was just reading your paper over the weekend. Congratulations on the great work.
I have a question regarding the difference between `regression_var` and `regression_covar`. In `regression_covar`, you seem to replace the diagonal elements of the covariance prediction with 1, then use the L2 norm of the updated covariance matrix to normalize the first term of the uncertainty-aware L2 loss (eq. 3 in the original paper): https://github.com/asharakeh/bayes-od-rc/blob/master/src/retina_net/models/retinanet_model.py#L292
I do not quite understand the difference between the two methods -- it would be great if you could shed some light on this.
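To frame the question (this is my own restatement, so please correct me if it is off), I read the diagonal `regression_var`-style loss per positive anchor as

$$
\mathcal{L}_{\text{var}} \;=\; \sum_{i=1}^{4} \exp(-s_i)\,\ell(x_i, \hat{x}_i) \;+\; \tfrac{1}{2}\, s_i, \qquad s_i = \log\sigma_i^2,
$$

where $\ell$ is the per-coordinate residual (squared error or Huber) and $s_i$ the predicted log-variance, which seems to match the `exp(-log_D)` data term and `0.5 * log_D` regularizer in the snippet above. What I am unsure about is how the Frobenius-norm normalization in `regression_covar` extends this to the off-diagonal terms.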
Also, the penalization of negative anchors in eq. 8 of the original paper does not seem to be implemented yet? Is this what the (unimplemented) `regression_covar_balanced` is about? Thanks for your help!