JirongZhang / DeepHomography

Content-Aware Unsupervised Deep Homography Estimation
MIT License

The effect of using Triplet loss #20

Open dkoguciuk opened 4 years ago

dkoguciuk commented 4 years ago

Hi @JirongZhang ,

the following two statements seem to be mutually exclusive. Could you please elaborate on that? Is using the triplet loss beneficial or not?

Readme:

The "Oneline" model can produce almost comparable performance as "Doubleline" model, but much easier to optimize.

Paper:

Triplet loss. We further examine the effectiveness of our triplet loss by removing the term of Eq. 5 from Eq. 6. As shown in Table 2(c) “w/o. triplet loss”, the triplet loss decreases errors by over 50%, being especially beneficial in LT (118.42% lower error) and LL (70.10% lower error) scenes, demonstrating that it not only avoids the problem of obtaining trivial solutions, but also facilitates a better optimization.

JirongZhang commented 4 years ago

Your understanding is inaccurate. Both Oneline and Doubleline use the triplet loss to optimize the network. The difference is that Oneline only predicts Hab, while Doubleline predicts Hba at the same time. "Removing the term of Eq. 5 from Eq. 6" means only minimizing the distance between f(Ia_warp) and f(Ib). However, it is also important to maximize the distance between f(Ia) and f(Ib). @dkoguciuk
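
In code terms, a minimal sketch of that push/pull structure (the names here are illustrative, not the repo's actual variables):

    import torch

    def triplet_sketch(f_a_warp, f_b, f_a, margin=1.0):
        # Pull f(Ia_warp) toward f(Ib): this distance should shrink
        d_pos = torch.abs(f_a_warp - f_b).mean()
        # Push f(Ia) away from f(Ib): this distance should grow
        d_neg = torch.abs(f_a - f_b).mean()
        # Hinge keeps the loss bounded below at zero
        return torch.clamp(d_pos - d_neg + margin, min=0.0)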

dkoguciuk commented 4 years ago

Hi @JirongZhang ,

sure, of course, you're right, my bad. Did you include the Oneline model in the ablation study in your paper?

Best, D

JirongZhang commented 4 years ago

Sorry Daniel, this is a follow-up experiment. But indeed it should appear in the ablation study. @dkoguciuk

dkoguciuk commented 4 years ago

Hi @JirongZhang ,

I am still having a hard time understanding your Oneline loss. In the README you've got the loss as follows:

[image: the Oneline loss from the README]

The first term is defined as follows (from the paper):

[screenshot of the first term, from the paper]

And the second one is defined as follows (from the paper):

[screenshot of the second term, from the paper]
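
In my own notation (a reconstruction of the paper's formulas, so the exact symbols may differ slightly), the two terms are roughly:

    % first term: masked, normalized L1 distance between warped and target features
    \mathcal{L}_n(I_a', I_b) =
      \frac{\sum_i M_a'(i)\, M_b(i)\, \lVert F_a'(i) - F_b(i) \rVert_1}
           {\sum_i M_a'(i)\, M_b(i)}

    % second term: plain feature distance between the two original images
    \mathcal{L}(I_a, I_b) = \lVert F_a - F_b \rVert_1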

But in the code it's defined differently:

        # Calc per-pixel triplet loss: anchor = patch_2 (F_b),
        # positive = pred_I2_CnnFeature (F_a'), negative = patch_1 (F_a)
        feature_loss_mat = triplet_loss(patch_2, pred_I2_CnnFeature, patch_1)

        # Apply the content mask (mask_ap = M_a' * M_b) and normalize by its sum
        feature_loss = torch.sum(torch.mul(feature_loss_mat, mask_ap)) / sum_value
        feature_loss = torch.unsqueeze(feature_loss, 0)

The paper suggests we should first mask the feature difference (pred_I2_CnnFeature - patch_2) and then use it in the triplet loss, but the code does it the other way around: first it calculates the per-pixel triplet loss, then it masks the output. This is a crucial difference (at least to my understanding) because of the max(·, 0) operation in the triplet loss. I think the code version could be written like:

[rendered equation of the code version]

I've implemented both versions in my own repo, the one from your paper and the one from your code: the former does not work, while the latter works like a charm! So, where is the error in my understanding of the loss presented in the paper?
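
To make the two orderings concrete, here is a hedged side-by-side sketch (function and variable names are mine, not the repo's; l1_map is a per-pixel L1 distance over the channel dimension):

    import torch

    def l1_map(x, y):
        # Per-pixel L1 distance across channels: (B, C, H, W) -> (B, 1, H, W)
        return torch.abs(x - y).sum(dim=1, keepdim=True)

    def loss_paper_reading(f_warp, f_b, f_a, mask, lam=1.0):
        # Paper reading: mask and aggregate each term separately,
        # then take the difference (no per-pixel hinge)
        pos = (mask * l1_map(f_warp, f_b)).sum() / mask.sum()
        neg = l1_map(f_a, f_b).mean()
        return pos - lam * neg

    def loss_code_reading(f_warp, f_b, f_a, mask, margin=1.0):
        # Code reading: per-pixel triplet hinge first, then mask and normalize
        per_pixel = torch.clamp(l1_map(f_warp, f_b) - l1_map(f_a, f_b) + margin, min=0.0)
        return (mask * per_pixel).sum() / mask.sum()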

dkoguciuk commented 4 years ago

Oh, I am sorry, I made an error in the last equation: it's not Ln(I'a, Ib), it's the final loss to be minimized.

dkoguciuk commented 4 years ago

@JirongZhang , what are your thoughts on that?

JirongZhang commented 4 years ago

Hi @dkoguciuk Thanks for your correction. I did write some inaccurate descriptions of the formula, and this could also confuse other readers. In fact, the formula you wrote is a more accurate description of the Oneline code. To avoid misunderstanding by other readers, I am going to add your correction to the page. In addition, a sentence such as "works like a charm!" is very encouraging.

Best, Jirong

dkoguciuk commented 4 years ago

Hi @JirongZhang ,

Is the Doubleline model formula from the paper correct? Should it be:

[screenshot of the Doubleline loss from the paper]

Or, similar to the Oneline version defined above:

[rendered equation of the code-style Doubleline loss]

In the README you've got "please add another half of the loss", which suggests the second option.
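
For concreteness, since the images above may not render, the two candidates are roughly the following (my reconstruction, not a quote from the paper):

    % Option 1 (paper-style): aggregate each term first, then combine
    \mathcal{L} = \mathcal{L}_n(I_a', I_b) + \mathcal{L}_n(I_b', I_a)
                - \lambda\, \mathcal{L}(I_a, I_b)
                + \mu\, \lVert \mathcal{H}_{ab} \mathcal{H}_{ba} - \mathcal{I} \rVert_2^2

    % Option 2 (code-style): masked per-pixel triplet hinge in each direction,
    % as in the Oneline code, plus the same consistency term
    \mathcal{L} = \mathrm{Tri}(I_a', I_b, I_a) + \mathrm{Tri}(I_b', I_a, I_b)
                + \mu\, \lVert \mathcal{H}_{ab} \mathcal{H}_{ba} - \mathcal{I} \rVert_2^2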

I was probably a bit overoptimistic saying "works like a charm": Oneline is still hard to optimize, meaning only about one in every three or four training sessions converges, and the learned models differ in registration quality. I believe this is the cost of optimizing in an unbounded feature space.

Best, Daniel

dkoguciuk commented 4 years ago

@JirongZhang , what are your thoughts on that?

dkoguciuk commented 3 years ago

@JirongZhang , what are your thoughts on that?

dkoguciuk commented 3 years ago

@JirongZhang ?