Yes it's normal. If all the triplets in a batch satisfy the margin, then the loss will be 0.
To be precise: let d_ap be the anchor-positive distance and d_an the anchor-negative distance. If d_an - d_ap > margin for all triplets in the batch, then the loss will be 0.
So the model you trained with the proxy anchor loss is already separating the data by at least the margin.
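The condition above can be sketched as plain Python. This is a minimal illustration of the per-triplet hinge, not the library's implementation; the distance values are made up:

```python
# Per-triplet margin loss: max(d_ap - d_an + margin, 0).
# It is exactly 0 whenever d_an - d_ap > margin, i.e. the negative is
# already pushed far enough from the anchor relative to the positive.

def triplet_margin_loss(d_ap, d_an, margin=0.05):
    """d_ap: anchor-positive distance, d_an: anchor-negative distance."""
    return max(d_ap - d_an + margin, 0.0)

# A triplet that already satisfies the margin contributes zero loss:
print(triplet_margin_loss(d_ap=0.2, d_an=0.5))  # 0.0
# A violating triplet contributes a positive loss:
print(triplet_margin_loss(d_ap=0.5, d_an=0.4))
```

If every triplet in the batch falls into the first case, the batch loss is 0.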
Thanks for your reply. So it depends on the batch size used with the triplet loss: if it is too small, the loss will be zero. Is that correct?
Yes, as you decrease the batch size, you get fewer triplets in each batch, so you are more likely to get 0 loss.
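To see how quickly the number of triplets shrinks with batch size, here is a small counting sketch. It assumes a balanced batch of c classes with k samples each (batch size B = c * k); the numbers are illustrative:

```python
# Count (anchor, positive, negative) triplets in a balanced batch
# with c classes and k samples per class (batch size B = c * k).
def num_triplets(c, k):
    anchor_positive_pairs = c * k * (k - 1)  # ordered same-class pairs
    negatives_per_pair = (c - 1) * k         # samples from other classes
    return anchor_positive_pairs * negatives_per_pair

print(num_triplets(c=4, k=8))  # batch of 32 -> 5376 triplets
print(num_triplets(c=4, k=2))  # batch of 8  -> only 48 triplets
```

The count grows roughly cubically with batch size, so a small batch gives the miner very few candidates, and all of them may already satisfy the margin.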
Thanks for your prompt reply. Do you have any suggestions on the batch size and the value of the miner's margin?
Try the loss function by itself, without any miner.
If your model is underfitting, then increase the triplet loss margin. The default is 0.05, so you could try 0.1 or something bigger: TripletMarginLoss(margin=0.1)
If your model is overfitting, then decrease the margin.
My intuition is that you should use as large a batch size as will fit in memory. However, I don't know if this will actually improve performance.
After you get some experimental data from using the loss by itself, you could try adding the miner. You might also be interested in ThresholdReducer. Here's how you initialize and use a reducer:
from pytorch_metric_learning.losses import TripletMarginLoss
from pytorch_metric_learning.reducers import ThresholdReducer
reducer = ThresholdReducer(low=0.1)
loss_fn = TripletMarginLoss(reducer=reducer)
By the way, I have found that ContrastiveLoss usually performs better than TripletMarginLoss.
Thank you for your explanation! Excuse my ignorance, but what is the difference between TripletLoss (which I use) and TripletMarginLoss?
There is no TripletLoss in this library, or maybe I'm getting confused 😄
Hahaha, sorry, I made a mistake. Thank you for your contribution.
Sorry to bother you again. Does TripletMarginMiner traverse all the triplets in the batch to find the easy, hard, or semihard cases?
Yes
OK. In a batch, the number of triplets found by the miner may be 0 or any other number, and these triplets are then passed to the loss function for calculation and backpropagation. So the effective batch size varies between 0 and some number while the learning rate stays constant. Will this affect the performance of the algorithm?
Yes, the output of the miner will affect performance, because the loss function operates only on the miner's output. So if 0 triplets are mined, the loss is automatically 0.
Thanks for the open-source work. When I use TripletLoss and TripletMarginMiner to fine-tune a model obtained with proxy anchor loss, the loss is 0 in most cases. Is this normal? What might cause it? Looking forward to your reply.