Cysu / open-reid

Open source person re-identification library in Python
https://cysu.github.io/open-reid/
MIT License

CQ for OIM Loss #3

Closed dichen-cd closed 7 years ago

dichen-cd commented 7 years ago

Hi there~

I see that the OIM loss here is a little different from the one described in the paper: this implementation omits the Circular Queue (CQ) part. May I ask what the consideration behind this change is?

Thank you.

Cysu commented 7 years ago

Because traditional person re-identification datasets don't have unlabeled identities, unlike the person search setting.

dichen-cd commented 7 years ago

Then I suppose OIM would be almost the same as Softmax+Xentropy, since the external buffer (LUT) serves as a projection matrix from feature space to class probability space. The only differences between OIM and Softmax+Xentropy are 1) the update strategy and 2) the L2 normalization of the LUT/projection matrix.

Is this assumption right? If so, then why does OIM perform so much better than Softmax+Xentropy?

Cysu commented 7 years ago

Yes, it's correct! The reason why OIM is better than the softmax loss for verification is probably that:

  1. OIM does not have 2048 x 5000 = 10,240,000 parameters to learn, which avoids overfitting.
  2. OIM can utilize many background people that have no IDs, while softmax loss cannot.
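
For illustration, here is a minimal sketch of this idea without the CQ (not the exact code in this repo; the class name, buffer sizes, and hyperparameter values are illustrative): the LUT is a non-learnable, L2-normalized buffer, the logits are cosine similarities against it, and the LUT is updated by a moving average instead of gradient descent.

    import torch
    import torch.nn.functional as F

    class OIMNoCQ(torch.nn.Module):
        """Illustrative OIM-style loss with labeled identities only (no circular queue)."""

        def __init__(self, feat_dim=2048, num_classes=5000, scalar=10.0, momentum=0.5):
            super().__init__()
            self.scalar = scalar        # inverse temperature applied to the cosine logits
            self.momentum = momentum    # moving-average rate for LUT updates
            # The LUT is a buffer, not a Parameter: no 2048 x 5000 weights to learn.
            self.register_buffer('lut', torch.zeros(num_classes, feat_dim))

        def forward(self, feats, targets):
            feats = F.normalize(feats, dim=1)               # L2-normalize the features
            logits = self.scalar * feats.mm(self.lut.t())   # cosine similarities as logits
            loss = F.cross_entropy(logits, targets)
            # Update the LUT entries of the seen classes with a moving average (no gradients).
            with torch.no_grad():
                for f, t in zip(feats, targets):
                    self.lut[t] = F.normalize(
                        self.momentum * self.lut[t] + (1.0 - self.momentum) * f, dim=0)
            return loss
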
dichen-cd commented 7 years ago

Thank you for the quick response!

Excellent work on OIM and this repo!

dichen-cd commented 7 years ago

Hi @Cysu. Sorry to bother you again 😭 But I've got problems adding the circular queue to OIM. Could you take a look when it's convenient?

Code is here: https://gist.github.com/DeanChan/5e33d66425862e8a318dcfdb4ca98cc4

It's based on your code, but I found that it could not converge during training: the loss keeps wandering around 28.0~31.0. I've tried for several days and couldn't find the reason. Hope you can enlighten me. 😄

Cysu commented 7 years ago
  1. The weight here is used to reweight classes in the cross entropy loss. It should be simply self.weight = weight.

  2. At this line, it should be x.view(1, -1).

  3. Actually I wouldn't recommend appending a new item and constructing a new CQ each time; it's inefficient. You may consider using a header index to indicate where the new item should be put. Something like

    self.cq[self.header] = x
    self.header = (self.header + 1) % self.cq.size(0)
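
Put together, a hedged sketch of such a fixed-size circular queue with a header index might look like this (illustrative names, not the gist's exact code):

    import torch

    class CircularQueue(torch.nn.Module):
        """Illustrative fixed-size circular queue of unlabeled-person features."""

        def __init__(self, queue_size=5000, feat_dim=2048):
            super().__init__()
            self.register_buffer('cq', torch.zeros(queue_size, feat_dim))
            self.header = 0  # index where the next feature will be written

        def enqueue(self, x):
            # x: (feat_dim,) L2-normalized feature of an unlabeled person
            with torch.no_grad():
                self.cq[self.header] = x
                self.header = (self.header + 1) % self.cq.size(0)

The scores against the CQ can then be concatenated with the LUT logits, e.g. logits = scalar * feats.mm(torch.cat([lut, cq]).t()).
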
dichen-cd commented 7 years ago

Many thanks for your response!

Regarding the first point, I thought the probabilities of x belonging to the unlabeled classes should be filtered out, because in the loss function of the paper, $\mathcal{L} = \mathbb{E}_x[\log p_t]$, the $q_i$ terms are not included. So I think the weight of the unlabeled classes should be set to zero. Correct me if I'm wrong.
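
For reference, the masking I had in mind looks roughly like this (illustrative sizes; logits and targets are placeholders for the real tensors):

    import torch
    import torch.nn.functional as F

    L, U = 5000, 5000                       # labeled classes and CQ size, illustrative
    weight = torch.cat([torch.ones(L),      # labeled classes keep weight 1
                        torch.zeros(U)])    # unlabeled (CQ) classes get weight 0
    logits = torch.randn(4, L + U)          # placeholder scores against LUT and CQ
    targets = torch.tensor([3, 42, 7, 0])   # labeled targets in [0, L)
    loss = F.cross_entropy(logits, targets, weight=weight)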

The result of x.view(1, -1) is the same as x.view(-1, x.size(0)), but your version is much neater!

Awesome on the third point! That one's really clever!

Cysu commented 7 years ago

Ok, I got it. You assign a random label in [L, L+U) for unlabeled samples. I think it's correct. Could you please double check if the targets are in this range?

dichen-cd commented 7 years ago

Yes, I've checked. Targets are right in this range. LUT and CQ can be updated correctly as well.

The dataloader is adapted from psdb.py. The dataset is the same and no augmentation is applied. The annotation returned by the __getitem__ method of the dataloader is self.targets_db[index], and if there is a -1 in self.targets_db[index]['gt_pids'], it is replaced with a random integer in [L, L+U).
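
A rough sketch of that replacement step (illustrative, just mirroring the description above):

    import random

    def remap_pid(pid, num_labeled, queue_size):
        # Replace the -1 placeholder with a random slot in the CQ range [L, L + U).
        if pid == -1:
            return random.randint(num_labeled, num_labeled + queue_size - 1)
        return pid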

P.S. I think there's a minor error in the comment here: psdb.py line 81. Background people would have pid == -1 instead of 0.

Cysu commented 7 years ago

Thanks for pointing out the error in the comments! Will fix it later.

May I know a bit more about your experiment settings? Do you mean to use the ground truth boxes of our person search dataset for re-id? Could you please try commenting out the CQ code and using only the labeled identities, and see if it works properly in that case?

dichen-cd commented 7 years ago

Nope, it's not just for re-id, but the same as the person search setting in your paper. I'm currently training detection and identification jointly with a bbox regression loss and the OIM loss.

Meanwhile, the example code in this repository with the OIM loss works just fine.

I've tried several initialization schemes for the LUT and CQ, and the problem is the same, so it shouldn't be the initializer's fault.

Cysu commented 7 years ago

OK, I got it. In our Caffe implementation, the proposal target layer produces target -1 for unlabeled people, [0, L-1] for labeled identities, and L for false detections (background regions without a person). Finally, when computing the cross entropy loss, we set the target to -1 for both the unlabeled people and the false detections, and have ignore_label: -1 in the loss layer.

I wonder if these three types of bounding box proposals are handled properly?
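
In PyTorch terms, a hedged sketch of the same convention might look like this (ignore_index playing the role of Caffe's ignore_label; the numbers are illustrative):

    import torch
    import torch.nn.functional as F

    L, U = 5000, 5000                           # labeled classes and CQ size, illustrative
    logits = torch.randn(4, L + U)              # scores against LUT and CQ columns
    targets = torch.tensor([3, -1, 42, -1])     # -1: unlabeled people and false detections
    loss = F.cross_entropy(logits, targets, ignore_index=-1)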

dichen-cd commented 7 years ago

Ha, I see!

In my implementation, I set 0 for false detections, [1, L] for labeled persons, and [L+1, L+U] for unlabeled people. In the cross entropy loss, only labels in [L+1, L+U] are ignored. Maybe I have to ignore label 0 too.

Many thanks for your advice! I'll let you know if it works.

Cysu commented 7 years ago

I think it's better to make the targets of labeled persons start from zero. Otherwise the indexing of the LUT has to be changed carefully.

dichen-cd commented 7 years ago

Nice suggestion! I'll fix it. 😄

dichen-cd commented 7 years ago

Sadly it's still not working. 😢

The loss drops from (28.0, 31.0) to (1.0, 6.0), yet it's still wandering and not going down any further.

dichen-cd commented 7 years ago

Hi Cysu~ I finally found the solution. After changing --oim-scalar from the default value 1.0 to 100.0, the OIM loss goes down as expected. I suppose it's because the L2 normalization operation decreases the gradients drastically, so the problem can be solved by simply multiplying by a large scalar.
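
A small illustration of the effect (hedged, with illustrative tensors): with L2-normalized features and LUT entries, the raw logits are cosine similarities confined to [-1, 1], so the softmax stays nearly uniform and the gradients are tiny unless the logits are scaled up by the --oim-scalar:

    import torch
    import torch.nn.functional as F

    feats = F.normalize(torch.randn(4, 2048), dim=1)    # L2-normalized features
    lut = F.normalize(torch.randn(5000, 2048), dim=1)   # L2-normalized LUT, illustrative
    cosine = feats.mm(lut.t())                          # logits confined to [-1, 1]
    for scalar in (1.0, 10.0, 100.0):
        probs = F.softmax(scalar * cosine, dim=1)
        print(scalar, probs.max().item())               # larger scalar -> sharper distribution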

Thanks again for your help!!

Cysu commented 7 years ago

Oh, that's it. The oim-scalar is very important. Maybe 100 is too large; we use 10 in our Caffe implementation. You can tweak and experiment with it.

By the way, sometimes reducing its value over the training epochs also improves performance; this is called annealing in some of the literature.
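
A hedged sketch of such an annealing schedule (not something in this repo; just one way to decay the scalar across epochs):

    def annealed_scalar(epoch, total_epochs, start=10.0, end=1.0):
        # Linearly decay the OIM scalar from `start` to `end` over training (illustrative).
        t = min(epoch / max(total_epochs - 1, 1), 1.0)
        return start + t * (end - start)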

zhongyingji commented 5 years ago

Hi, I've got a problem when training with OIM. I set --oim-scalar to 100, and in the later epochs the OIM loss becomes NaN; I have no idea what's happening. BTW, what's the range of the OIM loss at the end of training? Thank you!

haochange commented 5 years ago

I'm also confused about the range of the OIM loss at the end of training.