google-research / big_transfer

Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper.
https://arxiv.org/abs/1912.11370
Apache License 2.0

Loss function for pretraining #29

Open kritiagg opened 4 years ago

kritiagg commented 4 years ago

As mentioned in issue https://github.com/google-research/big_transfer/issues/26, the loss is sigmoid binary cross entropy for each label. I have a few more questions about the loss:

  1. How are objects that are present in the image but whose label is zero accounted for or handled? For example, a picture contains a dog and a cat, but the label is only {dog} and does not include {cat, animal}.
  2. Is negative sampling done from all label classes, or are all the other classes taken as negatives?

@lucasb-eyer @akolesnikoff @ebursztein @jessicayung @kolesman

lucasb-eyer commented 4 years ago

Hi, good questions, sorry for the late answer.

  1. It is not accounted for. It happens, and this is part of the label noise. We actually show in another paper that in such cases sigmoid cross entropy per label appears to be beneficial over softmax.
  2. There is no mining, and not even a distinction between positive and negative. We simply always predict yes/no for every label on each image (see the sketch below). There is also no weighting or anything like that.
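
For concreteness, here is a minimal PyTorch-style sketch of the loss described above (independent sigmoid binary cross entropy for every label, no mining, no weighting). The function name, label-space size, and multi-hot encoding are illustrative assumptions, not code from this repository:

```python
import torch
import torch.nn.functional as F

def multilabel_bce_loss(logits, targets):
    """Sigmoid binary cross entropy applied independently to every label.

    logits:  (batch, num_classes) raw scores from the model head.
    targets: (batch, num_classes) multi-hot vector, 1.0 where a label is
             present in the annotation and 0.0 everywhere else. Labels that
             are in the image but missing from the annotation simply stay 0
             and are absorbed as label noise, per the answer above.
    """
    # Every class contributes a yes/no prediction; no re-weighting of any kind.
    return F.binary_cross_entropy_with_logits(logits, targets, reduction="mean")

# Hypothetical usage: an image containing a dog and a cat but annotated only
# with "dog" keeps a 0 in the cat entry of its target vector.
logits = torch.randn(4, 1000)   # e.g. a 1000-class label space
targets = torch.zeros(4, 1000)
targets[0, 3] = 1.0             # first image annotated with class 3 only
loss = multilabel_bce_loss(logits, targets)
```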
kritiagg commented 4 years ago

Thanks Lucas. I have another question regarding the sampling of the data. Since the data distribution for BiT-L is heavily tailed, was the distribution converted to uniform (or something else) to pretrain the model on such a large dataset?
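
For reference, one common reading of "converting the distribution to uniform" is to resample examples with weights inversely proportional to class frequency, so that every class is seen equally often in expectation. The sketch below only illustrates that reading of the question; the helper name and the single-label-per-example assumption are hypothetical and do not reflect what was actually done for BiT-L:

```python
import numpy as np

def uniform_class_sampling_weights(labels, num_classes):
    """Per-example sampling probabilities that flatten a long-tailed
    class distribution toward uniform.

    labels: list of int class ids, one (primary) label per example.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    counts[counts == 0] = 1.0        # avoid division by zero for empty classes
    weights = 1.0 / counts[labels]   # rare classes get proportionally larger weight
    return weights / weights.sum()   # normalize to a probability distribution

# Hypothetical usage: draw a training batch so that each class is,
# in expectation, sampled equally often.
labels = [0, 0, 0, 1, 2]             # heavily skewed toward class 0
probs = uniform_class_sampling_weights(labels, num_classes=3)
batch_idx = np.random.choice(len(labels), size=4, p=probs)
```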