Reproducing experiments of fig. 3

tsirif commented 1 year ago

Hello there again,

I am trying to reproduce the experiments of Figure 3 in my own codebase, but I have not managed to reproduce the results for the case where temperatures are assigned according to whether a sample belongs to the head (\tau=1.0) or the tail (\tau=0.1). Is there anything that needs special consideration? Can you recommend how to reproduce these experiments with the current codebase?

Let me know if I get the following correct:

The temperature for the InfoNCE loss is selected according to the anchor’s status in the head/tail.
Head anchor samples get a higher temperature than tail anchor samples.
The selected temperature influences the denominator of the loss. Does it also influence the numerator? I read the last section of the appendix, but it was not clear to me whether you use the selected/oracle temperature for the positive pairs as well.
Use MoCo to accumulate negative representations from a momentum network.
Use batch normalization in the projection head.

Annusha commented 1 year ago

Yes, the temperature influences both the numerator and the denominator: the same temperature is applied to all pairs (positive and negative), based on the class to which the positive belongs. For MoCo, there is no BN in the projection head, only one linear layer.
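
For readers following along, here is a minimal, framework-free sketch of what "the same temperature for all pairs" could look like for a single anchor. It is not the repository's code: the function names and the assumption that similarities are precomputed cosine similarities are illustrative only, and the two temperature values are the ones quoted in the question above.

```python
import math

# Temperatures quoted in the question above (head: 1.0, tail: 0.1).
TAU_HEAD = 1.0
TAU_TAIL = 0.1


def oracle_tau(is_head):
    """Hypothetical helper: pick the temperature from the head/tail status."""
    return TAU_HEAD if is_head else TAU_TAIL


def info_nce(pos_sim, neg_sims, tau):
    """InfoNCE loss for one anchor.

    pos_sim  -- similarity between the anchor and its positive
    neg_sims -- similarities between the anchor and its negatives
    tau      -- one temperature, dividing BOTH the positive logit
                (numerator) and every logit in the denominator
    """
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    # Numerically stable log-sum-exp over positive + negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Same similarities, different oracle temperature.
loss_head = info_nce(0.9, [0.1, 0.2], oracle_tau(is_head=True))
loss_tail = info_nce(0.9, [0.1, 0.2], oracle_tau(is_head=False))
```

Because the temperature scales every logit, a lower value sharpens the softmax, so for the same similarities the tail anchor ends up with a smaller loss here than the head anchor.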

tsirif commented 1 year ago

Do you have any plans to release the code for this particular experiment? I am having a hard time reproducing its results…

Annusha commented 1 year ago

Not really. I might only have time to do that at the end of the year.

Could you try to verify it on the training data? Fig. 3 refers solely to learning a better structure in the embedding of the training data. Getting the same or similar results on test data is not so straightforward, if possible at all.
At some point we were targeting deep clustering, so in evaluation this experiment corresponds to a similar setting.

To improve results on the test data, you could try different temperatures for positives and negatives, and for tail-tail, head-tail, tail-head, and head-head pairs. But this again relies on having "high-level" supervision that splits the classes into head and tail. For supervised training, I didn't find any benefit in having a schedule for the temperature.
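
One way the pair-dependent variant could be sketched is shown below. Everything in it is an assumption for illustration: the table name `PAIR_TAU`, the function name, and in particular the temperature values, which do not come from the paper or the repository.

```python
import math

# Hypothetical temperatures per (anchor group, negative group) pair.
# These values are placeholders, not tuned or published settings.
PAIR_TAU = {
    ("head", "head"): 1.0,
    ("head", "tail"): 0.5,
    ("tail", "head"): 0.5,
    ("tail", "tail"): 0.1,
}


def pairwise_tau_info_nce(anchor_group, pos_sim, pos_tau, negatives):
    """InfoNCE for one anchor with a separate temperature per pair.

    anchor_group -- "head" or "tail" (requires head/tail supervision)
    pos_sim      -- similarity between the anchor and its positive
    pos_tau      -- temperature used for the positive pair
    negatives    -- list of (similarity, group) tuples for the negatives
    """
    logits = [pos_sim / pos_tau]
    for sim, group in negatives:
        logits.append(sim / PAIR_TAU[(anchor_group, group)])
    # Numerically stable log-sum-exp over positive + negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


loss = pairwise_tau_info_nce(
    "tail", 0.8, 0.1, [(0.3, "head"), (0.2, "tail")]
)
```

Note that the group of every negative must be known, which is exactly the "high-level" head/tail supervision mentioned above.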
