google-research / mixmatch


What's the purpose of interleave in mixmatch.py #5

Closed · bl0 closed this 5 years ago

bl0 commented 5 years ago

Hi authors, thanks very much for the amazing work. I wonder what the purpose of the interleave function in mixmatch.py is. I tried removing it, but the performance became very bad. I can't find any description of it in the GitHub repo or the papers.

https://github.com/google-research/mixmatch/blob/9096b685ceae4bbdae8d342619ad56a8cb992d1c/mixmatch.py#L71-L78

bl0 commented 5 years ago

Also, I can't understand the role of post_ops. I understand that only the ops in post_ops are evaluated, so that BN is updated only in some special cases and frozen otherwise, but in which cases?

Thanks very much.

david-berthelot commented 5 years ago
  1. Interleave simply forms batches of items that come from both the labeled and unlabeled batches (see the sketch after this list). Since we only update batch norm for the first batch, it's important that this batch is representative of the whole data.
  2. BN is only updated for the first batch.
  3. post_ops contains only the operations to perform after the gradient update: update batch norm, apply weight decay, update the moving-average weights.
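
For later readers, here is a minimal NumPy sketch of the interleaving idea described above (a simplification for illustration, not necessarily the exact repo code): slices are swapped between the first batch and each of the others, so the first batch ends up containing samples from every source.

```python
import numpy as np

def interleave_offsets(batch, nu):
    # Split a batch of size `batch` into nu + 1 nearly equal groups.
    groups = [batch // (nu + 1)] * (nu + 1)
    for x in range(batch - sum(groups)):
        groups[-x - 1] += 1
    offsets = [0]
    for g in groups:
        offsets.append(offsets[-1] + g)
    assert offsets[-1] == batch
    return offsets

def interleave(xy, batch):
    # xy[0] is the labeled batch, xy[1:] are the unlabeled batches.
    nu = len(xy) - 1
    offsets = interleave_offsets(batch, nu)
    chunks = [[v[offsets[p]:offsets[p + 1]] for p in range(nu + 1)] for v in xy]
    # Swap chunk i of the first batch with chunk i of batch i: afterwards the
    # first batch holds one chunk from every source. The swap is its own
    # inverse, so applying interleave a second time restores the original order.
    for i in range(1, nu + 1):
        chunks[0][i], chunks[i][i] = chunks[i][i], chunks[0][i]
    return [np.concatenate(v, axis=0) for v in chunks]

# Toy check: 0 marks labeled samples, 1 and 2 mark the two unlabeled batches.
x, u1, u2 = np.zeros(6), np.ones(6), np.full(6, 2.0)
mixed = interleave([x, u1, u2], 6)
print(mixed[0])  # [0. 0. 1. 1. 2. 2.] -> the first batch now spans all sources
```

Because the swap is self-inverse, the same function can be used a second time to de-interleave the model outputs.
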
bl0 commented 5 years ago

Thanks for your reply.

bl0 commented 5 years ago

If the BN is updated for all batches, what will happen?

david-berthelot commented 5 years ago

I am not sure; typically in fully supervised learning one only runs one batch per step and thus updates batch norm only once.
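
To make the mechanics concrete, here is a minimal TF1-style sketch of the post_ops pattern (a hypothetical simplification: the repo's actual classifier, weight decay, and EMA updates are omitted):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def classifier(h):
    # Tiny stand-in network with one BN layer so UPDATE_OPS get created.
    with tf.variable_scope('net', reuse=tf.AUTO_REUSE):
        h = tf.layers.dense(h, 8)
        h = tf.layers.batch_normalization(h, training=True)
        return tf.layers.dense(h, 3)

x = tf.placeholder(tf.float32, [None, 4])    # first (interleaved) batch
u1 = tf.placeholder(tf.float32, [None, 4])   # remaining batches
u2 = tf.placeholder(tf.float32, [None, 4])
labels = tf.placeholder(tf.int64, [None])

logits_x = classifier(x)
# Snapshot UPDATE_OPS *now*: only the BN moving-average updates created by the
# first batch are kept, which is why that batch must be representative.
post_ops = list(tf.get_collection(tf.GraphKeys.UPDATE_OPS))
logits_u = [classifier(u1), classifier(u2)]  # their BN update ops are dropped

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits_x))  # logits_u unused in this toy loss
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
with tf.control_dependencies([train_op]):
    # post_ops (BN updates; in the repo also weight decay and EMA) run only
    # after the gradient update.
    train_op = tf.group(*post_ops)
```
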

bl0 commented 5 years ago

Thanks very much.

happygds commented 4 years ago

@bl0 did you try updating BN for all batches? What did you observe?

bl0 commented 4 years ago

The results were bad, so we should probably be careful with BN.

moskomule commented 4 years ago

Hi, I didn't know this trick. Like @bl0, I also found that this interleaving avoids the performance drop quite well. Are there any references for it?

david-berthelot commented 4 years ago

I'm not sure about references. There are many ways to train with batch norm (for example, one could make a giant batch of everything); I simply chose this solution (interleaving) because I was considering doing multi-GPU training and wanted a homogeneous batch.
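
For anyone reading along, a rough sketch of how the pieces fit together in a training step, building on the interleave sketch above (hypothetical `model` and input names, not the repo's exact code):

```python
# batch: size of each of the three batches; model: any per-batch forward pass.
batches = interleave([x_mixed, u1_mixed, u2_mixed], batch)
logits = [model(b) for b in batches]  # BN is updated from batches[0] only
logits = interleave(logits, batch)    # swap is self-inverse: realign logits with inputs
logits_x, logits_u1, logits_u2 = logits
```
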

zhaozhengChen commented 4 years ago

Hi authors. In mixmatch.py, you forward three batches (one labeled and two unlabeled) separately and use interleave to get a first batch that represents the whole data. Why can't we forward them all together? I tried forwarding them together but got bad performance.

david-berthelot commented 4 years ago

I don't know. It could be that the way you made the change introduced a bug, or it could be that it introduces different behavior, but I don't remember whether that is the case: this research was done a year ago. Please update this thread if you find the reason so others can benefit.
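
For reference, the variant being discussed presumably looks something like the sketch below (hypothetical names). Note that in a single concatenated forward pass every sample contributes to the BN batch statistics, whereas in the interleaved version BN is updated from the first batch only.

```python
# One forward pass over the concatenation instead of three separate passes.
all_inputs = np.concatenate([x_mixed, u1_mixed, u2_mixed], axis=0)
all_logits = model(all_inputs)
logits_x, logits_u1, logits_u2 = np.split(all_logits, 3, axis=0)
```
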