cfoster0 opened 3 years ago
Pull request #11 added a basic CLIP objective. What remains is to implement the microbatching tricks.
After a long period of dormancy, I've spent some time figuring this component out. The notebook below implements microbatching and parallelism for the contrastive loss in a dummy CLIP setup. I'll work to incorporate this into the codebase in the coming weeks and get us back on track!
https://colab.research.google.com/drive/1SzlPD4ptVtfINIfTt3lGyHwazvZumv9a?usp=sharing
CLIP's objective is a symmetric cross-entropy loss between the representations of the texts and the images (spectrograms, in our case) in a batch. It benefits from very large batch sizes (OpenAI used 32k). There are microbatching tricks we can use to fit such large batches in memory.
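For reference, here's a minimal sketch of that symmetric loss in PyTorch, on a batch small enough to compute in one shot (the function name and temperature value are illustrative, not from the codebase):

```python
import torch
import torch.nn.functional as F

def clip_loss(text_emb, spec_emb, temperature=0.07):
    """Symmetric cross-entropy over the pairwise similarity matrix.

    text_emb, spec_emb: (batch, dim) L2-normalized embeddings, where
    matching text/spectrogram pairs share the same row index.
    """
    logits = text_emb @ spec_emb.t() / temperature      # (batch, batch)
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_t2s = F.cross_entropy(logits, targets)         # text -> spectrogram
    loss_s2t = F.cross_entropy(logits.t(), targets)     # spectrogram -> text
    return (loss_t2s + loss_s2t) / 2
```

Computed naively, this requires holding both encoders' activations for the whole batch at once, which is what the microbatching below avoids.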
You can compute the embeddings for your texts under no_grad and cache them. Then you compute the spectrogram embeddings in microbatches, computing each microbatch's classification loss against the full set of cached text embeddings. As you go along, cache those computed spectrogram embeddings too. Lastly, you do the other side of the loss, holding the cached spectrogram embeddings constant and computing the loss for the text embeddings in microbatches.
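To make that concrete, here's a PyTorch sketch of the three passes (the function name, encoder interfaces, and microbatch size are assumptions for illustration, not the codebase's API). Gradients accumulate across the `.backward()` calls, so you'd run an optimizer step after this returns. Note that under this scheme each encoder only receives gradient through its own direction of the loss, since the other side's embeddings are cached constants.

```python
import torch
import torch.nn.functional as F

def microbatched_clip_step(text_encoder, spec_encoder, texts, specs,
                           micro=256, temperature=0.07):
    """Accumulate gradients for one large CLIP batch, microbatch by
    microbatch. Assumes the encoders return L2-normalized (n, dim)
    embeddings; only per-microbatch activations are held in memory.
    """
    n = texts.shape[0]
    targets = torch.arange(n, device=texts.device)

    # Pass 1: embed all texts under no_grad and cache them.
    with torch.no_grad():
        text_emb = torch.cat([text_encoder(texts[i:i + micro])
                              for i in range(0, n, micro)])

    # Pass 2: embed spectrograms in microbatches with grad enabled.
    # Each microbatch is classified against the full set of cached
    # texts, so the softmax still sees all n candidates. The detached
    # embeddings are cached for pass 3.
    spec_chunks = []
    for i in range(0, n, micro):
        emb = spec_encoder(specs[i:i + micro])
        logits = emb @ text_emb.t() / temperature        # (micro, n)
        loss = F.cross_entropy(logits, targets[i:i + micro])
        # Rescale so the accumulated total matches the mean of the
        # full symmetric loss over all n pairs.
        (loss * emb.shape[0] / (2 * n)).backward()
        spec_chunks.append(emb.detach())
    spec_emb = torch.cat(spec_chunks)

    # Pass 3: the other side of the loss, holding the cached
    # spectrogram embeddings constant and backpropagating through
    # the text encoder in microbatches.
    for i in range(0, n, micro):
        emb = text_encoder(texts[i:i + micro])
        logits = emb @ spec_emb.t() / temperature        # (micro, n)
        loss = F.cross_entropy(logits, targets[i:i + micro])
        (loss * emb.shape[0] / (2 * n)).backward()
```

With this, peak activation memory is set by the microbatch size rather than the full batch size n, at the cost of one extra forward pass over the texts.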
Some example code that implements a partial version of that is here:
https://gist.github.com/crowsonkb/a93904fbb88aff0302aac98dfdb26b5f#file-clip_coco_2-py-L182