StrongResearch / isc-demos

Deep learning examples for the Instant Super Computer

LAVIS CLIP #66

Open StrongLachlanM opened 10 months ago


Source / repo

https://github.com/openai/CLIP

Model description

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. Given an image, it can be instructed in natural language to predict the most relevant text snippet without being directly optimized for that task, similar to the zero-shot capabilities of GPT-2 and GPT-3. The CLIP authors report that, evaluated zero-shot on ImageNet, CLIP matches the accuracy of the original ResNet-50 without using any of the 1.28M labeled training examples, which they present as overcoming several major challenges in computer vision.
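A minimal sketch of the two ideas in this description, assuming the image and text embeddings have already been produced by the two encoders (here they are toy Python lists). The helper names `clip_contrastive_loss` and `zero_shot_predict` are hypothetical and are not part of the openai/CLIP API. Training uses a symmetric cross-entropy over the in-batch cosine-similarity matrix, so each matching (image, text) pair is pushed above every mismatched pairing in the batch; zero-shot classification then picks the class whose text prompt (for example, "a photo of a dog") embeds closest to the image. CLIP learns its temperature; it is fixed at 0.07 here for simplicity.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over the batch similarity matrix (hypothetical helper).

    image_emb[i] and text_emb[i] are the i-th matching (image, text) pair;
    every other pairing in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    # Image -> text: row i should select column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: column j should select row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def zero_shot_predict(image_vec, class_text_vecs):
    """Return the index of the class prompt embedding most similar to the image."""
    img = l2_normalize(image_vec)
    sims = [dot(img, l2_normalize(t)) for t in class_text_vecs]
    return max(range(len(sims)), key=sims.__getitem__)


matched = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched, swapped)
print(zero_shot_predict([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

With two orthogonal toy embeddings, correctly matched pairs give a loss close to zero, while swapped pairs give a loss of roughly 1/0.07 ≈ 14.3, and the zero-shot helper picks class 1 because its prompt embedding points in nearly the same direction as the image.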

Dataset

Train: YFCC15M
Eval: ImageNet, ImageNet V2

Literature benchmark source

https://arxiv.org/pdf/2103.00020.pdf

Literature benchmark performance

[Screenshot of the literature benchmark results (2023-12-05); image not included]

Strong Compute result achieved

[VALUE/S]

Basic training config (as applicable)

Nodes: 12
Epochs: 32
Effective batch size: 9216
Learning rate: 1.5e-3
Optimizer: AdamW
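The effective batch size is the number of samples per optimizer step summed over every device (and over any gradient-accumulation steps). The sketch below shows how 9216 splits across the 12 nodes listed above; `gpus_per_node=8` and `grad_accum_steps` are hypothetical values for illustration, since the issue states only the node count and the effective batch size. Whether the contrastive loss spans all 9216 pairs also depends on whether features are gathered across devices, which the issue does not say.

```python
def per_device_batch(effective_batch, nodes, gpus_per_node, grad_accum_steps=1):
    """Split a global (effective) batch evenly across devices and accumulation steps."""
    devices = nodes * gpus_per_node
    denom = devices * grad_accum_steps
    if effective_batch % denom:
        raise ValueError(f"{effective_batch} is not divisible by {denom}")
    return effective_batch // denom


EFFECTIVE_BATCH = 9216
NODES = 12

print(EFFECTIVE_BATCH // NODES)  # 768 samples per node per optimizer step
# Hypothetical: 8 GPUs per node, no gradient accumulation.
print(per_device_batch(EFFECTIVE_BATCH, NODES, gpus_per_node=8))  # 96
# Hypothetical: 8 GPUs per node, 2 accumulation steps.
print(per_device_batch(EFFECTIVE_BATCH, NODES, gpus_per_node=8, grad_accum_steps=2))  # 48
```

Raising an error on uneven splits is a deliberate choice here: silently rounding the per-device batch would change the effective batch size, and with it the learning-rate setting it was tuned for.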

Logs gist

[URL]