Open · StrongTanisha opened 1 year ago
LAVIS - the base model is not implementable on its own; you have to select a specific model.
All the models are similar and implement the same base class. Some don't fit into a 24 GB GPU (Calvin is helping with sharding).
BLIP - BLIP-2 (2.7B parameters) is too big for the GPU, but BLIP-1 can work. Lachlan is working on that now.
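As a rough sanity check on why the 2.7B-parameter model exceeds a 24 GB card, here is a minimal sketch estimating the training-time footprint (weights + gradients + AdamW optimizer states, fp32 assumed; activations excluded). The BLIP-1 parameter count used below is an assumption for comparison, not a figure from this run:

```python
def training_memory_gb(n_params, bytes_per_value=4):
    """Rough fp32 training footprint: weights + grads + AdamW m/v states."""
    weights = n_params * bytes_per_value
    grads = n_params * bytes_per_value
    adamw_states = 2 * n_params * bytes_per_value  # first and second moments
    return (weights + grads + adamw_states) / 1024**3

blip2_gb = training_memory_gb(2.7e9)   # ~40 GB, well over a single 24 GB GPU
blip1_gb = training_memory_gb(0.25e9)  # assumed ~250M-param BLIP-1, ~3.7 GB
print(f"BLIP-2 2.7B: {blip2_gb:.1f} GB, BLIP-1: {blip1_gb:.1f} GB")
```

This is why the 2.7B model needs sharding across devices, while BLIP-1 fits comfortably on one card.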
Training is now cycling. Still validating a full cycle, but it is looking promising. Next steps: establish a benchmark and assess performance.
Source / repo
https://github.com/salesforce/LAVIS
Model description
A VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilises noisy web data by bootstrapping the captions: a captioner generates synthetic captions and a filter removes the noisy ones.
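The captioner/filter bootstrapping described above can be sketched conceptually. `generate_caption` and `caption_score` below are hypothetical stand-ins for the captioner and filter models (they are not LAVIS APIs), and the keep/drop rule is a simplification of the paper's procedure:

```python
def bootstrap_captions(images, web_captions, generate_caption, caption_score,
                       threshold=0.5):
    """For each image, keep the web caption if the filter trusts it;
    otherwise try a synthetic caption; drop pairs the filter rejects."""
    cleaned = []
    for image, web_cap in zip(images, web_captions):
        for candidate in (web_cap, generate_caption(image)):
            if caption_score(image, candidate) >= threshold:
                cleaned.append((image, candidate))
                break  # first accepted caption wins
    return cleaned

# Toy usage with dummy captioner/filter stand-ins:
gen = lambda img: f"synthetic caption for {img}"
score = lambda img, cap: 0.0 if cap == "noise" else 0.9
print(bootstrap_captions(["img1", "img2"], ["a good caption", "noise"], gen, score))
```

The net effect is a cleaner image-caption training set built from noisy web pairs.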
Dataset
Train: COCO + VG + SBU + CC3M + CC12M
Eval: NoCaps
Literature benchmark source
https://arxiv.org/pdf/2201.1
Literature benchmark performance
Strong Compute result achieved
[VALUE/S]
Basic training config (as applicable)
Nodes: 12
Epochs: 20
Effective batch size: 2880
Learning rate: variable (default config)
Optimizer: AdamW
Logs gist
[URL]