Open · StrongTanisha opened 1 year ago
LAVIS - the base model is not implementable on its own; you have to select a specific model.
All the models are similar and implement the same base class. Some don't fit into a 24 GB GPU (Calvin is helping with sharding).
BLIP - BLIP-2 (2.7B parameters) is too big for the GPU, but BLIP-1 can work. Lachlan is working on that now.
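As a rough sanity check on why the 2.7B-parameter model exceeds a 24 GB card, here is a minimal sketch estimating the training-time footprint (weights + gradients + AdamW optimizer states, fp32 assumed; activations excluded). The BLIP-1 parameter count used below is an assumption for comparison, not a figure from this run:

```python
def training_memory_gb(n_params, bytes_per_value=4):
    """Rough fp32 training footprint: weights + grads + AdamW m/v states."""
    weights = n_params * bytes_per_value
    grads = n_params * bytes_per_value
    adamw_states = 2 * n_params * bytes_per_value  # first and second moments
    return (weights + grads + adamw_states) / 1024**3

blip2_gb = training_memory_gb(2.7e9)   # ~40 GB, well over a single 24 GB GPU
blip1_gb = training_memory_gb(0.25e9)  # assumed ~250M-param BLIP-1, ~3.7 GB
print(f"BLIP-2 2.7B: {blip2_gb:.1f} GB, BLIP-1: {blip1_gb:.1f} GB")
```

This is why the 2.7B model needs sharding across devices, while BLIP-1 fits comfortably on one card.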
Training is now cycling. Still validating a full cycle, but it is looking promising. Next steps: establish a benchmark and assess performance.
Source / repo
https://github.com/salesforce/LAVIS
Model description
A VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilises noisy web data by bootstrapping the captions: a captioner generates synthetic captions and a filter removes the noisy ones.
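The captioner/filter bootstrapping described above can be sketched conceptually. `generate_caption` and `caption_score` below are hypothetical stand-ins for the captioner and filter models (they are not LAVIS APIs), and the keep/drop rule is a simplification of the paper's procedure:

```python
def bootstrap_captions(images, web_captions, generate_caption, caption_score,
                       threshold=0.5):
    """For each image, keep the web caption if the filter trusts it;
    otherwise try a synthetic caption; drop pairs the filter rejects."""
    cleaned = []
    for image, web_cap in zip(images, web_captions):
        for candidate in (web_cap, generate_caption(image)):
            if caption_score(image, candidate) >= threshold:
                cleaned.append((image, candidate))
                break  # first accepted caption wins
    return cleaned

# Toy usage with dummy captioner/filter stand-ins:
gen = lambda img: f"synthetic caption for {img}"
score = lambda img, cap: 0.0 if cap == "noise" else 0.9
print(bootstrap_captions(["img1", "img2"], ["a good caption", "noise"], gen, score))
```

The net effect is a cleaner image-caption training set built from noisy web pairs.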
Dataset
Train: COCO + VG + SBU + CC3M + CC12M
Eval: NoCaps
Literature benchmark source
https://arxiv.org/pdf/2201.1
Literature benchmark performance
Strong Compute result achieved
[VALUE/S]
Basic training config (as applicable)
Nodes: 12
Epochs: 20
Effective batch size: 2880
Learning rate: variable (default config)
Optimizer: AdamW
Logs gist
[URL]