StrongResearch / isc-demos

Deep learning examples for the Instant Super Computer

LAVIS BLIP #56

Open StrongTanisha opened 10 months ago

StrongTanisha commented 10 months ago

Source / repo

https://github.com/salesforce/LAVIS

Model description

BLIP is a vision-language pre-training (VLP) framework that transfers flexibly to both vision-language understanding and generation tasks. It makes effective use of noisy web data by bootstrapping the captions: a captioner generates synthetic captions and a filter removes the noisy ones.
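The caption bootstrapping described above can be sketched as a small loop. This is a toy illustration only, not the LAVIS or BLIP implementation: `bootstrap_captions`, `captioner`, and `keep` are hypothetical names, the stand-in captioner and filter are trivial string functions rather than models, and running the filter over both the web caption and the synthetic caption is an assumption of this sketch.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (image identifier, caption)


def bootstrap_captions(
    pairs: Iterable[Pair],
    captioner: Callable[[str], str],
    keep: Callable[[str, str], bool],
) -> List[Pair]:
    """Return a cleaned list of (image, caption) pairs.

    For every image, a synthetic caption is generated, and both the
    original web caption and the synthetic one are kept only if the
    filter accepts them.
    """
    cleaned: List[Pair] = []
    for image, web_caption in pairs:
        candidates = [web_caption, captioner(image)]
        cleaned.extend((image, c) for c in candidates if keep(image, c))
    return cleaned


# Toy stand-ins (hypothetical): a real setup would use trained models here.
def toy_captioner(image: str) -> str:
    return f"a photo of {image}"


def toy_filter(image: str, caption: str) -> bool:
    # Accept a caption only if it mentions the image identifier.
    return image in caption


web_pairs = [("dog", "buy cheap shoes now"), ("cat", "my cat on the sofa")]
cleaned_pairs = bootstrap_captions(web_pairs, toy_captioner, toy_filter)
```

With these stand-ins, the unrelated web caption for `dog` is dropped and replaced by the synthetic one, while `cat` keeps both its web caption and a synthetic caption.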

Dataset

Train: COCO + VG + SBU + CC3M + CC12M
Eval: NoCaps

Literature benchmark source

https://arxiv.org/pdf/2201.1

Literature benchmark performance

[Image: Screenshot 2023-12-05 at 4 15 13 pm]

Strong Compute result achieved

[VALUE/S]

Basic training config (as applicable)

Nodes: 12
Epochs: 20
Effective batch size: 2880
Learning rate: Variable (default config)
Optimizer: AdamW
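For reference, the effective batch size above can be broken down per node and per device. The sketch below is only arithmetic on the figures in the config. The number of GPUs per node and any gradient accumulation are not stated in this issue, so `gpus_per_node=6` and `grad_accum_steps=1` are hypothetical values used for illustration, and `per_device_batch_size` is a helper name invented here.

```python
def per_device_batch_size(
    effective_batch_size: int,
    nodes: int,
    gpus_per_node: int,
    grad_accum_steps: int = 1,
) -> int:
    """Split an effective (global) batch size evenly across all devices.

    effective = per_device * nodes * gpus_per_node * grad_accum_steps
    """
    divisor = nodes * gpus_per_node * grad_accum_steps
    if effective_batch_size % divisor != 0:
        raise ValueError(
            f"{effective_batch_size} is not divisible by {divisor} "
            "(nodes * gpus_per_node * grad_accum_steps)"
        )
    return effective_batch_size // divisor


# Figures from the config: 12 nodes, effective batch size 2880.
per_node = per_device_batch_size(2880, nodes=12, gpus_per_node=1)

# ASSUMPTION: 6 GPUs per node, no gradient accumulation (not stated in the issue).
per_gpu = per_device_batch_size(2880, nodes=12, gpus_per_node=6)

print(f"per node: {per_node}, per GPU (assuming 6 GPUs/node): {per_gpu}")
```

Each node therefore handles 240 samples per optimizer step. Under the assumed 6 GPUs per node, that is 40 samples per GPU.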

Logs gist

[URL]

StrongTanisha commented 10 months ago

LAVIS: the base model is not implementable on its own, so you do have to select one of the concrete models.

The models are all similar and all implement the base class. Some don't fit into a 24GB GPU, so Calvin is helping with sharding.

BLIP: the 2.7B-parameter BLIP model is too big for the GPU. BLIP 1 can work, though. Lachlan is working on that now.
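A rough back-of-the-envelope estimate shows why a 2.7B-parameter model does not fit on a 24GB GPU. The sketch assumes every parameter is trained in fp32 with AdamW (weights, gradients, and two optimizer moments, 4 bytes each) and ignores activations entirely. The issue does not say how training is actually configured, so treat the numbers as illustrative only. The function names are made up for this example.

```python
import math

BYTES_FP32 = 4
# ASSUMPTION: weights + gradients + AdamW first and second moments, all fp32.
STATE_COPIES = 4


def training_state_bytes(num_params: int) -> int:
    """Bytes needed for weights, gradients, and AdamW moments (no activations)."""
    return num_params * BYTES_FP32 * STATE_COPIES


def min_shards(num_params: int, gpu_bytes: int) -> int:
    """Lower bound on the number of GPUs needed if this state is sharded evenly."""
    return math.ceil(training_state_bytes(num_params) / gpu_bytes)


params = 2_700_000_000      # 2.7B parameters, from the comment above
gpu_bytes = 24 * 10**9      # 24GB GPU, taken as decimal gigabytes

state_gb = training_state_bytes(params) / 10**9
print(f"training state: {state_gb:.1f} GB, min shards: {min_shards(params, gpu_bytes)}")
```

Under these assumptions the training state alone is about 43.2 GB, so it needs to be split over at least two 24GB GPUs, and in practice more once activations are counted.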

StrongTanisha commented 10 months ago

Now cycling. Still validating a full cycle, but it is looking promising. The next step is to establish a benchmark and assess performance.

StrongTanisha commented 10 months ago