
SMIT: A Simple Modality Integration Tool
MIT License

SMIT default example should be GPU-poor friendly #12

Closed Thytu closed 5 months ago

Thytu commented 5 months ago

Once #8 is merged, the default example will consume a considerable amount of VRAM (~77 GB), which prevents many potential users from testing SMIT.

SMIT has been designed to be GPU-poor friendly from the beginning and its default example should showcase it.

There is still plenty of room for improvement to reduce VRAM usage.

For more ideas, see: Methods and tools for efficient training on a single GPU
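To see why the optimizer is such a large lever, here is a back-of-envelope sketch (not SMIT code; the 7B parameter count is a hypothetical example). Standard AdamW keeps two fp32 moment tensors per parameter (~8 bytes/param), while an 8-bit optimizer like adamw_bnb_8bit keeps those states in 8 bits (~2 bytes/param):

```python
def optimizer_state_gb(n_params: int, bytes_per_param: float) -> float:
    """Rough optimizer-state footprint in GB (states only, no weights/grads)."""
    return n_params * bytes_per_param / 1024**3

n = 7_000_000_000  # hypothetical 7B-parameter model

# AdamW fp32: two moments * 4 bytes = 8 bytes/param
print(f"AdamW fp32 states: ~{optimizer_state_gb(n, 8):.1f} GB")   # ~52.2 GB

# adamw_bnb_8bit: two moments * 1 byte = 2 bytes/param
print(f"8-bit AdamW states: ~{optimizer_state_gb(n, 2):.1f} GB")  # ~13.0 GB
```

This ignores weights, gradients, and activations, but it shows how tens of GB can come from the optimizer states alone.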

Thytu commented 5 months ago

More information on the optimizer's impact on VRAM usage:

(screenshot: optimizer VRAM comparison)

Thytu commented 5 months ago

Using adamw_bnb_8bit as the optimizer and quantizing the decoder to 4 bits seems to be a good option. I just need to find the right values for batch size (BS) and gradient accumulation (GA).
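For reference, the combination described above could be expressed with standard transformers/bitsandbytes options (a sketch, not SMIT's actual configuration; `output_dir` is a placeholder):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization for the decoder (requires a bitsandbytes-capable GPU)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 8-bit AdamW optimizer plus the BS/GA values being explored in this thread
training_args = TrainingArguments(
    output_dir="out",                  # placeholder path
    optim="adamw_bnb_8bit",            # 8-bit optimizer states
    per_device_train_batch_size=2,     # BS
    gradient_accumulation_steps=4,     # GA
)
```

The `bnb_config` would then be passed as `quantization_config` when loading the decoder.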

Thytu commented 5 months ago

A batch size of 2 with gradient accumulation of 4 works but takes ~400 min (~6.7 h) to converge, which is far too long. Trying to increase BS by 1.

Thytu commented 5 months ago

Increasing BS to 3 pushes VRAM usage above 40 GB (~46 GB). I'll probably release it with nf4 quantization, BS 2, GA 4.