OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Fix GPU memory leak in training loop #43

Closed · mutichung closed this 7 months ago

mutichung commented 7 months ago

First of all, thanks for the great work!

Problem Description

I observed a GPU memory leak when running OmniQuant's training loop. The numbers below are from running LWC on the Llama-2-13b model with 128 calibration samples, a batch size of 1, and a sequence length of 2048.

GPU memory starts at ~15 GB at the beginning of the first epoch of the first decoder layer. Memory consumption gradually increases as the calibration samples are processed; at the end of the first epoch it peaks at ~20 GB and stays there until the end of the training loop.
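(The growth is easy to track with PyTorch's built-in CUDA memory counters. The helper below is only an illustrative sketch of how such per-epoch numbers can be collected; the function name and where it is called are my own choices, not part of OmniQuant.)

```python
import torch

def log_gpu_memory(tag: str) -> None:
    """Print current and peak allocated GPU memory in GB for the default device."""
    current = torch.cuda.memory_allocated() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] allocated: {current:.2f} GB, peak: {peak:.2f} GB")

# Call once per epoch inside the per-layer calibration loop, e.g.:
#   log_gpu_memory(f"layer {layer_idx}, epoch {epoch}")
```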

Fix

After the fix, running the same experiment setup yields a maximum GPU memory usage of ~15 GB, which is almost the same level at which it starts.
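The actual change is in the diff attached to this PR. As a general illustration of the pattern, the sketch below shows how a per-batch calibration loop can leak by holding references to CUDA tensors across iterations, and how recording detached Python values avoids it. All names here (`layer`, `calib_inputs`, `loss_list`, ...) are toy stand-ins I made up for the example, not OmniQuant's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-ins for one decoder layer and its calibration batches.
layer = nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-3)
calib_inputs = [torch.randn(1, 2048, 512, device=device) for _ in range(8)]
fp_outputs = [x * 2.0 for x in calib_inputs]  # pretend full-precision targets

loss_list = []
for x, target in zip(calib_inputs, fp_outputs):
    quant_out = layer(x)
    loss = F.mse_loss(quant_out, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Leaky pattern: appending live CUDA tensors (per-batch outputs,
    # grad norms, or un-detached losses) to a Python list keeps that
    # memory reachable for the whole epoch, so usage grows with every
    # calibration sample, e.g.:
    #     loss_list.append(loss)   # tensor still tied to the graph
    #
    # Fix: store plain Python floats instead and drop per-batch
    # references before the next iteration.
    loss_list.append(loss.item())
    del quant_out, loss

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached blocks once the layer is finished
```

Whether the reference is dropped via `.item()`, `.detach().cpu()`, or an explicit `del` is a matter of taste; the point is that nothing batch-sized should stay referenced on the GPU after the optimizer step.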

Again, thanks for the work!

ChenMnZ commented 7 months ago

Thanks for your proposal. I have tested it, and it is a very meaningful modification!