chrismathew99 / automatic1111

GNU Affero General Public License v3.0

Sweep: [WIP] Asynchronous model mover for lowvram #5

Open chrismathew99 opened 8 months ago

chrismathew99 commented 8 months ago

Is there an existing issue for this?

What would your feature do?

This is an attempt to speed up --lowvram by taking model moving out of the forward loop. Model moving is made asynchronous by creating a separate CUDA stream dedicated to moving the model and using CUDA events to synchronize back to the default stream. A lookahead buffer zone lets model moving stay ahead of the forward pass, so that the GPU always has something to do in the meantime.
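To make the mechanism concrete, here is a minimal PyTorch sketch of the stream-plus-event pattern described above. It is not the prototype's code: `forward_with_prefetch`, the `weights` list of CPU tensors, and the `lookahead` parameter are hypothetical names, and the "model" is reduced to a stack of linear layers. It also makes no attempt to cap peak VRAM precisely.

```python
import torch
import torch.nn.functional as F


def forward_with_prefetch(weights, x, lookahead=2):
    """Apply relu(linear(x, W)) for each CPU-resident W in `weights`,
    uploading upcoming weights on a side stream (hypothetical helper)."""
    lookahead = max(1, lookahead)

    if not torch.cuda.is_available():
        # No CUDA device: nothing to overlap, run the layers in place.
        for w in weights:
            x = F.relu(F.linear(x, w))
        return x

    device = torch.device("cuda")
    compute = torch.cuda.current_stream()  # stream the forward pass runs on
    mover = torch.cuda.Stream()            # dedicated stream for weight uploads
    in_flight = {}                         # layer index -> (GPU weight, "upload done" event)

    def upload(i):
        with torch.cuda.stream(mover):
            # non_blocking only overlaps with compute if the source is pinned memory.
            w_gpu = weights[i].to(device, non_blocking=True)
            done = torch.cuda.Event()
            done.record(mover)
        in_flight[i] = (w_gpu, done)

    # Fill the lookahead window before the forward loop starts.
    for i in range(min(lookahead, len(weights))):
        upload(i)

    x = x.to(device)
    for i in range(len(weights)):
        w_gpu, done = in_flight.pop(i)
        compute.wait_event(done)       # GPU-side wait; the host thread is not blocked
        x = F.relu(F.linear(x, w_gpu))
        w_gpu.record_stream(compute)   # tell the allocator this tensor is used on `compute`
        del w_gpu                      # drop the GPU copy; the CPU tensor remains the master
        if i + lookahead < len(weights):
            upload(i + lookahead)      # keep the window `lookahead` layers ahead
    return x.cpu()
```

The forward loop never calls a host-side synchronize; ordering between the two streams is carried entirely by the events, which is what takes the transfers out of the forward path.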

Proposed workflow

This is still a prototype, and not all of the original semantics are preserved. CUDA streams and CUDA events are used, and they are CUDA-specific. I think IPEX has similar mechanisms, but DML has nothing comparable. The size of the lookahead buffer is a tweakable setting: a larger buffer increases VRAM usage, while a smaller buffer probably makes the forward pass a bit slower. The generation speed gained from a larger buffer has a limit.
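The buffer-size trade-off can be illustrated with a toy timing model. This is an idealized sketch, not a measurement: it assumes every layer takes the same time to move and to compute, that one stream does transfers and another does compute, and that a layer's buffer slot is freed as soon as its compute finishes. `simulate_total_time` and its parameters are hypothetical.

```python
def simulate_total_time(n_layers, move_time, compute_time, lookahead):
    """Return the finish time of the last layer in an idealized two-stream pipeline."""
    lookahead = max(1, lookahead)
    move_done = [0.0] * n_layers
    compute_done = [0.0] * n_layers
    for i in range(n_layers):
        # Transfers are serialized on the mover stream.
        start = move_done[i - 1] if i > 0 else 0.0
        # A buffer slot is free only once layer i - lookahead has finished computing.
        if i >= lookahead:
            start = max(start, compute_done[i - lookahead])
        move_done[i] = start + move_time
        # Compute needs the weights on the GPU and the previous layer's output.
        prev = compute_done[i - 1] if i > 0 else 0.0
        compute_done[i] = max(move_done[i], prev) + compute_time
    return compute_done[-1]


for k in (1, 2, 3):
    print(k, simulate_total_time(n_layers=4, move_time=1.0, compute_time=1.0, lookahead=k))
# prints:
# 1 8.0
# 2 5.0
# 3 5.0
```

In this model a buffer of one layer serializes moving and computing, a buffer of two already overlaps them fully, and anything larger only costs memory. With uneven per-layer times the saturation point would move, but the gain would still be bounded by the slower of the two streams.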

Additional information

No response