
[WIP] Asynchronous model mover for lowvram #9

Open chrismathew99 opened 8 months ago

chrismathew99 commented 8 months ago

What would your feature do? This is an attempt to speed up --lowvram by taking model moving out of the forward loop. Model moving is made asynchronous by creating a separate CUDA stream dedicated to moving the model, and using a CUDA event to synchronize back to the default stream. A lookahead buffer zone is designed so that model moving stays ahead of the forward pass, and in the meanwhile the GPU always has something to do.
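Roughly, the idea looks like the following minimal sketch (not the PR's actual code; helper names such as `async_move_to_gpu` and `wait_before_forward` are hypothetical): weight copies are issued on a dedicated mover stream with `non_blocking=True`, a CUDA event is recorded after each copy, and the default (compute) stream waits on that event right before the module is used.

```python
# Minimal sketch of the asynchronous mover idea, assuming PyTorch with CUDA.
# Helper names are hypothetical, not the actual PR code.
import torch
import torch.nn as nn

move_stream = torch.cuda.Stream()  # dedicated stream for weight transfers


def async_move_to_gpu(module: nn.Module, device: torch.device) -> torch.cuda.Event:
    """Start copying `module` to `device` on the mover stream and return an
    event that marks completion of the copy."""
    done = torch.cuda.Event()
    with torch.cuda.stream(move_stream):
        # non_blocking copies can overlap with compute on the default stream,
        # provided the source CPU tensors live in pinned memory
        module.to(device, non_blocking=True)
        done.record(move_stream)
    return done


def wait_before_forward(event: torch.cuda.Event) -> None:
    """Make the current (compute) stream wait until the copy has finished."""
    torch.cuda.current_stream().wait_event(event)
```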

Proposed workflow This is still a prototype, and not all of the original semantics are followed. CUDA streams and CUDA events are used, which are CUDA-specific; I think IPEX has similar constructs, but DML has nothing comparable. The size of the lookahead buffer is a tweakable setting: a larger buffer increases VRAM usage, while a smaller buffer will probably make the forward pass a bit slower. The speed gain from a larger buffer eventually plateaus. A sketch of the buffering idea follows below.
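To illustrate the lookahead buffer, here is a hypothetical sketch built on the helpers above; `LOOKAHEAD` stands in for the tweakable setting and is not the PR's actual name. Up to `LOOKAHEAD` modules are kept in flight ahead of the one currently executing, so transfers run while the GPU is busy computing.

```python
# Hypothetical sketch of the lookahead buffer, reusing the helpers above.
from collections import deque

LOOKAHEAD = 2  # larger -> more VRAM used; smaller -> forward may stall more often


def run_forward(modules, x, device):
    pending = deque()  # (module, completion_event) pairs already in flight
    next_idx = 0

    def prefetch():
        # keep up to LOOKAHEAD transfers queued on the mover stream
        nonlocal next_idx
        while len(pending) < LOOKAHEAD and next_idx < len(modules):
            event = async_move_to_gpu(modules[next_idx], device)
            pending.append((modules[next_idx], event))
            next_idx += 1

    prefetch()
    for _ in range(len(modules)):
        module, event = pending.popleft()
        wait_before_forward(event)  # compute stream waits only for this copy
        x = module(x)
        module.to("cpu")            # evict after use (done synchronously here)
        prefetch()                  # refill the lookahead window
    return x
```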

codeautopilot[bot] commented 8 months ago

🤖 Hey there. Your task doesn't have enough context. Can you please explain in more detail what you want to do?

The task should be broken down into smaller sub-tasks. Start by creating a design document that outlines the proposed changes, the impact on the current architecture, and a step-by-step plan for implementation. This document should be reviewed and discussed with the team. Once the design is agreed upon, create separate tickets for each implementation step, such as setting up the CUDA stream, implementing the lookahead buffer, and integrating these changes with the forward loop.

You can edit the issue's description with more info and I'll run a new analysis on your issue.