Yes, this is an unfortunate limitation. Depending on where exactly the OOM occurs, there's still some room to save memory. Caching of hidden states currently doesn't split the work into smaller batches; it simply passes all feedback images through in a single batch.
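For illustration, a minimal sketch of what chunked caching could look like; the names `unet` and `feedback_latents` are assumptions here, not the actual FABRIC internals:

```python
import torch

def cache_hidden_states(unet, feedback_latents, t, chunk_size=2):
    """Sketch: run feedback images through the model a few at a time
    and cache the resulting hidden states, instead of one big batch."""
    cached = []
    for chunk in torch.split(feedback_latents, chunk_size, dim=0):
        with torch.no_grad():
            # peak memory now scales with chunk_size instead of
            # the total number of feedback images
            hidden = unet(chunk, t)
        cached.append(hidden)
    return torch.cat(cached, dim=0)
```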
If OOM occurs during the forward pass with feedback, I think sliced attention is the only way to improve memory usage. Are you using some kind of attention optimization already?
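For reference, sliced attention just processes the query in slices so the full attention matrix is never materialized at once; a rough sketch (not the actual webui implementation):

```python
import torch

def sliced_attention(q, k, v, slice_size=1024):
    """Sketch of sliced attention: q is (B, N_q, D), k/v are (B, N_k, D).
    Only a (slice_size x N_k) attention matrix exists at any time."""
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for start in range(0, q.shape[1], slice_size):
        end = start + slice_size
        attn = torch.softmax(q[:, start:end] @ k.transpose(-1, -2) * scale, dim=-1)
        out[:, start:end] = attn @ v
    return out
```

This trades a bit of speed for a much smaller memory peak, which is why it helps when the feedback images inflate the effective batch size.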
Yeah, I had that sticker shock of a CUDA memory request for 24 GB on my poor 8 GB card.
VRAM usage should now be improved, especially with --opt-sliced-attention. It should no longer give OOM for a reasonable batch size / resolution / number of feedback images.
Do you mean --opt-split-attention?
Yeah, my bad, that's exactly what I meant.
This might not be a bug so much as an unfortunate side effect of the trick, but my VRAM usage doubled when using a disliked image and then tripled when adding a liked one.
12 GB of VRAM seems to be not quite enough for 4 feedback images in FABRIC while generating at a batch size of 2.