Closed: cesarandreslopez closed this issue 1 week ago
Explicitly close the ChatHandler too like this:
chat_handler._exit_stack.close()
It would probably be a good idea for this to happen automatically when the model itself is closed, but I am leaving the suggestion here in case someone else encounters the same issue.
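The workaround above can be wrapped in a small helper so that both objects are always closed together. This is only a sketch: close_model_and_handler is a hypothetical name, and it relies on the private _exit_stack attribute mentioned above, which is an implementation detail and may change or be absent in other versions.

```python
def close_model_and_handler(model, chat_handler=None):
    """Close the model, then the chat handler's exit stack if one exists.

    Hypothetical helper, not part of the library. `_exit_stack` is the
    private attribute from the workaround above; it is looked up with
    getattr so a handler without it is simply skipped.
    """
    # Fall back to the handler attached to the model, if any (assumption:
    # the model exposes it as `chat_handler`).
    handler = chat_handler
    if handler is None:
        handler = getattr(model, "chat_handler", None)
    try:
        model.close()
    finally:
        # Runs even if model.close() raises, so the projector is not leaked.
        exit_stack = getattr(handler, "_exit_stack", None)
        if exit_stack is not None:
            exit_stack.close()
```

Usage would be close_model_and_handler(model, chat_handler) in place of a bare model.close().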
Expected Behavior
When calling model.close(), the VRAM used by both the model and the associated projector model in a ChatHandler (for multimodal models) should be fully released.
Current Behavior
When using a multimodal model with a ChatHandler (e.g., moondream2), the model.close() method correctly releases the VRAM used by the main model but fails to release the VRAM used by the projector model within the ChatHandler. This leaves residual memory allocated and eventually exhausts VRAM, especially after multiple model loads and closures.
Steps to Reproduce
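While reproducing, the residual usage can be confirmed by sampling VRAM before loading the model and again after closing it. A minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH; the helper names are hypothetical and not part of any library.

```python
import subprocess
from typing import List


def parse_used_mib(smi_output: str) -> List[int]:
    """Parse one memory.used value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def used_vram_mib() -> List[int]:
    """Return the VRAM currently in use on each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_used_mib(result.stdout)


def residual_mib(before: List[int], after: List[int]) -> List[int]:
    """Per-GPU difference between two samples (after minus before)."""
    return [a - b for b, a in zip(before, after)]
```

Take one sample with used_vram_mib() before loading and another after model.close(); a residual that grows with each load and close cycle is consistent with the projector not being freed. Other processes using the GPU will also affect these numbers.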
Load a multimodal model with a ChatHandler, then call model.close().
Issue: The model.close() call releases the VRAM used by the main model but does not release the VRAM occupied by the ChatHandler's projector model.
Suggested Fix or Enhancement
The model.close() function should ensure that all resources, including those used by the ChatHandler's projector, are properly deallocated.
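One possible shape for such a fix is for the model to register the handler's cleanup on its own exit stack, so a single close() call releases both. The classes below are stand-ins written for illustration only, not the library's code; every name other than _exit_stack and close() is an assumption.

```python
from contextlib import ExitStack


class HandlerSketch:
    """Stand-in for a chat handler that owns a projector model."""

    def __init__(self):
        self.projector_loaded = True
        self._exit_stack = ExitStack()
        self._exit_stack.callback(self._free_projector)

    def _free_projector(self):
        self.projector_loaded = False


class ModelSketch:
    """Stand-in for a model whose close() also closes its chat handler."""

    def __init__(self, chat_handler=None):
        self.chat_handler = chat_handler
        self.weights_loaded = True
        self._stack = ExitStack()
        # Register the handler's cleanup first; ExitStack runs callbacks in
        # reverse order, so the weights are freed before the projector.
        handler_stack = getattr(chat_handler, "_exit_stack", None)
        if handler_stack is not None:
            self._stack.callback(handler_stack.close)
        self._stack.callback(self._free_weights)

    def _free_weights(self):
        self.weights_loaded = False

    def close(self):
        # Safe to call more than once: a closed ExitStack has no callbacks left.
        self._stack.close()
```

A design question this sketch leaves open is ownership: if one handler is shared by several models, closing it with the first model would break the others, so a real fix might need to make this behavior opt-in.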