Closed: lchiquitto closed this issue 2 weeks ago
We currently load the models for every configured provider at startup, which uses up all available GPU memory.
Model loading should be deferred until just before a provider runs, instead of happening in the provider's init() method.
Each provider should probably implement load_model() and unload_model().
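
A rough sketch of what this could look like, assuming a common provider base class. Only load_model() and unload_model() come from the proposal above; `Provider`, `EchoProvider`, `run()`, `generate()` and the `events` list are hypothetical names used for illustration, and the real provider interface may differ.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical base class: the constructor does no model loading."""

    def __init__(self, name):
        # Only cheap configuration here; GPU memory is untouched at startup.
        self.name = name
        self.model = None

    @abstractmethod
    def load_model(self):
        """Load the model into memory and store it in self.model."""

    def unload_model(self):
        # Drop the reference so the memory can be reclaimed. A real provider
        # may also need a framework-specific step to release GPU memory.
        self.model = None

    @abstractmethod
    def generate(self, prompt):
        """Produce output using the already loaded model."""

    def run(self, prompt):
        # Load right before the run and always unload afterwards,
        # even if generate() raises.
        self.load_model()
        try:
            return self.generate(prompt)
        finally:
            self.unload_model()


class EchoProvider(Provider):
    """Stand-in provider that records load/unload calls (illustration only)."""

    def __init__(self, name):
        super().__init__(name)
        self.events = []

    def load_model(self):
        self.events.append("load")
        self.model = object()  # placeholder for a real model

    def unload_model(self):
        self.events.append("unload")
        super().unload_model()

    def generate(self, prompt):
        return f"{self.name}: {prompt}"


if __name__ == "__main__":
    providers = [EchoProvider("a"), EchoProvider("b")]
    # Nothing is loaded at startup; only the provider that runs loads a model.
    print(providers[0].run("hi"))
    print(providers[0].events, providers[1].events)
```

With this shape, at most one provider's model is resident at a time, at the cost of reloading it on every run. Whether that trade-off is acceptable depends on how long the models take to load.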
Implement methods in commit