At the moment the API does not allow you to set the number of GPU layers when you PUT a model. This forces all API-driven applications to run on the CPU, which seems like a bit of an oversight.
I've done some basic testing and got something working, so I'll send a PR over soon.