I was trying to deploy Hugging Face Transformers on SageMaker using Multi Model Server (MMS) with `preload_model = true` (about preloading). Unfortunately, I hit a snag: the server was unable to preload the model because no GPU ID was available.
Checking the MMS code here, here, and here, we can see that no GPU ID is provided when the model is preloaded. Worse, the `service` is constructed without a GPU ID, so every subsequent attempt to initialize it when the handler receives a prediction request raises the same exception again.
Since the existing call already uses `.get` rather than the indexing operator, there was arguably already some awareness that `gpu_id` might be missing, but that case was never handled properly. Alternatively, the assumption may have been that a later initialization attempt would fix the problem.
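A minimal sketch of the failure mode, assuming the handler converts the looked-up value with `int()`: `.get` avoids a `KeyError`, but it returns `None` when the key is absent, and `int(None)` then fails. The helper name `device_from_properties` and the example property dict are hypothetical stand-ins, not the toolkit's actual code.

```python
from typing import Any, Dict


def device_from_properties(system_properties: Dict[str, Any]) -> int:
    # Hypothetical helper mirroring the pattern described above:
    # .get() returns None when "gpu_id" is missing, and int(None)
    # raises a TypeError instead of a KeyError.
    return int(system_properties.get("gpu_id"))


# Assumed shape of the properties during preload: no "gpu_id" key at all.
preload_properties = {"model_dir": "/opt/ml/model"}

try:
    device_from_properties(preload_properties)
except TypeError as exc:
    print(f"initialization failed: {exc}")
```

Because nothing in the properties changes between attempts, retrying initialization on each prediction request hits the same `TypeError` every time.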
Description of changes:
Provide a default GPU ID of 0 when no `gpu_id` is provided, telling downstream code to use the first GPU. This seems sensible: we already check whether a GPU is available, so it should be safe to assume that at least one GPU exists with ID 0. That said, I'm not well-versed in GPU ID schemes, so 0 may not be a universally applicable ID.
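A minimal sketch of the proposed default, under stated assumptions: `get_device` is a hypothetical name, the `gpu_available` flag stands in for a real availability check (such as `torch.cuda.is_available()`) so the example runs without a GPU library, and `-1` is assumed to mean "CPU", following the convention used by the Transformers `pipeline` `device` argument.

```python
from typing import Any, Dict


def get_device(system_properties: Dict[str, Any], gpu_available: bool) -> int:
    """Return -1 for CPU, otherwise the GPU ID to use (hypothetical helper)."""
    if not gpu_available:
        return -1  # assumed convention: -1 means run on CPU
    gpu_id = system_properties.get("gpu_id")
    # Fall back to the first GPU (ID 0) when no GPU ID was supplied,
    # e.g. when MMS preloads the model without passing one.
    return 0 if gpu_id is None else int(gpu_id)


print(get_device({}, gpu_available=True))               # preload case
print(get_device({"gpu_id": "1"}, gpu_available=True))  # ID supplied by the server
print(get_device({}, gpu_available=False))              # CPU-only host
```

Checking for `None` explicitly, rather than only passing a default to `.get`, also covers the case where the key is present but set to `None`.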
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.