aws / sagemaker-huggingface-inference-toolkit

Apache License 2.0

feat(initialize): default to first GPU when gpu_id not provided #125

Open btruhand opened 5 months ago

btruhand commented 5 months ago

Issue #, if available:

I was trying to deploy Hugging Face Transformers on SageMaker with Multi Model Server (MMS) and preload_model = true (about preloading). Unfortunately, I hit a snag: the server was unable to preload the model because the GPU ID was missing.


Checking the MMS code here, here, and here, we can see that no GPU ID is provided on model preload. Worse, the service is constructed with no GPU ID, so every later attempt by the handler to initialize at prediction time raises the same exception.


The existing call already uses .get instead of the indexing operator, which arguably shows some awareness that gpu_id may be missing, but the missing case was not handled properly. Alternatively, it may have been assumed that the problem would resolve itself on a subsequent initialization attempt.
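To illustrate the difference between the two lookups, here is a minimal sketch. The `system_properties` dict and both helper functions are hypothetical stand-ins, not the toolkit's or MMS's actual code; the point is only that `.get` silently yields `None` for a missing key, whereas indexing fails immediately.

```python
# Hypothetical stand-in for the properties passed on preload:
# note that there is no "gpu_id" key.
system_properties = {"model_dir": "/opt/ml/model"}


def lookup_with_get(props):
    # dict.get returns None for a missing key instead of raising.
    return props.get("gpu_id")


def lookup_with_index(props):
    # Indexing raises KeyError for a missing key.
    return props["gpu_id"]


print(lookup_with_get(system_properties))  # None

try:
    lookup_with_index(system_properties)
except KeyError as exc:
    print("KeyError:", exc)
```

With `.get`, the missing value only surfaces later, wherever the `None` is first used as a device ID.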

Description of changes:

Provide a default GPU ID of 0 when no gpu_id is provided, which tells downstream code to use the first GPU. I think this is a sensible solution: we already check whether a GPU is available, so it should be safe to assume that there is at least one GPU, with GPU ID 0. That said, I'm not well-versed in GPU ID schemes, so 0 may not be a universally applicable ID.
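The fallback described above could look roughly like the following sketch. The function name `resolve_device`, its parameters, and the use of -1 to mean "CPU" are assumptions made for illustration; this is not the actual diff in this PR.

```python
def resolve_device(system_properties, gpu_available):
    """Return a device index: -1 for CPU (assumed convention), else a GPU ID.

    Hypothetical helper: falls back to GPU 0 when "gpu_id" is absent or None.
    """
    if not gpu_available:
        return -1

    gpu_id = system_properties.get("gpu_id")
    if gpu_id is None:
        # No GPU ID was provided (e.g. on preload): use the first GPU.
        return 0

    # The value may arrive as a string or an int, so normalize it.
    return int(gpu_id)


# Usage: a missing gpu_id resolves to 0, an explicit one is respected.
print(resolve_device({}, gpu_available=True))
print(resolve_device({"gpu_id": "1"}, gpu_available=True))
```

Checking `is None` rather than truthiness matters here, because an explicit gpu_id of 0 is a valid value and should not be treated as missing.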

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.