This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
Need GPU (cuda) access while deploying the model #30554
I need help deploying a pre-trained model. I have written a custom score.py file for the deployment. However, the Docker container created on the CPU instance has no access to a GPU, so prediction fails: my PyTorch (or TensorFlow) scoring code converts the inputs to tensors and moves them to the GPU. Can you suggest a solution?
My score.py script -
import json
import logging
import os

import mlflow
import torch
from transformers import BertTokenizer  # assumed source of BertTokenizer (Hugging Face transformers)

# Commented-out alternative loader:
# original = torch.load
# def load(*args):
#     return torch.load(*args, map_location=torch.device("cpu"), pickle_module=None)
# def init():
#     global model
#     model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "use-case1-model")
#     # "model" is the path of the mlflow artifacts when the model was registered. For automl
#     # models, this is generally "mlflow-model".
#     with mock.patch("torch.load", load):
#         model = mlflow.pyfunc.load_model(model_path)
#     logging.info("Init complete")

def init():
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "use-case1-model")
    # Map the saved weights onto the CPU while loading.
    model = mlflow.pytorch.load_model(model_path, map_location=torch.device('cpu'))
    logging.info("Init complete")

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def run(data):
    # nobert4token, X_padding and tag_padding are helpers from the full script (not shown here).
    json_data = json.loads(data)
    title = json_data["input_data"]["title"]
    att = json_data["input_data"]["attributes"]
    result = {}
    for i in range(len(title)):
        my_dict = {}
        # Iterate over the attributes of title i (the original indexed att[i][j] with j bounded by len(att)).
        for attr in att[i]:
            t, a = nobert4token(tokenizer, title[i].lower(), attr)
            x = X_padding(t)
            y = tag_padding(a)
            tensor_a = torch.tensor(y, dtype=torch.int32)
            # This is the call that raises the RuntimeError on the CPU-only instance.
            tensor_a = torch.unsqueeze(tensor_a, dim=0).to("cuda")
            tensor_t = torch.tensor(x, dtype=torch.int32)
            tensor_t = torch.unsqueeze(tensor_t, dim=0).to("cuda")
            output = model([tensor_t, tensor_a])
            predict_list = output.tolist()[0]
            # words_p is not defined in this excerpt; the post-processing of predict_list is not shown.
            my_dict[attr] = " ".join(words_p)
        result[title[i]] = my_dict
    return result
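One way to keep a script like the one above from failing on a CPU-only instance is to choose the device once and use it for both the model and the input tensors, instead of hard-coding "cuda". The following is only a minimal sketch under that assumption: `detect_cuda`, `resolve_device`, `to_batch` and `DEVICE` are hypothetical names that are not part of the original score.py. It does not give the container a GPU; for that, the deployment itself has to run on a GPU-backed instance.

```python
import importlib.util


def detect_cuda() -> bool:
    """Return True only if PyTorch is importable and reports a usable GPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def resolve_device(cuda_available: bool) -> str:
    """Map the availability flag to a device string understood by PyTorch."""
    return "cuda" if cuda_available else "cpu"


# Decide once, at import time, where the model and tensors should live.
DEVICE = resolve_device(detect_cuda())


def to_batch(values, device=DEVICE):
    """Hypothetical helper: build an int32 tensor with a leading batch
    dimension and place it on the chosen device."""
    import torch
    tensor = torch.tensor(values, dtype=torch.int32)
    return torch.unsqueeze(tensor, dim=0).to(device)
```

With this in place, init() could pass `map_location=DEVICE` when loading the model, and run() could build `tensor_t` and `tensor_a` with `to_batch(x)` and `to_batch(y)`, so the model and its inputs always end up on the same device.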
Error that I am getting -
127.0.0.1 - - [29/May/2023:10:03:32 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.18"
2023-05-29 10:03:34,291 E [70] azmlinfsrv - Encountered Exception: Traceback (most recent call last):
  File "/azureml-envs/azureml_d587e0800be72e17d773ddca63762cd1/lib/python3.8/site-packages/azureml_inference_server_http/server/user_script.py", line 130, in invoke_run
    run_output = self._wrapped_user_run(**run_parameters, request_headers=dict(request.headers))
  File "/azureml-envs/azureml_d587e0800be72e17d773ddca63762cd1/lib/python3.8/site-packages/azureml_inference_server_http/server/user_script.py", line 154, in <lambda>
    self._wrapped_user_run = lambda request_headers, **kwargs: self._user_run(**kwargs)
  File "/var/azureml-app/dependencies/score.py", line 129, in run
    tensor_a = torch.unsqueeze(tensor_a, dim=0).to("cuda")
  File "/azureml-envs/azureml_d587e0800be72e17d773ddca63762cd1/lib/python3.8/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

The above exception was the direct cause of the following exception:
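The RuntimeError above means PyTorch could not find an NVIDIA driver inside the container. To confirm what the container actually sees, something like the following could be logged from init(). This is a sketch, not part of the original script: `gpu_report` is a hypothetical helper, and it only reads values that PyTorch itself exposes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`).

```python
import importlib.util
import logging


def gpu_report() -> dict:
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        "torch_installed": False,
        "torch_cuda_build": None,   # CUDA version PyTorch was built with, if any
        "cuda_available": False,
        "device_count": 0,
    }
    if importlib.util.find_spec("torch") is None:
        return report

    import torch
    report["torch_installed"] = True
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


# Emit the report once so it shows up in the deployment logs.
logging.info("GPU report: %s", gpu_report())
```

On a CPU instance like the one described here, `cuda_available` would be expected to be False and `device_count` to be 0, regardless of how PyTorch was built.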
If you are wondering why I used "model = mlflow.pytorch.load_model(model_path, map_location=torch.device('cpu'))", please refer to this forum thread: https://learn.microsoft.com/en-us/answers/questions/1291498/facing-problem-while-deploying-model-on-azure-ml-a
Documentation - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models-online-endpoints?view=azureml-api-2&tabs=sdk