@bmartinn in #3 references an issue raised with PyTorch, which looks like it might be the source of the problem.
pytorch/pytorch#47917
Investigating now and will report back.
The issue here was the way the model was being saved.
The input for the Triton Inference Server is a PyTorch model that has been saved using the TorchScript export utility. I had been using a checkpoint file saved by the PyTorch Ignite checkpointer, which was just a set of weights that could be loaded into a model.
To get the model loaded by the Triton Inference Server, it needs to be converted to TorchScript. This is done by first building your model and then loading the weights into it, as if you were going to perform inference, in the following way:
Build your PyTorch model object as before.
Load weights using:
model.load_state_dict(torch.load(f=checkpoint_file))
First generate an example input tensor for the model; this can be an array of random numbers of the right input size, or a batch from a dataloader, it doesn't matter. Then use the torch.jit.trace
method to create a traced module of the model. This traced module object can still take inputs and produce model outputs, but it now executes as TorchScript code through the LibTorch C++ API rather than the Python torch package. The save() method of the traced module can then be used to save the model to disk. It is this file that you provide to the Triton Inference Server for deployment.
# Get a validation batch
X, y = next(iter(val_loader))
# Set the model into eval mode
model.eval()
# Push input images to gpu
X = X.to(device)
# Trace the model
traced_module = torch.jit.trace(model, (X))
# Save the traced model module to disk ready for deployment
traced_module.save('model.pt')
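As an optional sanity check (not required for deployment), the exported file can be loaded back with torch.jit.load and its outputs compared against the eager-mode model; the tolerance below is an arbitrary choice:
import torch

# Reload the exported TorchScript module from disk
reloaded = torch.jit.load('model.pt')
reloaded.eval()

with torch.no_grad():
    eager_out = model(X)
    traced_out = reloaded(X)

# Small numerical differences are possible; 1e-5 is an arbitrary tolerance
assert torch.allclose(eager_out, traced_out, atol=1e-5)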
Using this model.pt file with the Triton Inference Server has allowed me to get inference working.
The config.pbtxt file was as follows:
name: "cub200_resnet34"
platform: "pytorch_libtorch"
input [
{
name: "INPUT__0"
data_type: TYPE_FP32
dims: -1
dims: 3
dims: 224
dims: 224
}
]
output [
{
name: "OUTPUT__0"
data_type: TYPE_FP32
dims: -1
dims: 200
}
]
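For reference, Triton expects each model to sit in its own directory inside the model repository, with the config.pbtxt at the model level and the traced model.pt inside a numbered version sub-directory. For the config above, the layout looks like this:
models_repo/
└── cub200_resnet34/
    ├── config.pbtxt
    └── 1/
        └── model.pt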
The following docker command (manually started for testing) was run to start the Triton server:
docker run --gpus=1 --rm --ipc=host -p8000:8000 -p8001:8001 -p8002:8002 -v/home/edmorris/models_repo:/models nvcr.io/nvidia/tritonserver:21.03-py3 tritonserver --model-repository=/models
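Once the container is running, the standard Triton HTTP endpoints can be used to confirm the server and the model are ready (shown here against localhost; substitute the server hostname as needed):
curl -v localhost:8000/v2/health/ready
curl localhost:8000/v2/models/cub200_resnet34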
The successful model serving looks like this:
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0611 14:14:37.897200 1 server.cc:527]
+-------------+-----------------------------------------------------------------+--------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| tensorflow | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {} |
| openvino | /opt/tritonserver/backends/openvino/libtriton_openvino.so | {} |
+-------------+-----------------------------------------------------------------+--------+
I0611 14:14:37.897278 1 server.cc:570]
+-----------------+---------+--------+
| Model | Version | Status |
+-----------------+---------+--------+
| cub200_resnet34 | 1 | READY |
+-----------------+---------+--------+
I0611 14:14:37.897359 1 tritonserver.cc:1658]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.8.0 |
| server_extensions | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
I0611 14:14:37.898756 1 grpc_server.cc:3983] Started GRPCInferenceService at 0.0.0.0:8001
I0611 14:14:37.898976 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0611 14:14:37.940972 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
To test that inference was running correctly, I created a Python script to make a dataloader, build the model from the checkpoint file in PyTorch, and execute both a local Python-based inference and an inference request served by the Triton server. The resulting class predictions were then compared.
This snippet is not complete: you need to create the model and a dataloader to serve image batches; a hypothetical version of that setup is sketched immediately below.
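For reference, this minimal setup is entirely hypothetical: the model architecture, checkpoint path, dataset path, and batch size are placeholders chosen to match the CUB200 / ResNet-34 example above.
import torch
import torchvision
from torchvision import transforms

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Placeholder model: ResNet-34 with a 200-class head, matching the config.pbtxt above
model = torchvision.models.resnet34(num_classes=200)
model.load_state_dict(torch.load('best_model_checkpoint.pt', map_location=device))
model = model.to(device)

# Placeholder validation dataloader producing 224x224 RGB batches
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_dataset = torchvision.datasets.ImageFolder('data/cub200/val', transform=val_transform)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=8, shuffle=False)
The actual test script then follows: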
import argparse
import numpy as np
import sys
from functools import partial
import os
from tritonclient import grpc
import tritonclient.grpc.model_config_pb2 as mc
from tritonclient import http
from tritonclient.utils import triton_to_np_dtype
from tritonclient.utils import InferenceServerException
import torch
from clearml import InputModel, Task
import shutil
import pathlib
def run_inference(X, X_shape=(3, 224, 224), X_dtype='FP32', model_name='cub200_resnet34', input_name=['INPUT__0'], output_name='OUTPUT__0',
                  url='ecm-clearml-compute-gpu-002.westeurope.cloudapp.azure.com', model_version='1', port=8000, VERBOSE=False):
    url = url + ':' + str(port)
    triton_client = http.InferenceServerClient(url=url, verbose=VERBOSE)
    model_metadata = triton_client.get_model_metadata(model_name=model_name, model_version=model_version)
    model_config = triton_client.get_model_config(model_name=model_name, model_version=model_version)
    input0 = http.InferInput(input_name[0], X_shape, X_dtype)
    input0.set_data_from_numpy(X, binary_data=False)
    output = http.InferRequestedOutput(output_name, binary_data=False)
    response = triton_client.infer(model_name, model_version=model_version, inputs=[input0], outputs=[output])
    y_pred_proba = response.as_numpy(output_name)
    y_pred = y_pred_proba.argmax(1)
    return y_pred_proba, y_pred
# Get a validation batch
X, y = next(iter(val_loader))
# Set the model into eval mode
model.eval()
# Push input images to gpu
X_gpu = X.to(device)
# Run inference on validation batch images
y_prob_pred = model(X_gpu)
# Get predicted classes
_, y_pred = torch.max(y_prob_pred, 1)
# Get Triton served predicted classes
y_pred_proba_remote, y_pred_remote = run_inference(X.numpy(), X.shape)
print('Result:: \ty\t\t:: {} \n\t \ty_pred[local]\t:: {} \n\t \ty_pred[triton]\t:: {} '.format(y.numpy(),y_pred.cpu().numpy(),y_pred_remote))
print('')
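The class predictions can also be compared programmatically rather than by eye; a small addition at the end of the script above:
# Compare local and Triton-served class predictions element-wise
match = (y_pred.cpu().numpy() == y_pred_remote).all()
print('Local and Triton predictions identical:', match)
# The raw outputs may differ slightly due to numerical precision
diff = np.abs(y_prob_pred.detach().cpu().numpy() - y_pred_proba_remote).max()
print('Max absolute difference in raw outputs:', diff)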
What is the best way to add the Torchscript model to the experiment? Would it be through the use of the OutputModel class?
What combination of calls would be best to use to upload the model?
Is it ok to have more than 1 output model associated with an experiment? [I think it is, but I just wanted to be sure].
I am thinking at the moment that the Ignite ClearMLSaver() handler takes care of saving the model checkpoint and uploading it to the clearml-server, with the file being pushed to remote Azure storage.
How do I ensure that the location of the storage is the same? Is it autogenerated?
Basically, what I would like to achieve is pushing the Torchscript exported model to the same folder location as the PyTorch model weights, so that both files are organized together.
What is the best way to add the Torchscript model to the experiment? Would it be through the use of the OutputModel class?
Hmm, I think a straightforward solution would be to convert the model to model.pt at the end of the training process, then use OutputModel to store it.
# conversion code here
final_model = OutputModel()
final_model.update_weights('final_model_here.pt', auto_delete_file=True)
Is it ok to have more than 1 output model associated with an experiment? [I think it is, but I just wanted to be sure].
It is fully supported. Notice that with clearml-server v1.0+ this is also visible in the Task's Artifacts and Models tabs, as well as inside the model repository.
How do I ensure that the location of the storage is the same? Is it autogenerated? Basically, what I would like achieve is pushing the Torchscript exported model to same folder location as the PyTorch model weights, and thus having both those files organized together.
If Task.init was called with output_uri (or default_output_uri is configured in clearml.conf), then the OutputModel will automatically upload the weights file to the Azure storage, into the Task's unique folder, right next to the other weights files. Do notice the file name should be unique, to avoid overwriting previous checkpoints :)
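For completeness, a minimal sketch of what that call might look like (the project name, task name, and Azure container URL are placeholders):
from clearml import Task

# output_uri controls where model weights registered on this Task are uploaded
task = Task.init(
    project_name='CUB200 Classification',                                # placeholder
    task_name='resnet34 training',                                       # placeholder
    output_uri='azure://mystorageaccount.blob.core.windows.net/models'   # placeholder Azure container
)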
This function created a Torchscript version of my image classification model, added a model artefact to the experiment, created a new model object on the clearml-server, and uploaded the file to the experiment directory on the remote storage service.
This Torchscript model could then be used, following the Triton serving example for clearml-serving, to deploy and serve the model as a remote endpoint for inference over HTTP.
Note, this snippet requires you to build a PyTorch model object and load the checkpoint weights of the best model from the model training, as well as a dataloader object for serving images for the model tracing [or just create tensors with random initialisation of the expected input size].
# Imports used by this function (the surrounding Trainer class and its attributes are assumed to exist)
import datetime
import os
import pathlib
import tempfile

import furl
import torch

from clearml import OutputModel

def trace_model_for_torchscript(self, dirname=None, fname=None, model_name_preamble=None):
    '''
    Function for tracing models to Torchscript.
    '''
    assert self.trainer_status['model'], '[ERROR] You must create the model to load the weights. Use Trainer.create_model() method to first create your model, then load weights.'
    assert self.trainer_status['val_loader'], '[ERROR] You must create the validation loader in order to load images. Use Trainer.create_dataloaders() method to create access to image batches.'
    if model_name_preamble is None:
        model_name_preamble = 'Torchscript Best Model'
    if dirname is None:
        dirname = tempfile.mkdtemp(prefix=f"ignite_torchscripts_{datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S_')}")
    temp_file_path = os.path.join(dirname, 'model.pt')
    # Get the best model weights file for this experiment
    for chkpnt_model in self.task.get_models()['output']:
        print('[INFO] Model Found. Model Name:: {0}'.format(chkpnt_model.name))
        print('[INFO] Model Found. Model URI:: {0}'.format(chkpnt_model.url))
        if "best_model" in chkpnt_model.name:
            print('[INFO] Using this model weights for creating Torchscript model.')
            break
    # Get the model weights file locally and update the model
    local_cache_path = chkpnt_model.get_local_copy()
    self.update_model_from_checkpoint(checkpoint_file=local_cache_path)
    # Create an image batch
    X, _ = next(iter(self.val_loader))
    # Push the input images to the device
    X = X.to(self.device)
    # Trace the model
    traced_module = torch.jit.trace(self.model, (X))
    # Write the traced module of the model to disk
    print('[INFO] Torchscript file being saved to temporary location:: {}'.format(temp_file_path))
    traced_module.save(temp_file_path)  ### TODO: Need to work out where this is saved, and how to push to an artefact.
    # Build the remote location of the torchscript file, based on the best model weights
    # Create furl object of existing model weights
    model_furl = furl.furl(chkpnt_model.url)
    # Strip off the model path
    model_path = pathlib.Path(model_furl.pathstr)
    # Get the existing model weights name, and split the name from the file extension.
    file_split = os.path.splitext(model_path.name)
    # Create the torchscript filename
    if fname is None:
        fname = file_split[0] + "_torchscript" + file_split[1]
    # Construct the new full uri with the new filename
    new_model_furl = furl.furl(origin=model_furl.origin, path=os.path.join(model_path.parent, fname))
    # Upload the torchscript model file to the clearml-server
    print('[INFO] Pushing Torchscript model as artefact to ClearML Task:: {}'.format(self.task.id))
    new_output_model = OutputModel(
        task=self.task,
        name=model_name_preamble + ' ' + self.task.name,
        tags=['Torchscript', 'Deployable', 'Best Model', 'CUB200', self.config.MODEL.MODEL_NAME, self.config.MODEL.MODEL_LIBRARY, 'PyTorch', 'Ignite', 'Azure Blob Storage']
    )
    print('[INFO] New Torchscript model artefact added to experiment with name:: {}'.format(new_output_model.name))
    print('[INFO] Torchscript model local temporary file location:: {}'.format(temp_file_path))
    print('[INFO] Torchscript model file remote location:: {}'.format(new_model_furl.url))
    new_output_model.update_weights(
        weights_filename=temp_file_path,
        target_filename=fname
    )
    print('[INFO] Torchscript model file remote upload complete. Model saved to ID:: {}'.format(new_output_model.id))
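For reference, once the trainer object has been built, the export is a single call (the trainer variable name and argument values here are purely illustrative):
# Export the best checkpoint to Torchscript and register it on the ClearML task
trainer.trace_model_for_torchscript()

# Or with an explicit output directory and target filename
trainer.trace_model_for_torchscript(dirname='./torchscript_exports', fname='cub200_resnet34_torchscript.pt')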
Closing this issue as inactive, feel free to open a new issue and link to this one.
The Triton server is now able to find the local copy of the model weights .pt file and attempts to serve it, following fixes in #3.
The following error occurs when the model is served by the Triton Inference server:
Originally posted by @ecm200 in https://github.com/allegroai/clearml-serving/issues/3#issuecomment-858868722