chanelcolgate / hydroelectric-project


Model Deployment with TensorFlow Serving #17

Open chanelcolgate opened 3 years ago

chanelcolgate commented 3 years ago

Description

```python
model_path = trainer.outputs.model.get()[0].uri + '/Format-Serving'
model = tf.keras.models.load_model(model_path)
file_path = "./saved_models/1"
tf.keras.models.save_model(model, file_path, save_format="tf")
```

- Add a Timestamp to Your Export Path: It is recommended to add the export timestamp to the export path when you are manually saving the Keras model.
```python
import tensorflow as tf
import time

model_path = trainer.outputs.model.get()[0].uri + '/Format-Serving'
model = tf.keras.models.load_model(model_path)
ts = int(time.time())
file_path = "./saved_models/{}".format(ts)
tf.keras.models.save_model(model, file_path, save_format="tf")
```

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | {SUDO_IF_NEEDED} tee /etc/apt/sources.list.d/tensorflow-serving.list && \ curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | {SUDO_IF_NEEDED} apt-key add - !{SUDO_IF_NEEDED} apt update

!{SUDO_IF_NEEDED} apt-get install tensorflow-model-server

- Single Model Configuration: If you want to run TensorFlow Serving by loading a single model and switching to newer model versions when they are available, the single model configuration is preferred.
- You can run it with the command:

```bash
%%bash --bg
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/content/saved_models >server.log 2>&1
```

- By default, TensorFlow Serving is configured to create both a representational state transfer (REST) endpoint and a Google Remote Procedure Call (gRPC) endpoint. By specifying both ports, 8500 and 8501, we expose the gRPC and REST capabilities, respectively. To run the server in a single-model configuration, you only need to specify the model name via `--model_name`.
- By default, TensorFlow Serving will load the model with the highest version number. If you use the export methods shown earlier, all models will be exported in folders with the epoch timestamp as the folder name. Therefore, newer models will have a higher version number than older models.
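- As a quick sanity check (a minimal sketch; `./saved_models` is the export directory assumed in the snippet above), you can list the exported version folders and see which one TensorFlow Serving would pick:
```python
import os

# Version folders are named by their export timestamp; TensorFlow Serving
# loads the directory with the highest numeric name.
base_path = "./saved_models"
versions = sorted(int(d) for d in os.listdir(base_path) if d.isdigit())
print("Exported versions:", versions)
if versions:
  print("Version TensorFlow Serving would load:", versions[-1])
```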
- Multiple Model Configuration
- You can also configure TensorFlow Serving to load multiple models at the same time. To do that, you need to create a configuration file to specify the models:

```
model_config_list {
  config {
    name: 'my_model'
    base_path: '/models/my_model/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'another_model'
    base_path: '/models/another_model/'
    model_platform: 'tensorflow'
  }
}
```

- You can point the model server to the configuration file with the argument `--model_config_file`, which loads the configuration from the file:

```bash
$ tensorflow_model_server --port=8500 \
    --rest_api_port=8501 \
    --model_config_file=/models/model_config
```
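- Each model in the configuration file is then served under its own name. As a quick check (a sketch assuming the server runs locally with the configuration above), you can query TensorFlow Serving's model status endpoint over REST:
```python
import requests

# GET /v1/models/<name> returns the loaded versions and their state
# (e.g., AVAILABLE) for each configured model.
for name in ("my_model", "another_model"):
  status = requests.get("http://localhost:8501/v1/models/{}".format(name))
  print(name, status.json())
```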

- Configure Specific Model Versions
- There are situations when you want to load not just the latest model version, but either all or specific model versions. If you want to load a set of available model versions, you can extend the model configuration file with:

```
...
config {
  name: 'another_model'
  base_path: '/models/another_model/'
  model_version_policy: {all: {}}
}
...
```

- If you want to serve specific model versions, you can define them as well:

```
...
config {
  name: 'another_model'
  base_path: '/models/another_model/'
  model_version_policy: {
    specific {
      versions: 1556250435
      versions: 1556251435
    }
  }
}
...
```

- You can even assign labels to the model versions. The labels can be extremely handy later when you want to make predictions from the models. At the time of writing, version labels were only available through TensorFlow Serving's gRPC endpoints; a request sketch follows the configuration below:

```
...
model_version_policy: {
  specific {
    versions: 1556250435
    versions: 1556251435
  }
}
version_labels {
  key: 'stable'
  value: 1556250435
}
version_labels {
  key: 'testing'
  value: 1556251435
}
...
```
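- Since version labels are only exposed via gRPC, a request for a labeled version sets the label on the model spec. Below is a minimal sketch, assuming the gRPC stub helper shown later in this issue and the `stable` label from the configuration above:
```python
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

def grpc_request_by_label(stub, data_sample, model_name="another_model",
                          version_label="stable",
                          signature_name="classification"):
  request = predict_pb2.PredictRequest()
  request.model_spec.name = model_name
  # version_label selects the version mapped to this label in the config file
  request.model_spec.version_label = version_label
  request.model_spec.signature_name = signature_name
  request.inputs['inputs'].CopyFrom(
      tf.make_tensor_proto(data_sample, shape=[1, 1]))
  return stub.Predict.future(request, 10)
```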

- URL structure
- The URL for your HTTP request to the model server contains information about which model and which version you would like to infer: `http://{HOST}:{PORT}/v1/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]:{VERB}`
- HOST: The host is the IP address or domain name of your model server. If you run your model server on the same machine where you run your client code, you can set the host to localhost.
- PORT: You'll need to specify the port in your request URL. The standard port for the REST API is 8501. If it conflicts with other services in your service ecosystem, you can change the port in your server arguments during the startup of the server.
- MODEL_NAME: The model name needs to match the name of your model when you either set up your model configuration or started up the model server.
- VERB: The type of inference is specified through the verb in the URL. You have three options: predict, classify, or regress. The verb corresponds to the signature methods of the endpoint.
- MODEL_VERSION: If you want to make predictions from a specific model version, you'll need to extend the URL with the model version identifier.
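- For example, a prediction request against the model from the earlier examples (host, port, and version are assumptions) would use a URL such as:
```python
# Hypothetical URL built from the components above.
url = "http://localhost:8501/v1/models/my_model/versions/1556250435:predict"
```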
- Payloads
- With the URL in place, let's discuss the request payloads. TensorFlow Serving expects the input data as a JSON data structure, as shown in the following example:

{ "signature_name": , "instances": }

- The signature_name is not required. If it isn't specified, the model server will use the model graph signature tagged with the default serving label.
- The input data is expected either as a list of objects or as a list of input values. To submit multiple data samples, you can submit them as a list under the instances key.
- If you want to submit one data example for inference, you can use inputs and list all input values as a list. One of the two keys, instances or inputs, has to be present, but never both at the same time:

{ "signature_name": , "inputs": }

- Example model prediction request with a Python client
```python
import requests

def get_rest_request(text, model_name='my_model'):
  # Exchange localhost with an IP address if the server is not running on the same machine
  url = "http://localhost:8501/v1/models/{}:predict".format(model_name)
  # Add more examples to the instance list if you want to infer more samples
  payload = {"instances": [text]}
  response = requests.post(url=url, json=payload)
  return response

rs_rest = get_rest_request(text="classify my text")
rs_rest.json()
```

- To send gRPC requests, first create a channel and a prediction service stub:
```python
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

def create_grpc_stub(host, port=8500):
  hostport = "{}:{}".format(host, port)
  channel = grpc.insecure_channel(hostport)
  stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
  return stub
```

- Once the gRPC stub is created, we can set the model and the signature to access predictions from the correct model and submit our data for the inference:
```python
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

def grpc_request(stub, data_sample, model_name="my_model", signature_name="classification"):
  request = predict_pb2.PredictRequest()
  request.model_spec.name = model_name
  request.model_spec.signature_name = signature_name

  # inputs is the name of the input of our neural network.
  request.inputs['inputs'].CopyFrom(tf.make_tensor_proto(data_sample,
                                                         shape=[1,1]))
  # 10 is the max time in seconds before the function times out
  result_future = stub.Predict.future(request, 10)
  return result_future
```
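- A possible way to use these helpers (the example text is an assumption; the available output keys depend on your model's serving signature):
```python
stub = create_grpc_stub("localhost")
rs_grpc = grpc_request(stub, "classify my text")
# The future resolves to a PredictResponse protobuf; its outputs map holds
# the prediction tensors keyed by the signature's output names.
print(rs_grpc.result().outputs)
```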

- Model A/B testing from the client side: by extending the REST URL helper with a version argument, you can route a fraction of the requests to a specific model version:
```python
from random import random

def get_rest_url(model_name, host="localhost", port=8501, verb='predict', version=None):
  url = "http://{}:{}/v1/models/{}".format(host, port, model_name)
  if version:
    url += "/versions/{}".format(version)
  url += ":{}".format(verb)
  return url

...

# Submit 10% of all requests from this client to version 1.
# 90% of the requests should go to the default models.
threshold = 0.1
# If version = None, TensorFlow Serving will infer with the default version.
version = 1 if random() < threshold else None
url = get_rest_url(model_name='complaints_classification', version=version)
```

- As you can see, randomly changing the request URL for our model inference (in our REST API example) can provide some basic A/B testing functionality. If you would like to extend these capabilities by performing the random routing of the model inference on the server side, we highly recommend routing tools like [Istio](https://istio.io) for this purpose. Originally designed for web traffic, Istio can be used to route traffic to specific models. You can phase in models, perform A/B tests, or create policies for data routed to specific models.
- Requesting Model Metadata from the Model Server
- The metadata provided by the model server will contain the information to annotate your feedback loops.
- REST Requests for Model Metadata
- Requesting model metadata is straightforward with TensorFlow Serving. TensorFlow Serving provides you an endpoint for model metadata:

```
http://{HOST}:{PORT}/v1/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/metadata
```

- Example model metadata request with a Python client
```python
import requests

def metadata_rest_request(model_name, host="localhost", port=8501, version=None):
  url = "http://{}:{}/v1/models/{}/".format(host, port, model_name)
  if version:
    url += "versions/{}".format(version)
  # Append /metadata for model information 
  url += "/metadata" 
  # Perform a GET request
  response = requests.get(url=url)
  return response
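- A possible usage (the model name is an assumption carried over from the earlier examples):
```python
metadata = metadata_rest_request(model_name="my_model")
# The response JSON includes the model spec and the signature definitions.
print(metadata.json())
```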