GoogleCloudPlatform / vertex-ai-samples

Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop and manage machine learning and generative AI workflows using Google Cloud Vertex AI.
https://cloud.google.com/vertex-ai
Apache License 2.0

Update to reflect the latest api in triton-inference-server's client #2600

Closed babushkai closed 7 months ago

babushkai commented 9 months ago

Expected Behavior

The get_started_with_triton_ensemble.ipynb sample works without any modification, i.e. get_triton_prediction_vertex in the "Calling rawPredict using Vertex AI SDK to get prediction response" section returns the prediction output.

Actual Behavior

Following the above sample, get_triton_prediction_vertex fails with the following traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[74], line 7
      5 print(f"Predictions from model: {model_name}")
      6 headers = {"x-vertex-ai-triton-redirect": f"v2/models/{model_name}/infer"}
----> 7 get_triton_prediction_vertex(
      8     model_name, endpoint_name, headers=tuple(headers.items())
      9 )
     10 print("-" * 16)

Cell In[73], line 49, in get_triton_prediction_vertex(model_name, endpoint_name, api_endpoint, headers)
     44     triton_output_http = [
     45         triton_http.InferRequestedOutput(output_name, binary_data=False)
     46     ]
     48 # create inference request
---> 49 _data, _ = triton_http._get_inference_request(
     50     inputs=[triton_input_http],
     51     outputs=triton_output_http,
     52     request_id="",
     53     sequence_id=0,
     54     sequence_start=False,
     55     sequence_end=False,
     56     priority=0,
     57     timeout=None,
     58 )
     59 http_body = httpbody_pb2.HttpBody(
     60     data=_data.encode("utf-8"), content_type="application/json"
     61 )
     62 print(f"request: {data}")

AttributeError: module 'tritonclient.http' has no attribute '_get_inference_request'

Workaround

  1. _get_inference_request now lives in the _utils.py file in triton-inference-server's client ref.
  2. The request now appears to be encoded by default ref.
  3. custom_parameters is now a required argument and must be added.

With the above changes applied, the request is sent without issue. Below is the updated function used for this purpose.

def get_triton_prediction_vertex(
    model_name,
    endpoint_name,
    api_endpoint=f"{REGION}-aiplatform.googleapis.com",
    headers=None,
):
    # set up vertex ai prediction client
    client_options = {"api_endpoint": api_endpoint}
    gapic_client = gapic.PredictionServiceClient(client_options=client_options)

    # generate example data to classify
    features = 4
    samples = 1
    data = np.random.rand(samples, features).astype("float32")

    # payload configuration defining input and output names
    payload_config = {
        "sci_1": {"input": "input__0", "output": "output__0"},
        "sci_2": {"input": "input__0", "output": "output__0"},
        "xgb": {"input": "input__0", "output": "output__0"},
        "tf": {"input": "dense_input", "output": "round"},
        "ensemble": {"input": "INPUT0", "output": "OUTPUT0"},
        "mux": {
            "input": "mux_in",
            "output": ["mux_xgb_out", "mux_tf_out", "mux_sci_1_out", "mux_sci_2_out"],
        },
    }

    # get input and output names based on model name
    input_name = payload_config[model_name]["input"]
    output_name = payload_config[model_name]["output"]

    # set up Triton input and output objects for HTTP
    triton_input_http = triton_http.InferInput(input_name, (samples, features), "FP32")
    triton_input_http.set_data_from_numpy(data, binary_data=False)

    if isinstance(output_name, list):
        triton_output_http = [
            triton_http.InferRequestedOutput(output, binary_data=False)
            for output in output_name
        ]

    else:
        triton_output_http = [
            triton_http.InferRequestedOutput(output_name, binary_data=False)
        ]

    # create inference request
    _data, _ = triton_http._utils._get_inference_request(
        inputs=[triton_input_http],
        outputs=triton_output_http,
        request_id="",
        sequence_id=0,
        sequence_start=False,
        sequence_end=False,
        priority=0,
        timeout=None,
        custom_parameters=None
    )
    http_body = httpbody_pb2.HttpBody(
        data=_data, content_type="application/json"
    )
    print(f"request: {data}")
    # submit inference request
    request = gapic.RawPredictRequest(endpoint=endpoint_name, http_body=http_body)
    response = gapic_client.raw_predict(request=request, metadata=headers)
    # get result as json
    result_http = json.loads(response.data.decode("utf-8"))
    print(f"response: {result_http['outputs'][0]['data']}")
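The model-to-tensor-name mapping and the list-vs-string output handling in the function above can be exercised on their own, without Triton or Vertex AI. The payload_config entries are copied from the function; requested_outputs is a hypothetical helper name used only for this illustration:

```python
# Standalone illustration of the output-name handling in
# get_triton_prediction_vertex (entries copied from the function above).
payload_config = {
    "ensemble": {"input": "INPUT0", "output": "OUTPUT0"},
    "mux": {
        "input": "mux_in",
        "output": ["mux_xgb_out", "mux_tf_out", "mux_sci_1_out", "mux_sci_2_out"],
    },
}

def requested_outputs(model_name):
    output_name = payload_config[model_name]["output"]
    # A single name is wrapped in a list so the caller can always build one
    # InferRequestedOutput per entry, matching the isinstance branch above.
    return output_name if isinstance(output_name, list) else [output_name]

print(requested_outputs("ensemble"))  # ['OUTPUT0']
print(requested_outputs("mux"))
```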


Steps to Reproduce the Problem

  1. Clone https://github.com/GoogleCloudPlatform/vertex-ai-samples/
  2. Move to /notebooks/community/vertex_endpoints/nvidia-triton/get_started_with_triton_ensemble.ipynb
  3. Fill out the requested parameters for Google Cloud Platform and execute the notebook as-is


rshirani commented 8 months ago

I hit the same error today. It would be great if someone could update the doc in get_started_with_triton_ensemble.ipynb.

gericdong commented 8 months ago

@katiemn can you please help update the notebook? Thanks.

katiemn commented 7 months ago

Updated the notebook so the output returns successfully without a workaround.