SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

V2 Triton Server: Deployment of a model with custom execution environment via PVC #4617

Open Niklas2501 opened 1 year ago

Niklas2501 commented 1 year ago

Describe the bug

The custom execution environment file is not copied to the Triton server when deploying a model via a PVC. How should the EXECUTION_ENV_PATH parameter be set in the config.pbtxt?

To reproduce

Hey, I'm trying to deploy a Python model to a Triton server via a PVC.

  1. For this I'm following this tutorial, adapted to deploy to a Triton server instead of MLServer.
  2. I can successfully deploy and use one of the Triton example models, specifically add20.
  3. The actual model I want to deploy needs some additional packages, so I followed the Triton tutorial on how to add a custom execution environment.

Now I can't figure out how to set the EXECUTION_ENV_PATH parameter in the config.pbtxt such that the custom execution environment file (e.g. conda-pack.tar.gz) is both copied by rclone and detected by the Triton server.
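For context, the Triton Python backend docs describe referencing an archive that lives inside the model directory via the $$TRITON_MODEL_DIRECTORY prefix; presumably the config.pbtxt fragment would look like this (assuming conda-pack.tar.gz sits at the top level of the model folder):

```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/conda-pack.tar.gz"}
}
```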

As always, thank you very much for your help!

Expected behaviour

The custom execution environment file should be copied to the server and the model should be deployed successfully.

Environment

Provider: Rancher Desktop
Architecture: ARM64 / M1 Mac
Kubernetes Cluster Version: Client Version v1.25.2, Kustomize Version v4.5.7, Server Version v1.25.4+k3s1
Deployed Seldon System Images: docker.io/seldonio/seldonv2-controller:latest

Model Details

name: "add20"
backend: "python"

input [
  {
    name: "INPUT"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]

instance_group [{ kind: KIND_CPU }]

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "???/conda-pack.tar.gz"}
}

model.yaml

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: add20
  namespace: seldon-mesh
spec:
  storageUri: "/var/models/add20"
  requirements:
  - python
  - pvc

server.yaml

apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: triton-pvc
  namespace: seldon-mesh
spec:
  serverConfig: triton
  extraCapabilities:
  - "pvc"
  podSpec:
    volumes:
    - name: models-pvc
      persistentVolumeClaim:
        claimName: ml-models-pvc
    containers:
    - name: rclone
      volumeMounts:
      - name: models-pvc
        mountPath: /var/models
ukclivecox commented 1 year ago

Can you try modifying the Server resource further to add the needed env var to the triton container? This example adds resource requests, but you could add an env var in the same way.

Niklas2501 commented 1 year ago

Can you try modifying the Server resource further to add the needed env var to the triton container? This example adds resource requests, but you could add an env var in the same way.

Can you elaborate on what I should add as an env var?

I tried "/var/models/add20/conda-pack.tar.gz", i.e. the path in the PV, similar to the model URI, but rclone responded with an error message (see screenshot).

ukclivecox commented 1 year ago

Can you check the triton container: is this an error from Triton when it's trying to use the EXECUTION_ENV_PATH? Here it says local folders are permitted: https://github.com/triton-inference-server/python_backend#important-notes. Can you verify the env var has been added to the triton pod? If not, can you show your latest server.yaml?

Niklas2501 commented 1 year ago

Before addressing your specific questions: the docs you linked state that if the environment file is inside the model directory, the EXECUTION_ENV_PATH should be set like this:

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/conda-pack.tar.gz"}
}

But this also leads to a "crash" (going into not ready) of the Triton server, with the same "describe model" output:

Spec:
  Requirements:
    python
    pvc
  Storage Uri:  /var/models/add20
Status:
  Conditions:
    Last Transition Time:  2023-01-24T10:51:00Z
    Reason:                rpc error: code = InvalidArgument desc = load failed for model 'add20_1': version 1 is at UNAVAILABLE state: Internal: Failed to get the canonical path for /mnt/agent/models/add20_1/conda-pack.tar.gz.;

    Status:                False
    Type:                  ModelReady
    Last Transition Time:  2023-01-24T10:51:00Z
    Reason:                rpc error: code = InvalidArgument desc = load failed for model 'add20_1': version 1 is at UNAVAILABLE state: Internal: Failed to get the canonical path for /mnt/agent/models/add20_1/conda-pack.tar.gz.;

    Status:  False
    Type:    Ready
  Replicas:  1
Events:      <none>

This is the corresponding log of the triton container:

I0124 10:51:00.308943 1 grpc_server.cc:270] Process for RepositoryModelLoad, rpc_ok=1, 0 step START
I0124 10:51:00.308983 1 grpc_server.cc:225] Ready for RPC 'RepositoryModelLoad', 1
I0124 10:51:00.311559 1 model_config_utils.cc:646] Server side auto-completed config: name: "add20_1"
input {
  name: "INPUT"
  data_type: TYPE_FP32
  dims: 4
}
output {
  name: "OUTPUT"
  data_type: TYPE_FP32
  dims: 4
}
instance_group {
  kind: KIND_CPU
}
default_model_filename: "model.py"
parameters {
  key: "EXECUTION_ENV_PATH"
  value {
    string_value: "$$TRITON_MODEL_DIRECTORY/conda-pack.tar.gz"
  }
}
backend: "python"

I0124 10:51:00.312534 1 model_lifecycle.cc:459] loading: add20_1:1
I0124 10:51:00.313322 1 backend_model.cc:308] Adding default backend config setting: default-max-batch-size,4
I0124 10:51:00.313369 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so
I0124 10:51:00.325310 1 python_be.cc:1612] 'python' TRITONBACKEND API version: 1.10
I0124 10:51:00.325347 1 python_be.cc:1634] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","shm-default-byte-size":"16777216","default-max-batch-size":"4"}}
I0124 10:51:00.325392 1 python_be.cc:1764] Shared memory configuration is shm-default-byte-size=16777216,shm-growth-byte-size=67108864,stub-timeout-seconds=30
I0124 10:51:00.325480 1 python_be.cc:2010] TRITONBACKEND_GetBackendAttribute: setting attributes
I0124 10:51:00.325930 1 python_be.cc:1812] TRITONBACKEND_ModelInitialize: add20_1 (version 1)
I0124 10:51:00.327353 1 model_config_utils.cc:1838] ModelConfig 64-bit fields:
I0124 10:51:00.327361 1 model_config_utils.cc:1840]   ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0124 10:51:00.327363 1 model_config_utils.cc:1840]   ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0124 10:51:00.327365 1 model_config_utils.cc:1840]   ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0124 10:51:00.327367 1 model_config_utils.cc:1840]   ModelConfig::ensemble_scheduling::step::model_version
I0124 10:51:00.327368 1 model_config_utils.cc:1840]   ModelConfig::input::dims
I0124 10:51:00.327370 1 model_config_utils.cc:1840]   ModelConfig::input::reshape::shape
I0124 10:51:00.327372 1 model_config_utils.cc:1840]   ModelConfig::instance_group::secondary_devices::device_id
I0124 10:51:00.327373 1 model_config_utils.cc:1840]   ModelConfig::model_warmup::inputs::value::dims
I0124 10:51:00.327375 1 model_config_utils.cc:1840]   ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0124 10:51:00.327377 1 model_config_utils.cc:1840]   ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0124 10:51:00.327379 1 model_config_utils.cc:1840]   ModelConfig::output::dims
I0124 10:51:00.327381 1 model_config_utils.cc:1840]   ModelConfig::output::reshape::shape
I0124 10:51:00.327383 1 model_config_utils.cc:1840]   ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0124 10:51:00.327386 1 model_config_utils.cc:1840]   ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0124 10:51:00.327388 1 model_config_utils.cc:1840]   ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0124 10:51:00.327390 1 model_config_utils.cc:1840]   ModelConfig::sequence_batching::state::dims
I0124 10:51:00.327392 1 model_config_utils.cc:1840]   ModelConfig::sequence_batching::state::initial_state::dims
I0124 10:51:00.327393 1 model_config_utils.cc:1840]   ModelConfig::version_policy::specific::versions
I0124 10:51:00.329043 1 python_be.cc:1503] Using Python execution env /mnt/agent/models/add20_1/conda-pack.tar.gz
I0124 10:51:00.329268 1 python_be.cc:1835] TRITONBACKEND_ModelFinalize: delete model state
E0124 10:51:00.329277 1 model_lifecycle.cc:597] failed to load 'add20_1' version 1: Internal: Failed to get the canonical path for /mnt/agent/models/add20_1/conda-pack.tar.gz.
I0124 10:51:00.330481 1 grpc_server.cc:270] Process for RepositoryModelLoad, rpc_ok=1, 0 step WRITEREADY
I0124 10:51:00.330604 1 grpc_server.cc:270] Process for RepositoryModelLoad, rpc_ok=1, 0 step COMPLETE
I0124 10:51:00.330617 1 grpc_server.cc:411] Done for RepositoryModelLoad, 0
I0124 10:51:04.463551 1 http_server.cc:3372] HTTP request: 0 /v2/health/ready
I0124 10:51:09.463787 1 http_server.cc:3372] HTTP request: 0 /v2/health/ready
I0124 10:51:09.463948 1 http_server.cc:3372] HTTP request: 0 /v2/health/live

Log of the agent container:

time="2023-01-24T10:50:59Z" level=info msg="Received operation" Name=Client
time="2023-01-24T10:50:59Z" level=info msg="calling load model" Name=Client
time="2023-01-24T10:50:59Z" level=info msg="Load model add20:1" Name=Client func=LoadModel
time="2023-01-24T10:50:59Z" level=debug msg="running with model add20_1:1 srcUri /var/models/add20" Name=V2ModelRepository func=DownloadModelVersion
time="2023-01-24T10:50:59Z" level=info msg="Copy from /var/models/add20 (original /var/models/add20) to /mnt/agent/rclone/3735392888" Source=RCloneClient
time="2023-01-24T10:50:59Z" level=info msg="Calling Rclone server: /sync/copy with {\"srcFs\":\"/var/models/add20\",\"dstFs\":\"/mnt/agent/rclone/3735392888\",\"createEmptySrcDirs\":true}" Source=RCloneClient
time="2023-01-24T10:51:00Z" level=info msg="rclone response: {}\n" Source=RCloneClient
time="2023-01-24T10:51:00Z" level=debug msg="Found model add20_1:1 artifactVersion 0 for /var/models/add20 at /mnt/agent/rclone/3735392888/1 " Name=V2ModelRepository func=DownloadModelVersion
time="2023-01-24T10:51:00Z" level=info msg="Calling Rclone server: /operations/purge with {\"fs\":\"/mnt/agent/rclone/3735392888\",\"remote\":\"\"}" Source=RCloneClient
time="2023-01-24T10:51:00Z" level=info msg="rclone response: {}\n" Source=RCloneClient
time="2023-01-24T10:51:00Z" level=info msg="Chose path /mnt/agent/rclone/3735392888/1 for model add20:1" Name=Client func=LoadModel
time="2023-01-24T10:51:00Z" level=debug msg="Loading model add20_1" Source=StateManager
time="2023-01-24T10:51:00Z" level=debug msg="model: add20_1, avail: 1073741824, required 0" Source=StateManager
time="2023-01-24T10:51:00Z" level=debug msg="Before memory update 1073741824, 0" Source=StateManager
time="2023-01-24T10:51:00Z" level=debug msg="After memory update 1073741824, 0" Source=StateManager
time="2023-01-24T10:51:00Z" level=debug msg="Before memory update 1073741824, 0" Source=StateManager
time="2023-01-24T10:51:00Z" level=debug msg="After memory update 1073741824, 0" Source=StateManager
time="2023-01-24T10:51:00Z" level=error msg="Failed to handle load model add20:1" Name=Client error="rpc error: code = InvalidArgument desc = load failed for model 'add20_1': version 1 is at UNAVAILABLE state: Internal: Failed to get the canonical path for /mnt/agent/models/add20_1/conda-pack.tar.gz.;\n"
2023/01/24 10:51:00 max retry time elapsed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup seldon-collector on 10.43.0.10:53: server misbehaving"

It seems like adding EXECUTION_ENV_PATH as an env var in the server.yaml also did not help; the problem is the same. (Whitespace in the env dump below was added by me.)

NPP_VERSION=11.8.0.86
SHELL=/bin/bash
KUBERNETES_SERVICE_PORT_HTTPS=443
NVIDIA_VISIBLE_DEVICES=all
DALI_BUILD=5920076
SELDON_MESH_PORT_9003_TCP_PORT=9003
KUBERNETES_SERVICE_PORT=443
SELDON_MESH_PORT_80_TCP_PROTO=tcp
CUSOLVER_VERSION=11.4.1.48
SELDON_MESH_PORT_9003_TCP=tcp://10.43.220.150:9003
SELDON_SCHEDULER_SERVICE_HOST=10.43.251.151
SELDON_SCHEDULER_PORT_9055_TCP_PROTO=tcp
CUBLAS_VERSION=11.11.3.6
HOSTNAME=triton-pvc-0
DCGM_VERSION=2.2.9
SERVER_MODELS_DIR=/mnt/agent/models
SELDON_SCHEDULER_PORT_9005_TCP=tcp://10.43.251.151:9005
CUFFT_VERSION=10.9.0.58
NVIDIA_REQUIRE_CUDA=cuda>=9.0
SELDON_SCHEDULER_SERVICE_PORT_DATAFLOW=9008
SELDON_SCHEDULER_PORT_9008_TCP_ADDR=10.43.251.151
SELDON_SCHEDULER_PORT_9055_TCP_PORT=9055
SELDON_SCHEDULER_PORT_9004_TCP_ADDR=10.43.251.151
CUDA_CACHE_DISABLE=1
SELDON_SCHEDULER_PORT_9004_TCP=tcp://10.43.251.151:9004
NCCL_VERSION=2.15.5
SELDON_SCHEDULER_SERVICE_PORT_AGENT_MTLS=9055
CUSPARSE_VERSION=11.7.5.86
ENV=/etc/shinit_v2
PWD=/opt/tritonserver
SELDON_SCHEDULER_PORT_9002_TCP_PORT=9002
OPENUCX_VERSION=1.14.0
SELDON_SCHEDULER_PORT_9002_TCP=tcp://10.43.251.151:9002
NSIGHT_SYSTEMS_VERSION=2022.4.2.1
NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
POLYGRAPHY_VERSION=0.42.1
TF_ENABLE_WINOGRAD_NONFUSED=1
TRT_VERSION=8.5.1.7
SELDON_SCHEDULER_PORT_9055_TCP_ADDR=10.43.251.151
SELDON_SCHEDULER_PORT_9044_TCP_PORT=9044
SELDON_SCHEDULER_PORT=tcp://10.43.251.151:9002
NVIDIA_PRODUCT_NAME=Triton Server
RDMACORE_VERSION=36.0
SELDON_SCHEDULER_SERVICE_PORT_XDS=9002
SELDON_SCHEDULER_PORT_9044_TCP_ADDR=10.43.251.151
HOME=/home/triton-server
SELDON_SCHEDULER_PORT_9005_TCP_PORT=9005
KUBERNETES_PORT_443_TCP=tcp://10.43.0.1:443
CUDA_VERSION=11.8.0.065
SELDON_SCHEDULER_SERVICE_PORT_SCHEDULER_MTLS=9044
SELDON_SCHEDULER_PORT_9044_TCP=tcp://10.43.251.151:9044
CURAND_VERSION=10.3.0.86
SELDON_SCHEDULER_PORT_9004_TCP_PROTO=tcp
SELDON_SCHEDULER_PORT_9055_TCP=tcp://10.43.251.151:9055
SELDON_MESH_SERVICE_PORT_DATA=80
CUTENSOR_VERSION=1.6.1.5
TRITON_SERVER_GPU_ENABLED=1
HPCX_VERSION=2.12.2tp1
SELDON_MESH_SERVICE_PORT_ADMIN=9003
SELDON_SCHEDULER_SERVICE_PORT_AGENT=9005
SELDON_SCHEDULER_SERVICE_PORT=9002
SERVER_GRPC_PORT=9500
SELDON_SCHEDULER_SERVICE_PORT_SCHEDULER=9004
SELDON_SCHEDULER_PORT_9008_TCP=tcp://10.43.251.151:9008
TERM=xterm-256color
TRITON_SERVER_VERSION=2.28.0
SELDON_MESH_PORT_80_TCP_PORT=80
GDRCOPY_VERSION=2.3
OPENMPI_VERSION=4.1.4
NVJPEG_VERSION=11.9.0.86
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
SELDON_SCHEDULER_PORT_9002_TCP_PROTO=tcp
SELDON_MESH_PORT_9003_TCP_PROTO=tcp
SHLVL=2
BASH_ENV=/etc/bash.bashrc
KUBERNETES_PORT_443_TCP_PROTO=tcp
TF_AUTOTUNE_THRESHOLD=2
SELDON_MESH_PORT=tcp://10.43.220.150:80
CUDNN_VERSION=8.7.0.80
KUBERNETES_PORT_443_TCP_ADDR=10.43.0.1
NSIGHT_COMPUTE_VERSION=2022.3.0.22
SELDON_MESH_PORT_9003_TCP_ADDR=10.43.220.150
SELDON_SCHEDULER_PORT_9004_TCP_PORT=9004
DALI_VERSION=1.18.0
NVIDIA_TRITON_SERVER_VERSION=22.11
LD_LIBRARY_PATH=/opt/tritonserver/backends/onnxruntime:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64

EXECUTION_ENV_PATH=$TRITON_MODEL_DIRECTORY/conda-pack.tar.gz

NVIDIA_BUILD_ID=48581224
SELDON_SCHEDULER_PORT_9005_TCP_PROTO=tcp
OMPI_MCA_coll_hcoll_enable=0
SELDON_SCHEDULER_PORT_9008_TCP_PROTO=tcp
OPAL_PREFIX=/opt/hpcx/ompi
KUBERNETES_SERVICE_HOST=10.43.0.1
CUDA_DRIVER_VERSION=520.61.05
SELDON_SCHEDULER_PORT_9044_TCP_PROTO=tcp
SELDON_MESH_PORT_80_TCP=tcp://10.43.220.150:80
KUBERNETES_PORT=tcp://10.43.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
_CUDA_COMPAT_PATH=/usr/local/cuda/compat
SELDON_SCHEDULER_PORT_9002_TCP_ADDR=10.43.251.151
NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=
PATH=/opt/tritonserver/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin
TRITON_SERVER_USER=triton-server
SELDON_MESH_PORT_80_TCP_ADDR=10.43.220.150
SELDON_SCHEDULER_PORT_9005_TCP_ADDR=10.43.251.151
MOFED_VERSION=5.4-rdmacore36.0
TRTOSS_VERSION=22.10
SELDON_MESH_SERVICE_HOST=10.43.220.150
DEBIAN_FRONTEND=noninteractive
SELDON_MESH_SERVICE_PORT=80
TF_ADJUST_HUE_FUSED=1
TF_ADJUST_SATURATION_FUSED=1
TEST=test
SERVER_HTTP_PORT=9000
SELDON_SCHEDULER_PORT_9008_TCP_PORT=9008
_=/usr/bin/env

server.yaml used, with the env var added:

apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: triton-pvc
  namespace: seldon-mesh
spec:
  serverConfig: triton
  extraCapabilities:
  - "pvc"
  podSpec:
    volumes:
    - name: models-pvc
      persistentVolumeClaim:
        claimName: ml-models-pvc
    containers:
    - name: rclone
      volumeMounts:
      - name: models-pvc
        mountPath: /var/models
    - name: triton
      env:
      - name: TEST
        value: "test"
      - name: EXECUTION_ENV_PATH
        value: "$$TRITON_MODEL_DIRECTORY/conda-pack.tar.gz"
ukclivecox commented 1 year ago

Should you not try:

      - name: EXECUTION_ENV_PATH
        value: "/var/models/conda-pack.tar.gz"

As their docs say: "If a non-$$TRITON_MODEL_DIRECTORY EXECUTION_ENV_PATH is used, only local file system paths are currently supported."

Niklas2501 commented 1 year ago

I don't think that would make sense. It would mean the Triton server should look for the file at (triton-container-root)/var/models/conda-pack.tar.gz, and I don't see a way the file would end up there.

While trying to find the root cause of the problem, I noticed that unlike the other model files (i.e. the config.pbtxt and the model.py), the custom execution environment file is not copied/mounted to the same location by rclone.

Content of the PV: (screenshot)

Content of the model directory in the triton container: (screenshot)

For normal model files, rclone copies/mounts them from (pv)/var/models/add20 to (triton-container-root)/mnt/agent/models/add20_1/.

Based on this, we should also see a file (triton-container-root)/mnt/agent/models/add20_1/conda-pack.tar.gz.

As one can see, not only is the custom execution environment file not copied, but also some other test files I included.

So I think there is an issue at the stage of copying/mounting the model files from the PV to the triton container, not an issue of the Triton server looking for the model at the wrong location. Or am I missing something?

Might there be some kind of filter or whitelist for which files are copied by rclone that leads to conda-pack.tar.gz not being copied?

ukclivecox commented 1 year ago

OK, this makes sense as the issue. At present the logic is to copy the version folder and then just the config.pbtxt. For the reason above this seems to be a bug: instead, we should copy all files from the top-level folder as well as from the version folder found.

So you also don't need the env var, as this is model specific, and you do need to update the config.pbtxt as discussed in the Triton docs.

We will need to look into a PR to change how files are copied across.

Niklas2501 commented 1 year ago

Ok, thanks for the update and of course for your patient help!