antonaleks opened 8 months ago
I fixed this problem with a custom SeldonDeployment that bypasses the TRITON_SERVER implementation:
```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: multi
  namespace: seldon-triton
spec:
  predictors:
    - annotations:
        seldon.io/no-engine: "false"
        # prometheus.io/scrape: 'true'
        # prometheus.io/path: '/metrics'
        # prometheus.io/port: '6000'
        # seldon.io/engine-metrics-prometheus-path: "/metrics"
        # seldon.io/engine-metrics-prometheus-port: "6000"
      componentSpecs:
        - spec:
            containers:
              - name: multi
                image: nvcr.io/nvidia/tritonserver:23.10-py3
                args:
                  - /opt/tritonserver/bin/tritonserver
                  - '--grpc-port=9500'
                  - '--http-port=9000'
                  - '--metrics-port=6000'
                  - '--model-repository=/mnt/models'
                ports:
                  - name: grpc
                    containerPort: 9500
                    protocol: TCP
                  - name: http
                    containerPort: 9000
                    protocol: TCP
                  - name: triton-metrics
                    containerPort: 6000
                    protocol: TCP
                resources:
                  limits:
                    nvidia.com/gpu: 1
                securityContext:
                  capabilities:
                    add: [ "SYS_ADMIN" ]  # for DCGM
      graph:
        logger:
          mode: all
        modelUri: gs://seldon-models/triton/multi
        name: multi
      name: default
      replicas: 1
      protocol: v2
```
But the question about the TRITON_SERVER implementation is still open.
https://github.com/SeldonIO/seldon-core/blob/8e1d98d03f15a70808a8035c110b443c15e28a96/operator/controllers/seldondeployment_prepackaged_servers.go#L239C1-L240C1 — maybe you could take a look at this line.
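With this workaround, Prometheus still needs to discover the `triton-metrics` port. A minimal PodMonitor sketch, assuming the Prometheus Operator is installed; the `seldon-deployment-id` label value here is an assumption — check the labels your predictor pods actually carry:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: triton-metrics
  namespace: seldon-triton
spec:
  selector:
    matchLabels:
      # assumption: adjust to the labels on your predictor pods
      seldon-deployment-id: multi-default
  podMetricsEndpoints:
    - port: triton-metrics  # matches the named containerPort in the spec above
      path: /metrics
```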
Describe the bug
I cannot expose Triton metrics in the deployment: I put the port descriptions in the Pod.v1 spec and use the Triton implementation, but the metrics port is not recognized.
Triton server serves metrics only on the /metrics endpoint, not on /prometheus. Maybe I can change the MLSERVER_METRICS_ENDPOINT env var?
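For context, `MLSERVER_METRICS_ENDPOINT` is read by MLServer-based prepackaged servers, so it likely has no effect on the `nvcr.io` tritonserver image, which serves metrics at /metrics regardless. If the server were MLServer, the override would look roughly like this (a sketch, not a Triton fix):

```yaml
# sketch: overriding the metrics path for an MLServer container
componentSpecs:
  - spec:
      containers:
        - name: multi
          env:
            - name: MLSERVER_METRICS_ENDPOINT
              value: /metrics
```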
To reproduce
There is no metrics endpoint in the Deployment!
Expected behaviour
The Deployment has a metrics endpoint on port 8002.
Environment
```shell
kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio
```

```
value: docker.io/seldonio/seldon-core-executor:1.17.1
image: docker.io/seldonio/seldon-core-operator:1.17.1
```