allegroai / clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution
https://clear.ml
Apache License 2.0

[clearml-serving] Specification of model platform not possible #74

Open · ockaro opened this issue 6 months ago

ockaro commented 6 months ago

Describe the bug

Hi there, it seems that when adding a PyTorch model to the self-hosted clearml-serving, the platform also needs to be specified. However, neither specifying the platform via the `--aux-config` flag nor passing a config.pbtxt file via `--aux-config` works. In both cases I get this error:

E0405 14:55:27.962135 35 model_repository_manager.cc:996] Poll failed for model directory 'advanced_basic_classifier.pytorch': unexpected 'platform' and 'backend' pair, got:, pytorch
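
Judging by the error, the model configuration that reaches triton contains an empty `platform` together with backend `pytorch`, and this triton version rejects that pair. For reference, my understanding from the Triton model-configuration docs (an assumption on my side, not something taken from clearml-serving) is that a TorchScript model needs something like:

  platform: "pytorch_libtorch"

with the backend derived from the platform - but neither of the two aux-config approaches above gets that setting through.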

What's your helm version?

3.14.3

What's your kubectl version?

1.25.2

What's the chart version?

7.8.1

Enter the changed values of values.yaml?

  # -- ClearMl generic configurations
  clearml:
    apiAccessKey:
    apiSecretKey:
    apiHost: https://api.***.com
    filesHost: https://files.***.com
    webHost: https://app.***.com
    servingTaskId: "fec7d23cc2b848b48d15041ce965ed81"

  # -- ClearML serving inference configurations
  clearml_serving_inference:

    # -- Ingress exposing configurations
    ingress:
      enabled: true
      hostName: "serving.***.com"
      ingressClassName: "nginx"
      tlsSecretName: "ingress-tls-clearml-serving"
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-production
jkhenning commented 6 months ago

Hi @ockaro,

Can you please verify there is a clearml triton pod running? Also, how did you register the model? What was the clearml-serving command you used?
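
For example, something along these lines (placeholders instead of the actual namespace/pod name, since those depend on your deployment):

  # list pods and look for the triton one, then check its logs
  kubectl -n <namespace> get pods | grep triton
  kubectl -n <namespace> logs <triton-pod-name>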

Side note - can you please move this issue to the clearml-serving repo? It seems to be related not to the helm chart but to the serving itself.

ockaro commented 5 months ago

Hi @jkhenning, thanks for moving the issue! Btw, I never had an issue registering a model using ClearML Pro together with the docker container setup. There I wasn't even required to name the model 'model.<backend_name>', which the k8s setup demands; if I don't, I get this error in the triton pod:

  Invalid model name: Could not determine backend for model 'advanced_basic_classifier' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.

The triton pod is up and running. The mentioned error message occurs within the triton pod.
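
In case it helps, my understanding is that the repository the triton pod polls follows the standard Triton layout, so for this endpoint it should look roughly like this (the structure is an assumption based on the Triton docs; only the directory name is from my setup):

  model_repository/
  └── advanced_basic_classifier.pytorch/
      ├── config.pbtxt
      └── 1/
          └── model.pt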

I tried these two commands in order to register the model:

  clearml-serving --id ad16b8ae3e2840c1b1b6eb94bbcf78f4 model add --engine triton --endpoint "advanced_basic_classifier.pytorch" --preprocess "src/preprocessing/preprocess.py" --model-id 837276fc8d8a443fb91f48d722300b0a --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config platform=pytorch_libtorch

and

  clearml-serving --id ad16b8ae3e2840c1b1b6eb94bbcf78f4 model add --engine triton --endpoint "advanced_basic_classifier.pytorch" --preprocess "src/preprocessing/preprocess.py" --model-id 837276fc8d8a443fb91f48d722300b0a --aux-config .\config.pbtxt

where the config.pbtxt looks like this:

  backend: "pytorch"
  platform: "pytorch_libtorch"
  input [
    {
      name: "INPUT__0"
      data_type: TYPE_FP32
      dims: [1, 64]
    }
  ]
  output [
    {
      name: "OUTPUT__0"
      data_type: TYPE_FP32
      dims: [1, 11]
    }
  ]
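
For completeness: from my reading of the Triton model-configuration docs (again an assumption on my side), a variant that sets only the platform should also be valid for a TorchScript model, since the backend is derived from `pytorch_libtorch`. I'm noting it here in case the explicit `backend` line is part of the problem:

  platform: "pytorch_libtorch"
  input [
    {
      name: "INPUT__0"
      data_type: TYPE_FP32
      dims: [1, 64]
    }
  ]
  output [
    {
      name: "OUTPUT__0"
      data_type: TYPE_FP32
      dims: [1, 11]
    }
  ]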
ockaro commented 5 months ago

Hi @jkhenning, is there any news on this issue? :)