coreweave / kubernetes-cloud

Getting Started with the CoreWeave Kubernetes GPU Cloud
http://www.coreweave.com

inference service #125

Open pkurzend opened 1 year ago

pkurzend commented 1 year ago

Hello, I am trying to get the inference service from this example running: https://docs.coreweave.com/compass/examples/pytorch-hugging-face-diffusers-stable-diffusion-text-to-image

This is my YAML:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: stable-diffusion4
spec:

  predictor:

    containerConcurrency: 1
    minReplicas: 0
    maxReplicas: 5
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: gpu.nvidia.com/class
              operator: In
              values:
              - RTX_A5000
            - key: topology.kubernetes.io/region
              operator: In
              values:
              - ORD1 

    containers:
      - name: kfserving-container # kserve-container
        image: pkurzend/stable-diffusion-inference-test:test-3
        env:
          - name: STORAGE_URI # Kserve mounts the PVC at /mnt/models/  --> readonly
            value: pvc://stable-diffusion-model-cache/
            # The following env vars are the default model parameters, which can be changed as needed.
          - name: HF_HOME
            value: /mnt/models/

        resources:
          requests:
            cpu: 6
            memory: 32Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 6
            memory: 32Gi
            nvidia.com/gpu: 1

The bottom of my service.py file looks like this (I don't really understand whether the name here becomes part of the API endpoint?):

if __name__ == "__main__":
    model = Model(name='stable-diffusion-inference')
    model.load()
    kserve.ModelServer().start([model])
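As a side note, if the name does end up in the route, a direct sanity check against the model server itself (a minimal sketch, assuming the KServe V1 protocol, the requests library, and port 8080 as shown in the logs below; run from inside the pod or via a port-forward) would be:

import requests

# The model name must match the one passed to Model(name=...) above.
resp = requests.post(
    "http://localhost:8080/v1/models/stable-diffusion-inference:predict",
    json={"prompt": "a quick smoke test"},
    timeout=600,
)
print(resp.status_code, resp.headers.get("content-type"))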

The command kubectl logs -l serving.kubeflow.org/inferenceservice=stable-diffusion4 --container kfserving-container shows the following logs

The tokenizer class you load from this checkpoint is 'CLIPTokenizer'.
The class this function is called from is 'CLIPTokenizerWithEmbeddings'.
[I 230115 12:55:13 service:58] Loaded stable-diffusion-inference
[I 230115 12:55:13 service:60] Loading stable-diffusion-inference to accelerator
[I 230115 12:55:14 service:62] Accelerator loaded
[I 230115 12:55:20 service:81] Textual Inversion Embeddings loaded
[I 230115 12:55:20 model_server:150] Registering model: stable-diffusion-inference
[I 230115 12:55:20 model_server:123] Listening on port 8080
[I 230115 12:55:20 model_server:125] Will fork 1 workers
[I 230115 12:55:20 model_server:128] Setting max asyncio worker threads as 10

The command kubectl get isvc returns the following:

NAME                URL                                                                     READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                         AGE
stable-diffusion4   https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com   True           100                              stable-diffusion4-predictor-default-00001   98m

kubectl get pods yields the following

NAME                                                              READY   STATUS    RESTARTS   AGE
stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv   0/2     Pending   0          1s
virt-launcher-stabel-diffusion-mrhcf                              1/1     Running   0          13m

kubectl describe pod stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv yields the following

Name:                 stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv
Namespace:            tenant-84e585-dev
Priority:             1000000
Priority Class Name:  normal
Service Account:      default
Node:                 g74174e/10.135.27.47
Start Time:           Sun, 15 Jan 2023 12:53:57 +0000
Labels:               app=stable-diffusion4-predictor-default-00001
                      component=predictor
                      pod-template-hash=ddbc59b7f
                      service.istio.io/canonical-name=stable-diffusion4-predictor-default
                      service.istio.io/canonical-revision=stable-diffusion4-predictor-default-00001
                      serving.knative.dev/configuration=stable-diffusion4-predictor-default
                      serving.knative.dev/configurationGeneration=1
                      serving.knative.dev/configurationUID=f99af4f5-f12e-436e-a3a6-cabafaa1a233
                      serving.knative.dev/revision=stable-diffusion4-predictor-default-00001
                      serving.knative.dev/revisionUID=0a29a24c-241e-413d-96d1-d70b1ca915fd
                      serving.knative.dev/service=stable-diffusion4-predictor-default
                      serving.knative.dev/serviceUID=867a7c27-4619-4d08-9200-5613cf5b2718
                      serving.kubeflow.org/inferenceservice=stable-diffusion4
Annotations:          autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
                      autoscaling.knative.dev/maxScale: 5
                      autoscaling.knative.dev/minScale: 0
                      cni.projectcalico.org/containerID: 6a8801031564d7a7e51bd3f899be8b04fe14b37e99d8d35f27b7638f3bf9ec44
                      cni.projectcalico.org/podIP: 10.144.240.49/32
                      cni.projectcalico.org/podIPs: 10.144.240.49/32
                      container.apparmor.security.beta.kubernetes.io/kfserving-container: runtime/default
                      container.apparmor.security.beta.kubernetes.io/queue-proxy: runtime/default
                      container.apparmor.security.beta.kubernetes.io/storage-initializer: runtime/default
                      internal.serving.kubeflow.org/storage-initializer-sourceuri: pvc://stable-diffusion-model-cache/
                      kubernetes.io/psp: restricted
                      lxcfs-admission-webhook.aliyun.com/status: mutated
                      seccomp.security.alpha.kubernetes.io/pod: docker/default
                      serving.coreweave.cloud/static: true
                      serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
Status:               Running
IP:                   10.144.240.49
IPs:
  IP:           10.144.240.49
Controlled By:  ReplicaSet/stable-diffusion4-predictor-default-00001-deployment-ddbc59b7f
Init Containers:
  storage-initializer:
    Container ID:  docker://038a69eba8b770c65ecc561f09e213754487a0347fef02b2b9d622fbf6238170
    Image:         coreweave/kfserving:storage-initializer-0.6.0
    Image ID:      docker-pullable://coreweave/kfserving@sha256:20653ef6230c6f651c27f69fb775ce32e7bbf4058680b43c42f34b4b453e551d
    Port:          <none>
    Host Port:     <none>
    Args:
      /mnt/pvc/
      /mnt/models
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 15 Jan 2023 12:54:03 +0000
      Finished:     Sun, 15 Jan 2023 12:54:04 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        1
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /mnt/models from kfserving-provision-location (rw)
      /mnt/pvc from kfserving-pvc-source (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
Containers:
  kfserving-container:
    Container ID:   docker://83e5d4cc2ceca0906e3680e00481e70846901ee5faf846f983f074ad443218e1
    Image:          index.docker.io/pkurzend/stable-diffusion-inference-test@sha256:93c3619bc5d95741d8cf289d96d7b6f16adae739d2e1d56d65b9af291879872b
    Image ID:       docker-pullable://pkurzend/stable-diffusion-inference-test@sha256:93c3619bc5d95741d8cf289d96d7b6f16adae739d2e1d56d65b9af291879872b
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 15 Jan 2023 12:55:04 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:             6
      memory:          32Gi
      nvidia.com/gpu:  1
    Requests:
      cpu:             6
      memory:          32Gi
      nvidia.com/gpu:  1
    Environment:
      STORAGE_URI:      /mnt/models
      HF_HOME:          /mnt/models/
      PORT:             8080
      K_REVISION:       stable-diffusion4-predictor-default-00001
      K_CONFIGURATION:  stable-diffusion4-predictor-default
      K_SERVICE:        stable-diffusion4-predictor-default
    Mounts:
      /mnt/models from kfserving-provision-location (ro)
      /mnt/pvc from kfserving-pvc-source (ro)
      /proc/cpuinfo from lxcfs-proc-cpuinfo (ro)
      /proc/diskstats from lxcfs-proc-diskstats (ro)
      /proc/loadavg from lxcfs-proc-loadavg (ro)
      /proc/meminfo from lxcfs-proc-meminfo (ro)
      /proc/stat from lxcfs-proc-stat (ro)
      /proc/swaps from lxcfs-proc-swaps (ro)
      /proc/uptime from lxcfs-proc-uptime (ro)
      /sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
  queue-proxy:
    Container ID:   docker://fc9aa3bcbc15e961ebdf9042afa9923c57722a0ac334c9fda1602ffacd96b760
    Image:          gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0
    Image ID:       docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0
    Ports:          8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Sun, 15 Jan 2023 12:55:05 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Readiness:  http-get http://:8012/ delay=0s timeout=1s period=1s #success=1 #failure=3
    Environment:
      SERVING_NAMESPACE:                 tenant-84e585-dev
      SERVING_SERVICE:                   stable-diffusion4-predictor-default
      SERVING_CONFIGURATION:             stable-diffusion4-predictor-default
      SERVING_REVISION:                  stable-diffusion4-predictor-default-00001
      QUEUE_SERVING_PORT:                8012
      CONTAINER_CONCURRENCY:             1
      REVISION_TIMEOUT_SECONDS:          300
      SERVING_POD:                       stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv (v1:metadata.name)
      SERVING_POD_IP:                     (v1:status.podIP)
      SERVING_LOGGING_CONFIG:
      SERVING_LOGGING_LEVEL:             info
      SERVING_REQUEST_LOG_TEMPLATE:      {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
      SERVING_ENABLE_REQUEST_LOG:        false
      SERVING_REQUEST_METRICS_BACKEND:   prometheus
      TRACING_CONFIG_BACKEND:            none
      TRACING_CONFIG_ZIPKIN_ENDPOINT:
      TRACING_CONFIG_DEBUG:              false
      TRACING_CONFIG_SAMPLE_RATE:        0.1
      USER_PORT:                         8080
      SYSTEM_NAMESPACE:                  knative-serving
      METRICS_DOMAIN:                    knative.dev/internal/serving
      SERVING_READINESS_PROBE:           {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
      ENABLE_PROFILING:                  false
      SERVING_ENABLE_PROBE_REQUEST_LOG:  false
      METRICS_COLLECTOR_ADDRESS:
      CONCURRENCY_STATE_ENDPOINT:
      CONCURRENCY_STATE_TOKEN_PATH:      /var/run/secrets/tokens/state-token
      HOST_IP:                            (v1:status.hostIP)
      ENABLE_HTTP2_AUTO_DETECTION:       false
    Mounts:
      /proc/cpuinfo from lxcfs-proc-cpuinfo (ro)
      /proc/diskstats from lxcfs-proc-diskstats (ro)
      /proc/loadavg from lxcfs-proc-loadavg (ro)
      /proc/meminfo from lxcfs-proc-meminfo (ro)
      /proc/stat from lxcfs-proc-stat (ro)
      /proc/swaps from lxcfs-proc-swaps (ro)
      /proc/uptime from lxcfs-proc-uptime (ro)
      /sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-8dnkk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8dnkk
    Optional:    false
  kfserving-pvc-source:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  stable-diffusion-model-cache
    ReadOnly:   false
  kfserving-provision-location:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  lxcfs-proc-cpuinfo:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/cpuinfo
    HostPathType:  File
  lxcfs-proc-diskstats:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/diskstats
    HostPathType:  File
  lxcfs-proc-meminfo:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/meminfo
    HostPathType:  File
  lxcfs-proc-stat:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/stat
    HostPathType:  File
  lxcfs-proc-swaps:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/swaps
    HostPathType:  File
  lxcfs-proc-uptime:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/uptime
    HostPathType:  File
  lxcfs-proc-loadavg:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/loadavg
    HostPathType:  File
  lxcfs-sys-devices-system-cpu-online:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/sys/devices/system/cpu/online
    HostPathType:  File
QoS Class:         Guaranteed
Node-Selectors:    node.coreweave.cloud/class=gpu
Tolerations:       is_gpu op=Exists
                   is_gpu_compute op=Exists
                   node.coreweave.cloud/reserved=b0a462e147a89e62c4282e915a9f13722b77e093:NoSchedule
                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From                       Message
  ----     ------                  ----                ----                       -------
  Normal   Scheduled               106s                prioritize-image-locality  Successfully assigned tenant-84e585-dev/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv to g74174e
  Normal   SuccessfulAttachVolume  106s                attachdetach-controller    AttachVolume.Attach succeeded for volume "pvc-6ff0ab13-fa5b-4241-a74f-830bb5b178c9"
  Normal   Pulled                  101s                kubelet                    Container image "coreweave/kfserving:storage-initializer-0.6.0" already present on machine
  Normal   Created                 101s                kubelet                    Created container storage-initializer
  Normal   Started                 100s                kubelet                    Started container storage-initializer
  Normal   Pulling                 99s                 kubelet                    Pulling image "index.docker.io/pkurzend/stable-diffusion-inference-test@sha256:93c3619bc5d95741d8cf289d96d7b6f16adae739d2e1d56d65b9af291879872b"
  Normal   Pulled                  41s                 kubelet                    Successfully pulled image "index.docker.io/pkurzend/stable-diffusion-inference-test@sha256:93c3619bc5d95741d8cf289d96d7b6f16adae739d2e1d56d65b9af291879872b" in 58.537072858s
  Normal   Created                 39s                 kubelet                    Created container kfserving-container
  Normal   Started                 39s                 kubelet                    Started container kfserving-container
  Normal   Pulled                  39s                 kubelet                    Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0" already present on machine
  Normal   Created                 39s                 kubelet                    Created container queue-proxy
  Normal   Started                 38s                 kubelet                    Started container queue-proxy
  Warning  Unhealthy               28s                 kubelet                    Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy               23s (x13 over 36s)  kubelet                    Readiness probe failed: Get "http://10.144.240.49:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

These are the latest events retrieved with kubectl events

2m38s                  Normal    FinalizerUpdate          Route/stable-diffusion4-predictor-default                                   Updated "stable-diffusion4-predictor-default" finalizers
2m38s                  Normal    Created                  Service/stable-diffusion4-predictor-default                                 Created Configuration "stable-diffusion4-predictor-default"
2m38s                  Normal    Created                  Service/stable-diffusion4-predictor-default                                 Created Route "stable-diffusion4-predictor-default"
2m38s                  Normal    Created                  Configuration/stable-diffusion4-predictor-default                           Created Revision "stable-diffusion4-predictor-default-00001"
2m37s                  Normal    ScalingReplicaSet        Deployment/stable-diffusion4-predictor-default-00001-deployment             Scaled up replica set stable-diffusion4-predictor-default-00001-deployment-ddbc59b7f to 1
2m37s                  Warning   InternalError            InferenceService/stable-diffusion4                                          fails to reconcile predictor: fails to update knative service: Operation cannot be fulfilled on services.serving.knative.dev "stable-diffusion4-predictor-default": the object has been modified; please apply your changes to the latest version and try again
2m36s                  Normal    SuccessfulAttachVolume   Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         AttachVolume.Attach succeeded for volume "pvc-6ff0ab13-fa5b-4241-a74f-830bb5b178c9"
2m36s                  Normal    SuccessfulCreate         ReplicaSet/stable-diffusion4-predictor-default-00001-deployment-ddbc59b7f   Created pod: stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv
2m36s                  Warning   InternalError            Revision/stable-diffusion4-predictor-default-00001                          failed to update deployment "stable-diffusion4-predictor-default-00001-deployment": Operation cannot be fulfilled on deployments.apps "stable-diffusion4-predictor-default-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
2m36s                  Normal    Scheduled                Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Successfully assigned tenant-84e585-dev/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv to g74174e
2m31s                  Normal    Created                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Created container storage-initializer
2m31s                  Normal    Pulled                   Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Container image "coreweave/kfserving:storage-initializer-0.6.0" already present on machine
2m30s                  Normal    Started                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Started container storage-initializer
2m29s                  Normal    Pulling                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Pulling image "index.docker.io/pkurzend/stable-diffusion-inference-test@sha256:93c3619bc5d95741d8cf289d96d7b6f16adae739d2e1d56d65b9af291879872b"
91s                    Normal    Pulled                   Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Successfully pulled image "index.docker.io/pkurzend/stable-diffusion-inference-test@sha256:93c3619bc5d95741d8cf289d96d7b6f16adae739d2e1d56d65b9af291879872b" in 58.537072858s
89s                    Normal    Created                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Created container kfserving-container
89s                    Normal    Started                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Started container kfserving-container
89s                    Normal    Pulled                   Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0" already present on machine
89s                    Normal    Created                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Created container queue-proxy
88s                    Normal    Started                  Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Started container queue-proxy
78s                    Warning   Unhealthy                Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Readiness probe failed: HTTP probe failed with statuscode: 503
73s                    Normal    Created                  Ingress/stable-diffusion4-predictor-default                                 Created VirtualService "stable-diffusion4-predictor-default-mesh"
73s (x13 over 86s)     Warning   Unhealthy                Pod/stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv         Readiness probe failed: Get "http://10.144.240.49:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
73s                    Warning   InternalError            InferenceService/stable-diffusion4                                          fails to update InferenceService status: Operation cannot be fulfilled on inferenceservices.serving.kubeflow.org "stable-diffusion4": the object has been modified; please apply your changes to the latest version and try again
73s                    Warning   UpdateFailed             InferenceService/stable-diffusion4                                          Failed to update status for InferenceService "stable-diffusion4": Operation cannot be fulfilled on inferenceservices.serving.kubeflow.org "stable-diffusion4": the object has been modified; please apply your changes to the latest version and try again
73s                    Normal    RevisionReady            Revision/stable-diffusion4-predictor-default-00001                          Revision becomes ready upon all resources being ready
73s                    Normal    Created                  Ingress/stable-diffusion4-predictor-default                                 Created VirtualService "stable-diffusion4-predictor-default-ingress"
73s                    Normal    ConfigurationReady       Configuration/stable-diffusion4-predictor-default                           Configuration becomes ready
73s                    Normal    FinalizerUpdate          Ingress/stable-diffusion4-predictor-default                                 Updated "stable-diffusion4-predictor-default" finalizers
73s                    Normal    Created                  Route/stable-diffusion4-predictor-default                                   Created Ingress "stable-diffusion4-predictor-default"
73s                    Normal    Created                  Route/stable-diffusion4-predictor-default                                   Created placeholder service "stable-diffusion4-predictor-default"
73s                    Normal    LatestReadyUpdate        Configuration/stable-diffusion4-predictor-default                           LatestReadyRevisionName updated to "stable-diffusion4-predictor-default-00001"
60s                    Normal    InferenceServiceReady    InferenceService/stable-diffusion4                                          InferenceService [stable-diffusion4] is Ready

When I open the URL returned by the kubectl get isvc command, I get a 404 "page not found" error. I tried the following URLs:

https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com
https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/
https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v2/health/ready

Following the documentation here https://kserve.github.io/website/modelserving/data_plane/ and here https://kserve.github.io/website/modelserving/inference_api/, I expected to get a response back.

Can you point me in the right direction as to what I am doing wrong here?

Thanks

salanki commented 1 year ago

Did you try to make a request as described in the documentation? I.e.:

curl http://stable-diffusion.tenant-example-example.knative.chi.coreweave.com/v1/models/stable-diffusion-v1-4:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography", "parameters": {"seed": 424242, "width": 768}}' --output sunset.png && open sunset.png

pkurzend commented 1 year ago

Hi, yes, I tried the following commands:

curl https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png

curl https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion4:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png

I replaced stable-diffusion-v1-4 with the InferenceService name and also with the model name defined in service.py.

Both yield empty PNG files.

harubaru commented 1 year ago

It looks like the code used in your deployment is not the original example in this repository. I recommend reverting the changes and seeing if the original example works.

pkurzend commented 1 year ago

Hello, yes, I tried the original example as well; pretty much all outputs look the same. I ran the original YAML file and used the Docker image specified in the original file (tweldoncw/stable-diffusion:7).

kubectl logs -l serving.kubeflow.org/inferenceservice=stable-diffusion --container kfserving-container

[I 230115 17:43:36 service:46] Model ID: CompVis/stable-diffusion-v1-4
[I 230115 17:43:36 service:47] Model Cache: /mnt/models/hub
[I 230115 17:43:36 service:69] Loading stable-diffusion-v1-4
[I 230115 17:43:56 service:84] Loaded stable-diffusion-v1-4
[I 230115 17:43:56 service:86] Loading stable-diffusion-v1-4 to accelerator
[I 230115 17:43:59 service:88] Accelerator loaded
[I 230115 17:43:59 model_server:150] Registering model: stable-diffusion-v1-4
[I 230115 17:43:59 model_server:123] Listening on port 8080
[I 230115 17:43:59 model_server:125] Will fork 1 workers
[I 230115 17:43:59 model_server:128] Setting max asyncio worker threads as 10

kubectl get isvc stable-diffusion

NAME               URL                                                                    READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                        AGE
stable-diffusion   https://stable-diffusion.tenant-84e585-dev.knative.chi.coreweave.com   True           100                              stable-diffusion-predictor-default-00001   87s

kubectl events

LAST SEEN               TYPE      REASON                   OBJECT                                                                         MESSAGE
104s                    Normal    FinalizerUpdate          Route/stable-diffusion-predictor-default                                       Updated "stable-diffusion-predictor-default" finalizers
104s                    Normal    Created                  Service/stable-diffusion-predictor-default                                     Created Configuration "stable-diffusion-predictor-default"
104s                    Normal    Created                  Service/stable-diffusion-predictor-default                                     Created Route "stable-diffusion-predictor-default"
104s                    Normal    Created                  Configuration/stable-diffusion-predictor-default                               Created Revision "stable-diffusion-predictor-default-00001"
103s                    Normal    SuccessfulCreate         ReplicaSet/stable-diffusion-predictor-default-00001-deployment-7c8c5f54f5      Created pod: stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n
103s                    Normal    ScalingReplicaSet        Deployment/stable-diffusion-predictor-default-00001-deployment                 Scaled up replica set stable-diffusion-predictor-default-00001-deployment-7c8c5f54f5 to 1
103s                    Normal    SuccessfulAttachVolume   Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            AttachVolume.Attach succeeded for volume "pvc-6ff0ab13-fa5b-4241-a74f-830bb5b178c9"
102s                    Normal    Scheduled                Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Successfully assigned tenant-84e585-dev/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n to g08faff
97s                     Normal    Pulled                   Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Container image "coreweave/kfserving:storage-initializer-0.6.0" already present on machine
97s                     Normal    Created                  Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Created container storage-initializer
97s                     Normal    Started                  Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Started container storage-initializer
96s                     Normal    Created                  Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Created container kfserving-container
96s                     Normal    Pulled                   Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Container image "index.docker.io/tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce" already present on machine
95s                     Normal    Started                  Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Started container kfserving-container
95s                     Normal    Started                  Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Started container queue-proxy
95s                     Normal    Pulled                   Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0" already present on machine
95s                     Normal    Created                  Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Created container queue-proxy
85s                     Warning   Unhealthy                Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Readiness probe failed: HTTP probe failed with statuscode: 503
78s (x15 over 93s)      Warning   Unhealthy                Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Readiness probe failed: Get "http://10.146.135.48:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
69s                     Normal    ConfigurationReady       Configuration/stable-diffusion-predictor-default                               Configuration becomes ready
69s (x3 over 104s)      Warning   InternalError            InferenceService/stable-diffusion                                              fails to reconcile predictor: fails to update knative service: Operation cannot be fulfilled on services.serving.knative.dev "stable-diffusion-predictor-default": the object has been modified; please apply your changes to the latest version and try again
69s                     Normal    Created                  Ingress/stable-diffusion-predictor-default                                     Created VirtualService "stable-diffusion-predictor-default-ingress"
69s                     Normal    Created                  Route/stable-diffusion-predictor-default                                       Created Ingress "stable-diffusion-predictor-default"
69s                     Normal    Created                  Route/stable-diffusion-predictor-default                                       Created placeholder service "stable-diffusion-predictor-default"
69s                     Normal    LatestReadyUpdate        Configuration/stable-diffusion-predictor-default                               LatestReadyRevisionName updated to "stable-diffusion-predictor-default-00001"
69s                     Normal    FinalizerUpdate          Ingress/stable-diffusion-predictor-default                                     Updated "stable-diffusion-predictor-default" finalizers
69s                     Normal    Created                  Ingress/stable-diffusion-predictor-default                                     Created VirtualService "stable-diffusion-predictor-default-mesh"
69s (x2 over 69s)       Normal    RevisionReady            Revision/stable-diffusion-predictor-default-00001                              Revision becomes ready upon all resources being ready
69s (x2 over 102s)      Warning   InternalError            Revision/stable-diffusion-predictor-default-00001                              failed to update deployment "stable-diffusion-predictor-default-00001-deployment": Operation cannot be fulfilled on deployments.apps "stable-diffusion-predictor-default-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
56s                     Normal    InferenceServiceReady    InferenceService/stable-diffusion   

kubectl describe pod stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n

Name:                 stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n
Namespace:            tenant-84e585-dev
Priority:             1000000
Priority Class Name:  normal
Service Account:      default
Node:                 g08faff/10.135.34.2
Start Time:           Sun, 15 Jan 2023 17:43:27 +0000
Labels:               app=stable-diffusion-predictor-default-00001
                      component=predictor
                      pod-template-hash=7c8c5f54f5
                      service.istio.io/canonical-name=stable-diffusion-predictor-default
                      service.istio.io/canonical-revision=stable-diffusion-predictor-default-00001
                      serving.knative.dev/configuration=stable-diffusion-predictor-default
                      serving.knative.dev/configurationGeneration=1
                      serving.knative.dev/configurationUID=bbcf855b-11ae-40c2-9e00-aa2523f64798
                      serving.knative.dev/revision=stable-diffusion-predictor-default-00001
                      serving.knative.dev/revisionUID=8adf3e86-605f-4a25-b590-172eca067c9a
                      serving.knative.dev/service=stable-diffusion-predictor-default
                      serving.knative.dev/serviceUID=a695618f-e31c-4925-b26b-daeb447cac96
                      serving.kubeflow.org/inferenceservice=stable-diffusion
Annotations:          autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
                      autoscaling.knative.dev/maxScale: 1
                      autoscaling.knative.dev/minScale: 1
                      cni.projectcalico.org/containerID: 62a9c5def9c33d7c3a54e477317611830850be6a51b88392f0d462063ca29efe
                      cni.projectcalico.org/podIP: 10.146.135.48/32
                      cni.projectcalico.org/podIPs: 10.146.135.48/32
                      container.apparmor.security.beta.kubernetes.io/kfserving-container: runtime/default
                      container.apparmor.security.beta.kubernetes.io/queue-proxy: runtime/default
                      container.apparmor.security.beta.kubernetes.io/storage-initializer: runtime/default
                      internal.serving.kubeflow.org/storage-initializer-sourceuri: pvc://stable-diffusion-model-cache/
                      kubernetes.io/psp: restricted
                      lxcfs-admission-webhook.aliyun.com/status: mutated
                      seccomp.security.alpha.kubernetes.io/pod: docker/default
                      serving.coreweave.cloud/static: true
                      serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
Status:               Running
IP:                   10.146.135.48
IPs:
  IP:           10.146.135.48
Controlled By:  ReplicaSet/stable-diffusion-predictor-default-00001-deployment-7c8c5f54f5
Init Containers:
  storage-initializer:
    Container ID:  docker://b836a139f570313331f94953e20a89d6df09d3531b846dd2002cbb5ea0e8ef19
    Image:         coreweave/kfserving:storage-initializer-0.6.0
    Image ID:      docker-pullable://coreweave/kfserving@sha256:20653ef6230c6f651c27f69fb775ce32e7bbf4058680b43c42f34b4b453e551d
    Port:          <none>
    Host Port:     <none>
    Args:
      /mnt/pvc/
      /mnt/models
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 15 Jan 2023 17:43:32 +0000
      Finished:     Sun, 15 Jan 2023 17:43:33 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        1
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /mnt/models from kfserving-provision-location (rw)
      /mnt/pvc from kfserving-pvc-source (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
Containers:
  kfserving-container:
    Container ID:   docker://b7d4d15e03eab65198c8b45be68b32df654ae8039d5cd3d93d99e4cf0ee210ed
    Image:          index.docker.io/tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce
    Image ID:       docker-pullable://tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 15 Jan 2023 17:43:34 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:             6
      memory:          32Gi
      nvidia.com/gpu:  1
    Requests:
      cpu:             6
      memory:          32Gi
      nvidia.com/gpu:  1
    Environment:
      HUGGING_FACE_HUB_TOKEN:  <set to the key 'token' in secret 'huggingface-hub-token'>  Optional: false
      STORAGE_URI:             /mnt/models
      PORT:                    8080
      K_REVISION:              stable-diffusion-predictor-default-00001
      K_CONFIGURATION:         stable-diffusion-predictor-default
      K_SERVICE:               stable-diffusion-predictor-default
    Mounts:
      /mnt/models from kfserving-provision-location (ro)
      /mnt/pvc from kfserving-pvc-source (ro)
      /proc/cpuinfo from lxcfs-proc-cpuinfo (ro)
      /proc/diskstats from lxcfs-proc-diskstats (ro)
      /proc/loadavg from lxcfs-proc-loadavg (ro)
      /proc/meminfo from lxcfs-proc-meminfo (ro)
      /proc/stat from lxcfs-proc-stat (ro)
      /proc/swaps from lxcfs-proc-swaps (ro)
      /proc/uptime from lxcfs-proc-uptime (ro)
      /sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
  queue-proxy:
    Container ID:   docker://fd0a3d8b9733974dc8daf217fa667b811fdbc086087cf27baeadf1aea45accad
    Image:          gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0
    Image ID:       docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0
    Ports:          8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Sun, 15 Jan 2023 17:43:34 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Readiness:  http-get http://:8012/ delay=0s timeout=1s period=1s #success=1 #failure=3
    Environment:
      SERVING_NAMESPACE:                 tenant-84e585-dev
      SERVING_SERVICE:                   stable-diffusion-predictor-default
      SERVING_CONFIGURATION:             stable-diffusion-predictor-default
      SERVING_REVISION:                  stable-diffusion-predictor-default-00001
      QUEUE_SERVING_PORT:                8012
      CONTAINER_CONCURRENCY:             1
      REVISION_TIMEOUT_SECONDS:          300
      SERVING_POD:                       stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n (v1:metadata.name)
      SERVING_POD_IP:                     (v1:status.podIP)
      SERVING_LOGGING_CONFIG:
      SERVING_LOGGING_LEVEL:             info
      SERVING_REQUEST_LOG_TEMPLATE:      {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
      SERVING_ENABLE_REQUEST_LOG:        false
      SERVING_REQUEST_METRICS_BACKEND:   prometheus
      TRACING_CONFIG_BACKEND:            none
      TRACING_CONFIG_ZIPKIN_ENDPOINT:
      TRACING_CONFIG_DEBUG:              false
      TRACING_CONFIG_SAMPLE_RATE:        0.1
      USER_PORT:                         8080
      SYSTEM_NAMESPACE:                  knative-serving
      METRICS_DOMAIN:                    knative.dev/internal/serving
      SERVING_READINESS_PROBE:           {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
      ENABLE_PROFILING:                  false
      SERVING_ENABLE_PROBE_REQUEST_LOG:  false
      METRICS_COLLECTOR_ADDRESS:
      CONCURRENCY_STATE_ENDPOINT:
      CONCURRENCY_STATE_TOKEN_PATH:      /var/run/secrets/tokens/state-token
      HOST_IP:                            (v1:status.hostIP)
      ENABLE_HTTP2_AUTO_DETECTION:       false
    Mounts:
      /proc/cpuinfo from lxcfs-proc-cpuinfo (ro)
      /proc/diskstats from lxcfs-proc-diskstats (ro)
      /proc/loadavg from lxcfs-proc-loadavg (ro)
      /proc/meminfo from lxcfs-proc-meminfo (ro)
      /proc/stat from lxcfs-proc-stat (ro)
      /proc/swaps from lxcfs-proc-swaps (ro)
      /proc/uptime from lxcfs-proc-uptime (ro)
      /sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-8dnkk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8dnkk
    Optional:    false
  kfserving-pvc-source:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  stable-diffusion-model-cache
    ReadOnly:   false
  kfserving-provision-location:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  lxcfs-proc-cpuinfo:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/cpuinfo
    HostPathType:  File
  lxcfs-proc-diskstats:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/diskstats
    HostPathType:  File
  lxcfs-proc-meminfo:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/meminfo
    HostPathType:  File
  lxcfs-proc-stat:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/stat
    HostPathType:  File
  lxcfs-proc-swaps:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/swaps
    HostPathType:  File
  lxcfs-proc-uptime:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/uptime
    HostPathType:  File
  lxcfs-proc-loadavg:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/proc/loadavg
    HostPathType:  File
  lxcfs-sys-devices-system-cpu-online:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/lxcfs/sys/devices/system/cpu/online
    HostPathType:  File
QoS Class:         Guaranteed
Node-Selectors:    node.coreweave.cloud/class=gpu
Tolerations:       is_gpu op=Exists
                   is_gpu_compute op=Exists
                   node.coreweave.cloud/reserved=b0a462e147a89e62c4282e915a9f13722b77e093:NoSchedule
                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                     From                       Message
  ----     ------                  ----                    ----                       -------
  Normal   Scheduled               4m8s                    prioritize-image-locality  Successfully assigned tenant-84e585-dev/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n to g08faff
  Normal   SuccessfulAttachVolume  4m8s                    attachdetach-controller    AttachVolume.Attach succeeded for volume "pvc-6ff0ab13-fa5b-4241-a74f-830bb5b178c9"
  Normal   Pulled                  4m2s                    kubelet                    Container image "coreweave/kfserving:storage-initializer-0.6.0" already present on machine
  Normal   Created                 4m2s                    kubelet                    Created container storage-initializer
  Normal   Started                 4m2s                    kubelet                    Started container storage-initializer
  Normal   Created                 4m1s                    kubelet                    Created container kfserving-container
  Normal   Pulled                  4m1s                    kubelet                    Container image "index.docker.io/tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce" already present on machine
  Normal   Started                 4m                      kubelet                    Started container kfserving-container
  Normal   Pulled                  4m                      kubelet                    Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0" already present on machine
  Normal   Created                 4m                      kubelet                    Created container queue-proxy
  Normal   Started                 4m                      kubelet                    Started container queue-proxy
  Warning  Unhealthy               3m50s                   kubelet                    Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy               3m43s (x15 over 3m58s)  kubelet                    Readiness probe failed: Get "http://10.146.135.48:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Calling the given endpoint with curl again returns an empty file and a 404 error:

curl https://stable-diffusion.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-v1-4:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png

The kubectl events above contained the following warnings:

85s                     Warning   Unhealthy                Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Readiness probe failed: HTTP probe failed with statuscode: 503
78s (x15 over 93s)      Warning   Unhealthy                Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n            Readiness probe failed: Get "http://10.146.135.48:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
69s (x3 over 104s)      Warning   InternalError            InferenceService/stable-diffusion                                              fails to reconcile predictor: fails to update knative service: Operation cannot be fulfilled on services.serving.knative.dev "stable-diffusion-predictor-default": the object has been modified; please apply your changes to the latest version and try again
69s (x2 over 102s)      Warning   InternalError            Revision/stable-diffusion-predictor-default-00001                              failed to update deployment "stable-diffusion-predictor-default-00001-deployment": Operation cannot be fulfilled on deployments.apps "stable-diffusion-predictor-default-00001-deployment": the object has been modified; please apply your changes to the latest version and try again

I don't know what they mean, but maybe they are the reason.

pkurzend commented 1 year ago

I also made a hello-world example with this YAML file:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: hello-world
spec:
  predictor:
    containerConcurrency: 1
    minReplicas: 1
    maxReplicas: 1 
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: gpu.nvidia.com/class
              operator: In
              values:
              - A40
            - key: topology.kubernetes.io/region
              operator: In
              values:
              - ORD1 
    containers:
      - name: kserve-container
        image: pkurzend/hello-world:1
        env:
          - name: STORAGE_URI # Kserve mounts the PVC at /mnt/models/
            value: pvc://stable-diffusion-model-cache/
            # The following env vars are the default model parameters, which can be changed as needed. 
        resources:
          requests:
            cpu: 6
            memory: 32Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 6
            memory: 32Gi
            nvidia.com/gpu: 1

and this service.py

import kserve
import logging
import os
from typing import Dict
from argparse import ArgumentParser
from io import BytesIO

MODEL_NAME = 'hello-world'

logging.basicConfig(level=kserve.constants.KSERVE_LOGLEVEL)
logger = logging.getLogger(MODEL_NAME)
logger.info(f"Model Name: {MODEL_NAME}")

class Model(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.ready = False

    def load(self):
        logger.info(f"Loading {MODEL_NAME}")
        self.ready = True

    def predict(self, request: Dict) -> Dict:

        return {"hello" : "world"}

if __name__ == "__main__":
    model = Model(name=MODEL_NAME)
    model.load()
    kserve.ModelServer().start([model])

Same problem with the hello world example.

I tried all the URLs returned by kubectl describe isvc <name>:

kubectl describe isvc hello-world

http://hello-world.tenant-84e585-dev.svc.tenant.chi.local/v1/models/hello-world:predict -> 404

http://hello-world-predictor-default.tenant-84e585-dev.svc.tenant.chi.local -> 404

https://hello-world-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com -> {"status": "alive"}

https://hello-world.tenant-84e585-dev.knative.chi.coreweave.com -> 404

Concatenated URL: https://hello-world-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/hello-world:predict -> {"error": null}

kubectl describe isvc stable-diffusion5

http://stable-diffusion5.tenant-84e585-dev.svc.tenant.chi.local/v1/models/stable-diffusion5:predict -> 404

http://stable-diffusion5-predictor-default.tenant-84e585-dev.svc.tenant.chi.local -> 404

https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com -> {"status": "alive"}

https://stable-diffusion5.tenant-84e585-dev.knative.chi.coreweave.com -> 404

Concatenated URLs:
https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion5:predict -> {"error": null}
https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict -> {"error": null}

https://stable-diffusion5.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict -> 404

pkurzend commented 1 year ago

Okay, this command worked now:

curl https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png

Maybe you could update the guide with this URL pattern:

https://<inferenceservice-name>-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/<model-name>:predict
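For reference, the same request from Python (a minimal sketch, assuming the requests library and that the response body is the raw PNG, as the curl --output usage above suggests):

import requests

# URL built from the pattern above:
# https://<inferenceservice-name>-predictor-default.<tenant-namespace>.knative.chi.coreweave.com/v1/models/<model-name>:predict
url = (
    "https://stable-diffusion5-predictor-default"
    ".tenant-84e585-dev.knative.chi.coreweave.com"
    "/v1/models/stable-diffusion-inference:predict"
)

payload = {
    "prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"
}

resp = requests.post(url, json=payload, timeout=600)
resp.raise_for_status()

# Write the returned image bytes to disk.
with open("sunset.png", "wb") as f:
    f.write(resp.content)
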
pkurzend commented 1 year ago

Hi, I can't get the scale-down to zero to work like in the tutorial. In my YAML file I specified minReplicas: 0:

  predictor:
    containerConcurrency: 1
    minReplicas: 0
    maxReplicas: 1

But after 35 minutes the pod is still running:

kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
stable-diffusion6-predictor-default-00001-deployment-868c6xj88d   2/2     Running   0          35m

How can I change the time after which pods are scaled down?

ChachiTheGhost commented 1 year ago

What do you currently have the autoscaling.knative.dev/scale-to-zero-pod-retention-period field set to in your manifest? https://docs.coreweave.com/compass/online-inference#scale-to-zero

pkurzend commented 1 year ago

The top of my YAML file looks like this now:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: stable-diffusion6
  annotations:
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1m5s"

But downscaling doesn't happen after 1 minute and 5 seconds:

kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
stable-diffusion6-predictor-default-00001-deployment-6dc97q99tv   2/2     Running   0          7m6s

After ca. 8 minutes (ca. 6-7 minutes of idle time), the pod is terminating:

kubectl get pods
NAME                                                              READY   STATUS        RESTARTS   AGE
stable-diffusion6-predictor-default-00001-deployment-6dc97q99tv   0/2     Terminating   0          8m6s
virt-launcher-stabel-diffusion-mqbbg                              1/1     Running       0          12m

salanki commented 1 year ago

At what time did you do the last request? This doesn't look too bad to me.

pkurzend commented 1 year ago

The pod started when the request came in, so the last request arrived at age=0s. A 6-7 minute retention period is fine for me.

One more question though:

I am trying to deploy a Knative Service with a mounted PVC.

This is my YAML:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: train-service-test
spec:
  template:
    spec:
      containers:
        - name: training-service-test
          image: pkurzend/training-service-test:1
          imagePullPolicy: IfNotPresent
          env:
            - name: EXAMPLE
              value: "Python Sample v1"
          volumeMounts:
            - name: model-cache
              mountPath: /mnt/models
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: stable-diffusion-model-cache

When applying this, I keep getting this error:

Error from server (BadRequest): error when creating "train-service.yaml": admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: expected exactly one, got neither: spec.template.spec.volumes[0].configMap, spec.template.spec.volumes[0].emptyDir, spec.template.spec.volumes[0].projected, spec.template.spec.volumes[0].secret

So I think I have to enable some feature flags, as documented here: https://knative.dev/docs/serving/configuration/feature-flags/#kubernetes-persistentvolumeclaim-pvc

As I understand it, I am supposed to define a ConfigMap with those keys set to the value "enabled", so I applied this YAML file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-features
data:
  kubernetes.podspec-persistent-volume-claim: "enabled"
  kubernetes.podspec-persistent-volume-write: "enabled"

The documentation says this should be created in the knative-serving namespace, but I don't have access to it, so I leave the namespace blank.

But I keep getting the same error. Do you have any pointers on how I can mount the PVC?

Thanks

salanki commented 1 year ago

You will have to use an InferenceService to get PVC support currently; that's how our examples do it. With that said, you will get better performance by loading your models from object storage directly into GPU memory using our Tensorizer library. We are in the process of updating our SD examples to use the library.

pkurzend commented 1 year ago

I already have an InferenceService running. But besides that, I need an extra service to store some config in the PVC and then trigger a training job. The InferenceService only gives read access to the PVC, hence I am trying to mount the PVC myself.

salanki commented 1 year ago

Writing to a PVC from an Inference Service is usually a big no-no, since they should be immutable and scale up/down as needed; if you write to the wrong place you'll be dealing with race conditions. Writing to object storage (if it's an object-sized set of data) or to something like Redis (if it's more of a message-queue type of data) is a better pattern.
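
As an illustration only, here is a minimal redis-py sketch of that pattern (the host name, key, and payload are placeholders, not anything from this setup):

# Illustrative sketch: store the training config in Redis instead of writing it to the PVC.
import json
import redis

# Placeholder host; point this at your own Redis instance in the namespace.
r = redis.Redis(host="redis", port=6379, decode_responses=True)

config = {"user_id": "philip", "model_name": "test-model"}
r.set("training-config:philip/test-model", json.dumps(config))

# The training job reads the config back before it starts:
config = json.loads(r.get("training-config:philip/test-model"))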

pkurzend commented 1 year ago

Okay, thank you, I will look into this.

pkurzend commented 1 year ago

Hi, I got Redis to work, thanks. Now I get permission errors when creating a Job with the Kubernetes Python client from within the pod.

This is my code to create a Job:

from flask import Flask
from kubernetes import client, config, utils

app = Flask(__name__)

@app.route('/start-training-job', methods = ['POST'])
def start_train_job():
    ...
    config.load_incluster_config()
    k8s_client = client.ApiClient()

    job_configuration = {
                        "apiVersion": "batch/v1",
                        "kind": "Job",
                        "metadata": {
                            "name": "training-job"
                        },
                        "spec": {
                            "template": {
                            "spec": {
                                "containers": [
                                {
                                    "name": "model-trainer",
                                    "image": "pkurzend/training-job:2",
                                    "imagePullPolicy": "IfNotPresent",
                                    "command": [
                                    "python3",
                                    "./train.py",
                                    "--user_id=philip",
                                    "--model_name=test-model"
                                    ],
                                    "env": [
                                    {
                                        "name": "HF_HOME",
                                        "value": "/mnt/models/"
                                    }
                                    ],
                                    "volumeMounts": [
                                    {
                                        "name": "model-cache",
                                        "mountPath": "/mnt/models"
                                    }
                                    ],
                                    "resources": {
                                    "requests": {
                                        "cpu": 6,
                                        "memory": "32Gi",
                                        "nvidia.com/gpu": 1
                                    },
                                    "limits": {
                                        "cpu": 6,
                                        "memory": "32Gi",
                                        "nvidia.com/gpu": 1
                                    }
                                    }
                                }
                                ],
                                "volumes": [
                                {
                                    "name": "model-cache",
                                    "persistentVolumeClaim": {
                                    "claimName": "stable-diffusion-model-cache"
                                    }
                                }
                                ],
                                "affinity": {
                                "nodeAffinity": {
                                    "requiredDuringSchedulingIgnoredDuringExecution": {
                                    "nodeSelectorTerms": [
                                        {
                                        "matchExpressions": [
                                            {
                                            "key": "gpu.nvidia.com/class",
                                            "operator": "In",
                                            "values": [
                                                "RTX_A5000"
                                            ]
                                            },
                                            {
                                            "key": "topology.kubernetes.io/region",
                                            "operator": "In",
                                            "values": [
                                                "ORD1"
                                            ]
                                            }
                                        ]
                                        }
                                    ]
                                    }
                                }
                                },
                                "restartPolicy": "Never"
                            }
                            },
                            "backoffLimit": 2
                        }
                        }
    utils.create_from_dict(k8s_client, job_configuration)
    ...

I get a permission error

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/app.py", line 124, in start_train_job
    utils.create_from_dict(k8s_client, job_configuration)
  File "/usr/local/lib/python3.11/site-packages/kubernetes/utils/create_from_yaml.py", line 224, in create_from_dict
    raise FailToCreateError(api_exceptions)
kubernetes.utils.create_from_yaml.FailToCreateError: Error from server (Forbidden): {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:tenant-84e585-dev:default\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}

According to the Kubernetes documentation https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/ the config.load_incluster_config() method should be used to access the Kubernetes API from within the pod. Can you point me in the right direction as to why I am getting Forbidden? Is this specific to the CoreWeave cluster?

ChachiTheGhost commented 1 year ago

It looks like your code is attempting to create a Job in the "default" namespace. You'll need to specify your own namespace, `tenant-84e585-dev`, and then this should work.
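
For reference, a minimal sketch of that fix (it assumes the namespace can be passed via the namespace argument of utils.create_from_dict; setting metadata.namespace on the Job should work as well):

# Create the Job in the tenant namespace instead of "default".
utils.create_from_dict(k8s_client, job_configuration, namespace="tenant-84e585-dev")

# Alternatively, set the namespace on the Job object itself:
job_configuration["metadata"]["namespace"] = "tenant-84e585-dev"
utils.create_from_dict(k8s_client, job_configuration)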

salanki commented 1 year ago

As @ChachiTheGhost says, you need to specify the right namespace. It is also likely that you will need to create a Role and RoleBinding to give the default service account more permissions, such as Job create.

pkurzend commented 1 year ago

I am trying to create a Role and then a RoleBinding, but I get permission errors.

Here is my YAML file for the Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-84e585-dev
  name: myrole
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["jobs", "pods"]
  verbs: ["get","list","create","delete"]

This is the permission error I keep getting:

Error from server (Forbidden): error when creating "role.yaml": roles.rbac.authorization.k8s.io "myrole" is forbidden: user "token-mMh6r3pcvU6MXdeq3WcC" (groups=["r:cloud-app:base" "w:ns-tenant-84e585-dev:storage" "w:ns-tenant-84e585-dev:objectstorage" "r:ns-tenant-84e585-dev:pods" "r:ns-tenant-84e585-dev:objectstorage" "w:ns-tenant-84e585-dev:virtualservers" "w:ns-tenant-84e585-dev:base" "r:ns-tenant-84e585-dev:virtualservers" "r:ns-tenant-84e585-dev:base" "r:ns-tenant-84e585-dev:full" "r:ns-tenant-84e585-dev:storage" "w:ns-tenant-84e585-dev:full" "w:ns-tenant-84e585-dev:pods" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
{APIGroups:[""], Resources:["jobs"], Verbs:["get" "list" "create" "delete"]}; resolution errors: [roles.rbac.authorization.k8s.io "myrole" not found]

salanki commented 1 year ago

Jobs are in the batch apiGroup:

- apiGroups:
  - batch
  - extensions
  resources:
  - jobs
  - jobs/status
  - cronjobs
  verbs:
  - get
  - list
  - watch
  - create
  - delete
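
Putting it together, a complete Role plus RoleBinding for the default service account could look roughly like this (the names job-manager and job-manager-binding are only examples):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-84e585-dev
  name: job-manager
rules:
- apiGroups: ["batch"]
  resources: ["jobs", "jobs/status"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: tenant-84e585-dev
  name: job-manager-binding
subjects:
- kind: ServiceAccount
  name: default
  namespace: tenant-84e585-dev
roleRef:
  kind: Role
  name: job-manager
  apiGroup: rbac.authorization.k8s.io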