pkurzend opened this issue 1 year ago
Did you try to make a request like it's described in the documentation? I.e.
curl http://stable-diffusion.tenant-example-example.knative.chi.coreweave.com/v1/models/stable-diffusion-v1-4:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography", "parameters": {"seed": 424242, "width": 768}}' --output sunset.png && open sunset.png
Hi, yes, I tried the following commands:
curl https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png
curl https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion4:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png
I replaced stable-diffusion-v1-4 with the inference service name, and also with the model name defined in service.py. Both yield empty png files.
It looks like the code used in your deployment is not the original example in this repository. I recommend reverting the changes and seeing if the original example works.
Hello, yes, I tried the original example as well; pretty much all outputs look the same. I ran the original yaml file and used the docker image specified in it (tweldoncw/stable-diffusion:7).
kubectl logs -l serving.kubeflow.org/inferenceservice=stable-diffusion --container kfserving-container
[I 230115 17:43:36 service:46] Model ID: CompVis/stable-diffusion-v1-4
[I 230115 17:43:36 service:47] Model Cache: /mnt/models/hub
[I 230115 17:43:36 service:69] Loading stable-diffusion-v1-4
[I 230115 17:43:56 service:84] Loaded stable-diffusion-v1-4
[I 230115 17:43:56 service:86] Loading stable-diffusion-v1-4 to accelerator
[I 230115 17:43:59 service:88] Accelerator loaded
[I 230115 17:43:59 model_server:150] Registering model: stable-diffusion-v1-4
[I 230115 17:43:59 model_server:123] Listening on port 8080
[I 230115 17:43:59 model_server:125] Will fork 1 workers
[I 230115 17:43:59 model_server:128] Setting max asyncio worker threads as 10
kubectl get isvc stable-diffusion
NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
stable-diffusion https://stable-diffusion.tenant-84e585-dev.knative.chi.coreweave.com True 100 stable-diffusion-predictor-default-00001 87s
kubectl events
LAST SEEN TYPE REASON OBJECT MESSAGE
104s Normal FinalizerUpdate Route/stable-diffusion-predictor-default Updated "stable-diffusion-predictor-default" finalizers
104s Normal Created Service/stable-diffusion-predictor-default Created Configuration "stable-diffusion-predictor-default"
104s Normal Created Service/stable-diffusion-predictor-default Created Route "stable-diffusion-predictor-default"
104s Normal Created Configuration/stable-diffusion-predictor-default Created Revision "stable-diffusion-predictor-default-00001"
103s Normal SuccessfulCreate ReplicaSet/stable-diffusion-predictor-default-00001-deployment-7c8c5f54f5 Created pod: stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n
103s Normal ScalingReplicaSet Deployment/stable-diffusion-predictor-default-00001-deployment Scaled up replica set stable-diffusion-predictor-default-00001-deployment-7c8c5f54f5 to 1
103s Normal SuccessfulAttachVolume Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n AttachVolume.Attach succeeded for volume "pvc-6ff0ab13-fa5b-4241-a74f-830bb5b178c9"
102s Normal Scheduled Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Successfully assigned tenant-84e585-dev/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n to g08faff
97s Normal Pulled Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Container image "coreweave/kfserving:storage-initializer-0.6.0" already present on machine
97s Normal Created Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Created container storage-initializer
97s Normal Started Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Started container storage-initializer
96s Normal Created Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Created container kfserving-container
96s Normal Pulled Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Container image "index.docker.io/tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce" already present on machine
95s Normal Started Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Started container kfserving-container
95s Normal Started Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Started container queue-proxy
95s Normal Pulled Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0" already present on machine
95s Normal Created Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Created container queue-proxy
85s Warning Unhealthy Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Readiness probe failed: HTTP probe failed with statuscode: 503
78s (x15 over 93s) Warning Unhealthy Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Readiness probe failed: Get "http://10.146.135.48:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
69s Normal ConfigurationReady Configuration/stable-diffusion-predictor-default Configuration becomes ready
69s (x3 over 104s) Warning InternalError InferenceService/stable-diffusion fails to reconcile predictor: fails to update knative service: Operation cannot be fulfilled on services.serving.knative.dev "stable-diffusion-predictor-default": the object has been modified; please apply your changes to the latest version and try again
69s Normal Created Ingress/stable-diffusion-predictor-default Created VirtualService "stable-diffusion-predictor-default-ingress"
69s Normal Created Route/stable-diffusion-predictor-default Created Ingress "stable-diffusion-predictor-default"
69s Normal Created Route/stable-diffusion-predictor-default Created placeholder service "stable-diffusion-predictor-default"
69s Normal LatestReadyUpdate Configuration/stable-diffusion-predictor-default LatestReadyRevisionName updated to "stable-diffusion-predictor-default-00001"
69s Normal FinalizerUpdate Ingress/stable-diffusion-predictor-default Updated "stable-diffusion-predictor-default" finalizers
69s Normal Created Ingress/stable-diffusion-predictor-default Created VirtualService "stable-diffusion-predictor-default-mesh"
69s (x2 over 69s) Normal RevisionReady Revision/stable-diffusion-predictor-default-00001 Revision becomes ready upon all resources being ready
69s (x2 over 102s) Warning InternalError Revision/stable-diffusion-predictor-default-00001 failed to update deployment "stable-diffusion-predictor-default-00001-deployment": Operation cannot be fulfilled on deployments.apps "stable-diffusion-predictor-default-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
56s Normal InferenceServiceReady InferenceService/stable-diffusion
kubectl describe pod stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n
Name: stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n
Namespace: tenant-84e585-dev
Priority: 1000000
Priority Class Name: normal
Service Account: default
Node: g08faff/10.135.34.2
Start Time: Sun, 15 Jan 2023 17:43:27 +0000
Labels: app=stable-diffusion-predictor-default-00001
component=predictor
pod-template-hash=7c8c5f54f5
service.istio.io/canonical-name=stable-diffusion-predictor-default
service.istio.io/canonical-revision=stable-diffusion-predictor-default-00001
serving.knative.dev/configuration=stable-diffusion-predictor-default
serving.knative.dev/configurationGeneration=1
serving.knative.dev/configurationUID=bbcf855b-11ae-40c2-9e00-aa2523f64798
serving.knative.dev/revision=stable-diffusion-predictor-default-00001
serving.knative.dev/revisionUID=8adf3e86-605f-4a25-b590-172eca067c9a
serving.knative.dev/service=stable-diffusion-predictor-default
serving.knative.dev/serviceUID=a695618f-e31c-4925-b26b-daeb447cac96
serving.kubeflow.org/inferenceservice=stable-diffusion
Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/maxScale: 1
autoscaling.knative.dev/minScale: 1
cni.projectcalico.org/containerID: 62a9c5def9c33d7c3a54e477317611830850be6a51b88392f0d462063ca29efe
cni.projectcalico.org/podIP: 10.146.135.48/32
cni.projectcalico.org/podIPs: 10.146.135.48/32
container.apparmor.security.beta.kubernetes.io/kfserving-container: runtime/default
container.apparmor.security.beta.kubernetes.io/queue-proxy: runtime/default
container.apparmor.security.beta.kubernetes.io/storage-initializer: runtime/default
internal.serving.kubeflow.org/storage-initializer-sourceuri: pvc://stable-diffusion-model-cache/
kubernetes.io/psp: restricted
lxcfs-admission-webhook.aliyun.com/status: mutated
seccomp.security.alpha.kubernetes.io/pod: docker/default
serving.coreweave.cloud/static: true
serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
Status: Running
IP: 10.146.135.48
IPs:
IP: 10.146.135.48
Controlled By: ReplicaSet/stable-diffusion-predictor-default-00001-deployment-7c8c5f54f5
Init Containers:
storage-initializer:
Container ID: docker://b836a139f570313331f94953e20a89d6df09d3531b846dd2002cbb5ea0e8ef19
Image: coreweave/kfserving:storage-initializer-0.6.0
Image ID: docker-pullable://coreweave/kfserving@sha256:20653ef6230c6f651c27f69fb775ce32e7bbf4058680b43c42f34b4b453e551d
Port: <none>
Host Port: <none>
Args:
/mnt/pvc/
/mnt/models
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 15 Jan 2023 17:43:32 +0000
Finished: Sun, 15 Jan 2023 17:43:33 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 1
memory: 1Gi
Environment: <none>
Mounts:
/mnt/models from kfserving-provision-location (rw)
/mnt/pvc from kfserving-pvc-source (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
Containers:
kfserving-container:
Container ID: docker://b7d4d15e03eab65198c8b45be68b32df654ae8039d5cd3d93d99e4cf0ee210ed
Image: index.docker.io/tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce
Image ID: docker-pullable://tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 15 Jan 2023 17:43:34 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 6
memory: 32Gi
nvidia.com/gpu: 1
Requests:
cpu: 6
memory: 32Gi
nvidia.com/gpu: 1
Environment:
HUGGING_FACE_HUB_TOKEN: <set to the key 'token' in secret 'huggingface-hub-token'> Optional: false
STORAGE_URI: /mnt/models
PORT: 8080
K_REVISION: stable-diffusion-predictor-default-00001
K_CONFIGURATION: stable-diffusion-predictor-default
K_SERVICE: stable-diffusion-predictor-default
Mounts:
/mnt/models from kfserving-provision-location (ro)
/mnt/pvc from kfserving-pvc-source (ro)
/proc/cpuinfo from lxcfs-proc-cpuinfo (ro)
/proc/diskstats from lxcfs-proc-diskstats (ro)
/proc/loadavg from lxcfs-proc-loadavg (ro)
/proc/meminfo from lxcfs-proc-meminfo (ro)
/proc/stat from lxcfs-proc-stat (ro)
/proc/swaps from lxcfs-proc-swaps (ro)
/proc/uptime from lxcfs-proc-uptime (ro)
/sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
queue-proxy:
Container ID: docker://fd0a3d8b9733974dc8daf217fa667b811fdbc086087cf27baeadf1aea45accad
Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0
Image ID: docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0
Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Sun, 15 Jan 2023 17:43:34 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 1
memory: 1Gi
Readiness: http-get http://:8012/ delay=0s timeout=1s period=1s #success=1 #failure=3
Environment:
SERVING_NAMESPACE: tenant-84e585-dev
SERVING_SERVICE: stable-diffusion-predictor-default
SERVING_CONFIGURATION: stable-diffusion-predictor-default
SERVING_REVISION: stable-diffusion-predictor-default-00001
QUEUE_SERVING_PORT: 8012
CONTAINER_CONCURRENCY: 1
REVISION_TIMEOUT_SECONDS: 300
SERVING_POD: stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n (v1:metadata.name)
SERVING_POD_IP: (v1:status.podIP)
SERVING_LOGGING_CONFIG:
SERVING_LOGGING_LEVEL: info
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
SERVING_ENABLE_REQUEST_LOG: false
SERVING_REQUEST_METRICS_BACKEND: prometheus
TRACING_CONFIG_BACKEND: none
TRACING_CONFIG_ZIPKIN_ENDPOINT:
TRACING_CONFIG_DEBUG: false
TRACING_CONFIG_SAMPLE_RATE: 0.1
USER_PORT: 8080
SYSTEM_NAMESPACE: knative-serving
METRICS_DOMAIN: knative.dev/internal/serving
SERVING_READINESS_PROBE: {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
ENABLE_PROFILING: false
SERVING_ENABLE_PROBE_REQUEST_LOG: false
METRICS_COLLECTOR_ADDRESS:
CONCURRENCY_STATE_ENDPOINT:
CONCURRENCY_STATE_TOKEN_PATH: /var/run/secrets/tokens/state-token
HOST_IP: (v1:status.hostIP)
ENABLE_HTTP2_AUTO_DETECTION: false
Mounts:
/proc/cpuinfo from lxcfs-proc-cpuinfo (ro)
/proc/diskstats from lxcfs-proc-diskstats (ro)
/proc/loadavg from lxcfs-proc-loadavg (ro)
/proc/meminfo from lxcfs-proc-meminfo (ro)
/proc/stat from lxcfs-proc-stat (ro)
/proc/swaps from lxcfs-proc-swaps (ro)
/proc/uptime from lxcfs-proc-uptime (ro)
/sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8dnkk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-8dnkk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-8dnkk
Optional: false
kfserving-pvc-source:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: stable-diffusion-model-cache
ReadOnly: false
kfserving-provision-location:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
lxcfs-proc-cpuinfo:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/cpuinfo
HostPathType: File
lxcfs-proc-diskstats:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/diskstats
HostPathType: File
lxcfs-proc-meminfo:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/meminfo
HostPathType: File
lxcfs-proc-stat:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/stat
HostPathType: File
lxcfs-proc-swaps:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/swaps
HostPathType: File
lxcfs-proc-uptime:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/uptime
HostPathType: File
lxcfs-proc-loadavg:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/proc/loadavg
HostPathType: File
lxcfs-sys-devices-system-cpu-online:
Type: HostPath (bare host directory volume)
Path: /var/lib/lxcfs/sys/devices/system/cpu/online
HostPathType: File
QoS Class: Guaranteed
Node-Selectors: node.coreweave.cloud/class=gpu
Tolerations: is_gpu op=Exists
is_gpu_compute op=Exists
node.coreweave.cloud/reserved=b0a462e147a89e62c4282e915a9f13722b77e093:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m8s prioritize-image-locality Successfully assigned tenant-84e585-dev/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n to g08faff
Normal SuccessfulAttachVolume 4m8s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6ff0ab13-fa5b-4241-a74f-830bb5b178c9"
Normal Pulled 4m2s kubelet Container image "coreweave/kfserving:storage-initializer-0.6.0" already present on machine
Normal Created 4m2s kubelet Created container storage-initializer
Normal Started 4m2s kubelet Started container storage-initializer
Normal Created 4m1s kubelet Created container kfserving-container
Normal Pulled 4m1s kubelet Container image "index.docker.io/tweldoncw/stable-diffusion@sha256:c5ac315496fb838966b00bfbcb6f6022caf476aabe2da9fe5c6b82c8243d5fce" already present on machine
Normal Started 4m kubelet Started container kfserving-container
Normal Pulled 4m kubelet Container image "gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:0d85157bec609451027fe74d76c56c09276d8cc52272599953417ff639fcc5a0" already present on machine
Normal Created 4m kubelet Created container queue-proxy
Normal Started 4m kubelet Started container queue-proxy
Warning Unhealthy 3m50s kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 3m43s (x15 over 3m58s) kubelet Readiness probe failed: Get "http://10.146.135.48:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Calling the given endpoint with curl again returns an empty file and a 404 error:
curl https://stable-diffusion.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-v1-4:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png
The kubectl events above contained the following warnings:
85s Warning Unhealthy Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Readiness probe failed: HTTP probe failed with statuscode: 503
78s (x15 over 93s) Warning Unhealthy Pod/stable-diffusion-predictor-default-00001-deployment-7c8c5f4h97n Readiness probe failed: Get "http://10.146.135.48:8012/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
69s (x3 over 104s) Warning InternalError InferenceService/stable-diffusion fails to reconcile predictor: fails to update knative service: Operation cannot be fulfilled on services.serving.knative.dev "stable-diffusion-predictor-default": the object has been modified; please apply your changes to the latest version and try again
69s (x2 over 102s) Warning InternalError Revision/stable-diffusion-predictor-default-00001 failed to update deployment "stable-diffusion-predictor-default-00001-deployment": Operation cannot be fulfilled on deployments.apps "stable-diffusion-predictor-default-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
I don't know what they mean, but maybe they are the reason.
I also made a hello world example with this yaml file:
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: hello-world
spec:
  predictor:
    containerConcurrency: 1
    minReplicas: 1
    maxReplicas: 1
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: gpu.nvidia.com/class
                  operator: In
                  values:
                    - A40
                - key: topology.kubernetes.io/region
                  operator: In
                  values:
                    - ORD1
    containers:
      - name: kserve-container
        image: pkurzend/hello-world:1
        env:
          - name: STORAGE_URI # KServe mounts the PVC at /mnt/models/
            value: pvc://stable-diffusion-model-cache/
          # The following env vars are the default model parameters, which can be changed as needed.
        resources:
          requests:
            cpu: 6
            memory: 32Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 6
            memory: 32Gi
            nvidia.com/gpu: 1
And this service.py:
import kserve
import logging
import os
from typing import Dict
from argparse import ArgumentParser
from io import BytesIO

MODEL_NAME = 'hello-world'

logging.basicConfig(level=kserve.constants.KSERVE_LOGLEVEL)
logger = logging.getLogger(MODEL_NAME)
logger.info(f"Model Name: {MODEL_NAME}")


class Model(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.ready = False

    def load(self):
        logger.info(f"Loading {MODEL_NAME}")
        self.ready = True

    def predict(self, request: Dict) -> Dict:
        return {"hello": "world"}


if __name__ == "__main__":
    model = Model(name=MODEL_NAME)
    model.load()
    kserve.ModelServer().start([model])
Same problem with the hello world example. I tried all the urls returned by kubectl describe isvc <name>:
kubectl describe isvc hello-world
http://hello-world.tenant-84e585-dev.svc.tenant.chi.local/v1/models/hello-world:predict 404
http://hello-world-predictor-default.tenant-84e585-dev.svc.tenant.chi.local 404
https://hello-world-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com {"status": "alive"}
https://hello-world.tenant-84e585-dev.knative.chi.coreweave.com 404
Concatenating the two URLs: https://hello-world-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/hello-world:predict {"error": null}
kubectl describe isvc stable-diffusion5
http://stable-diffusion5.tenant-84e585-dev.svc.tenant.chi.local/v1/models/stable-diffusion5:predict 404
http://stable-diffusion5-predictor-default.tenant-84e585-dev.svc.tenant.chi.local 404
https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com {"status": "alive"}
https://stable-diffusion5.tenant-84e585-dev.knative.chi.coreweave.com 404
Concatenating the two URLs:
https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion5:predict {"error": null}
https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict {"error": null}
Okay, this command worked now:
curl https://stable-diffusion5-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/stable-diffusion-inference:predict -d '{"prompt": "California sunset on the beach, red clouds, Nikon DSLR, professional photography"}' --output sunset.png
Maybe you can update the guide with the url:
https://<inferenceservice-name>-predictor-default.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/<model-name>:predict
Hi, I can't get the scale down to zero to work like in the tutorial. In my yaml file I specified minReplicas: 0
predictor:
  containerConcurrency: 1
  minReplicas: 0
  maxReplicas: 1
But after 35 minutes the pod is still running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
stable-diffusion6-predictor-default-00001-deployment-868c6xj88d 2/2 Running 0 35m
How can I change the time after which pods are scaled down?
What do you currently have the autoscaling.knative.dev/scale-to-zero-pod-retention-period field set to in your manifest? https://docs.coreweave.com/compass/online-inference#scale-to-zero
The top of my yaml file looks like this now:
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: stable-diffusion6
  annotations:
    autoscaling.knative.dev/scale-to-zero-pod-retention-period: "1m5s"
But downscaling doesn't happen after 1 minute 5 seconds:
kubectl get pods
NAME READY STATUS RESTARTS AGE
stable-diffusion6-predictor-default-00001-deployment-6dc97q99tv 2/2 Running 0 7m6s
After ca. 8 minutes (ca. 6-7 minutes of idle time), the pod is terminating:
kubectl get pods
NAME READY STATUS RESTARTS AGE
stable-diffusion6-predictor-default-00001-deployment-6dc97q99tv 0/2 Terminating 0 8m6s
virt-launcher-stabel-diffusion-mqbbg 1/1 Running 0 12m
At what time did you do the last request? This doesn't look too bad to me.
The pod started when the request came in, so the last request started at age=0s. A 6-7 minute retention period is fine for me.
One more question though:
I am trying to deploy a Knative service with a mounted PVC.
This is my yaml:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: train-service-test
spec:
  template:
    spec:
      containers:
        - name: training-service-test
          image: pkurzend/training-service-test:1
          imagePullPolicy: IfNotPresent
          env:
            - name: EXAMPLE
              value: "Python Sample v1"
          volumeMounts:
            - name: model-cache
              mountPath: /mnt/models
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: stable-diffusion-model-cache
When applying this, I keep getting this error:
Error from server (BadRequest): error when creating "train-service.yaml": admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: expected exactly one, got neither: spec.template.spec.volumes[0].configMap, spec.template.spec.volumes[0].emptyDir, spec.template.spec.volumes[0].projected, spec.template.spec.volumes[0].secret
So I think I have to enable some feature flags as documented here:
https://knative.dev/docs/serving/configuration/feature-flags/#kubernetes-persistentvolumeclaim-pvc
As I understand it, I am supposed to define a ConfigMap with those keys set to the value "enabled", so I applied this yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-features
data:
  kubernetes.podspec-persistent-volume-claim: "enabled"
  kubernetes.podspec-persistent-volume-write: "enabled"
The documentation says it should be applied in the knative-serving namespace, but I don't have access to it, so I leave it blank. But I keep getting the same error. Do you have any directions on how I can mount the PVC?
Thanks
You will have to use an InferenceService to get PVC support currently; that's how our examples do it. With that said, you will get better performance by loading your models from object storage directly into GPU memory using our Tensorizer library. We are in the process of updating our SD examples to use the library.
I already have an InferenceService running, but I also need an extra service to store some config in the PVC and then trigger a training job. The InferenceService only gives read access to the PVC, hence I am trying to mount it.
Writing to a PVC from an InferenceService is usually a big no-no, since inference services should be immutable and scale up/down as needed, and if you write to the wrong place you'll be dealing with race conditions. Writing to object storage (if it's an object-sized set of data) or to something like Redis (if it's more of a message-queue type of data) is a better pattern.
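For the message-queue case, a minimal sketch using redis-py might look like this (the Redis host, queue key, and payload below are illustrative assumptions, not details from this deployment):

import json
import redis

# Hypothetical in-cluster Redis endpoint; adjust to your deployment.
r = redis.Redis(host="redis.tenant-84e585-dev.svc.cluster.local", port=6379)

# The web service enqueues a training request instead of writing to the PVC...
r.lpush("training-jobs", json.dumps({"user_id": "philip", "model_name": "test-model"}))

# ...and a worker pops requests off the other end of the queue.
_, raw = r.brpop("training-jobs")
job = json.loads(raw)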
Okay, thank you, I will look into this.
Hi, I got Redis to work, thanks. Now I get permission errors when creating a Job with the Kubernetes Python client from within the pod. This is my code to create the Job:
from flask import Flask
from kubernetes import client, config, utils

# Flask application that exposes the endpoint below.
app = Flask(__name__)

@app.route('/start-training-job', methods=['POST'])
def start_train_job():
    ...
    config.load_incluster_config()
    k8s_client = client.ApiClient()
    job_configuration = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "training-job"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "model-trainer",
                            "image": "pkurzend/training-job:2",
                            "imagePullPolicy": "IfNotPresent",
                            "command": [
                                "python3",
                                "./train.py",
                                "--user_id=philip",
                                "--model_name=test-model",
                            ],
                            "env": [
                                {"name": "HF_HOME", "value": "/mnt/models/"},
                            ],
                            "volumeMounts": [
                                {"name": "model-cache", "mountPath": "/mnt/models"},
                            ],
                            "resources": {
                                "requests": {"cpu": 6, "memory": "32Gi", "nvidia.com/gpu": 1},
                                "limits": {"cpu": 6, "memory": "32Gi", "nvidia.com/gpu": 1},
                            },
                        }
                    ],
                    "volumes": [
                        {
                            "name": "model-cache",
                            "persistentVolumeClaim": {"claimName": "stable-diffusion-model-cache"},
                        }
                    ],
                    "affinity": {
                        "nodeAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": {
                                "nodeSelectorTerms": [
                                    {
                                        "matchExpressions": [
                                            {"key": "gpu.nvidia.com/class", "operator": "In", "values": ["RTX_A5000"]},
                                            {"key": "topology.kubernetes.io/region", "operator": "In", "values": ["ORD1"]},
                                        ]
                                    }
                                ]
                            }
                        }
                    },
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 2,
        },
    }
    utils.create_from_dict(k8s_client, job_configuration)
    ...
I get a permission error
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/app.py", line 124, in start_train_job
    utils.create_from_dict(k8s_client, job_configuration)
  File "/usr/local/lib/python3.11/site-packages/kubernetes/utils/create_from_yaml.py", line 224, in create_from_dict
    raise FailToCreateError(api_exceptions)
kubernetes.utils.create_from_yaml.FailToCreateError: Error from server (Forbidden): {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch is forbidden: User \"system:serviceaccount:tenant-84e585-dev:default\" cannot create resource \"jobs\" in API group \"batch\" in the namespace \"default\"","reason":"Forbidden","details":{"group":"batch","kind":"jobs"},"code":403}
According to the Kubernetes documentation (https://kubernetes.io/docs/tasks/run-application/access-api-from-pod/), the config.load_incluster_config() method should be used to access the Kubernetes API from within a pod.
Can you point me in the right direction as to why I am getting Forbidden? Is this specific to the CoreWeave cluster?
It looks like your code is attempting to create a Job in the "default" namespace. You'll need to specify your own namespace, `tenant-84e585-dev`, and then this should work.
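For example, building on the snippet above, a minimal sketch of the fix (create_from_dict accepts a namespace keyword argument, and a namespace set in the manifest's metadata takes precedence over it):

# Either pass the namespace to the helper...
utils.create_from_dict(k8s_client, job_configuration, namespace="tenant-84e585-dev")

# ...or set it in the Job manifest itself:
job_configuration["metadata"]["namespace"] = "tenant-84e585-dev"
utils.create_from_dict(k8s_client, job_configuration)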
As @ChachiTheGhost says, you need to specify the right namespace. It is also likely that you will need to create a Role and RoleBinding to give the default service account more permissions, such as Job create.
I am trying to create a Role and then a RoleBinding, but I get permission errors. Here is my yaml file for the Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-84e585-dev
  name: myrole
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["jobs", "pods"]
  verbs: ["get", "list", "create", "delete"]
This is the permission error I keep getting:
Error from server (Forbidden): error when creating "role.yaml": roles.rbac.authorization.k8s.io "myrole" is forbidden: user "token-mMh6r3pcvU6MXdeq3WcC" (groups=["r:cloud-app:base" "w:ns-tenant-84e585-dev:storage" "w:ns-tenant-84e585-dev:objectstorage" "r:ns-tenant-84e585-dev:pods" "r:ns-tenant-84e585-dev:objectstorage" "w:ns-tenant-84e585-dev:virtualservers" "w:ns-tenant-84e585-dev:base" "r:ns-tenant-84e585-dev:virtualservers" "r:ns-tenant-84e585-dev:base" "r:ns-tenant-84e585-dev:full" "r:ns-tenant-84e585-dev:storage" "w:ns-tenant-84e585-dev:full" "w:ns-tenant-84e585-dev:pods" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
{APIGroups:[""], Resources:["jobs"], Verbs:["get" "list" "create" "delete"]}; resolution errors: [roles.rbac.authorization.k8s.io "myrole" not found]
Jobs are in the apiGroup batch:
- apiGroups:
  - batch
  - extensions
  resources:
  - jobs
  - jobs/status
  - cronjobs
  verbs:
  - get
  - list
  - watch
  - create
  - delete
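Putting that together, a corrected Role plus a RoleBinding for the default service account might look like this sketch (the jobs/status resource and the binding name are illustrative, not something prescribed in this thread):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-84e585-dev
  name: myrole
rules:
- apiGroups: ["batch"]
  resources: ["jobs", "jobs/status"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: tenant-84e585-dev
  name: myrole-binding
subjects:
- kind: ServiceAccount
  name: default
  namespace: tenant-84e585-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: myrole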
Hello, I am trying to get the inference service from this example running: https://docs.coreweave.com/compass/examples/pytorch-hugging-face-diffusers-stable-diffusion-text-to-image
This is my yaml:
The bottom of my service.py file looks like this (I don't really understand whether the name here will be part of the API endpoint?):
The command kubectl logs -l serving.kubeflow.org/inferenceservice=stable-diffusion4 --container kfserving-container shows the following logs:
The command kubectl get isvc returns the following:
kubectl get pods yields the following:
kubectl describe pod stable-diffusion4-predictor-default-00001-deployment-ddbc5npbtv yields the following:
These are the latest events retrieved with kubectl events:
When I open the URL given by the kubectl get isvc command, I get a page not found 404 error. I tried the following urls:
https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com
https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v1/models/
https://stable-diffusion4.tenant-84e585-dev.knative.chi.coreweave.com/v2/health/ready
Following the documentation here https://kserve.github.io/website/modelserving/data_plane/ and here https://kserve.github.io/website/modelserving/inference_api/, I expected to get a response back.
Can you guide me in the right direction as to what I am doing wrong here?
Thanks