cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Models do not appear in CVAT Models tab #7733

Closed: didumin closed this issue 2 months ago

didumin commented 6 months ago


Steps to Reproduce

1. Deploy CVAT with the Helm chart according to https://docs.cvat.ai/docs/administration/advanced/k8s_deployment_with_helm/, with nuclio enabled and deployed as well.
2. Deploy a model via the nuclio dashboard UI.
3. A pod with the model exists.

Expected Behavior

The model appears in the CVAT Models tab.

Possible Solution

No response

Context

No models appear in the CVAT Models tab.

[Screenshot 2024-04-05 at 16 23 12]

Environment

Logs from the nuclio model pod:
24.04.05 12:22:08.526 (I)                 processor Starting processor {"version": "Label: 1.12.4, Git commit: 0b5ab5871f44922e01f04d1643014adca1867f5e, OS: linux, Arch: amd64, Go version: go1.21.1"}
24.04.05 12:22:08.526 (D)                 processor Read configuration {"config": "{\n    \"metadata\": {\n        \"name\": \"pod-name\",\n        \"namespace\": \"cvat\",\n        \"labels\": {\n            \"nuclio.io/app\": \"functionres
24.04.05 12:22:08.526 (I) cessor.healthcheck.server Listening {"listenAddress": ":8082"}
24.04.05 12:22:08.527 (D)            processor.http Creating worker pool {"num": 1}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Creating listener socket {"path": "/tmp/nuclio-rpc-co7ups6f7173c8qf51fg.sock"}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Creating listener socket {"path": "/tmp/nuclio-rpc-co7ups6f7173c8qf51g0.sock"}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Using Python wrapper script path {"path": "/opt/nuclio/_nuclio_wrapper.py"}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Using Python handler {"handler": "main:handler"}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Using Python executable {"path": "/usr/bin/python3"}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Setting PYTHONPATH {"value": "PYTHONPATH=/opt/nuclio"}
24.04.05 12:22:08.527 (D) sor.http.w0.python.logger Running wrapper {"command": "/usr/bin/python3 -u /opt/nuclio/_nuclio_wrapper.py --handler main:handler --event-socket-path /tmp/nuclio-rpc-co7ups6f7173c8qf51fg.sock --control-socket-path /t
24.04.05 12:22:11.639 (I) sor.http.w0.python.logger Wrapper connected {"wid": 0, "pid": 14}
24.04.05 12:22:11.639 (D) sor.http.w0.python.logger Creating control connection {"wid": 0}
24.04.05 12:22:11.639 (D) sor.http.w0.python.logger Control connection created {"wid": 0}
24.04.05 12:22:11.639 (D) sor.http.w0.python.logger Waiting for start
2024-04-05 12:22:12,638 - clearml.storage - INFO - Downloading: 12.50MB from s3://storage.yandexcloud.net/napoleonit-clearml/retail/models/retail-yolov8-onnx.b90731540cb94742af4fe1f85fae4f2b/artifacts/yolov8_onnx/best.onnx
  0% | 0/12.5 MB [00:00<?, ?MB/s]:
██████████████████████████▍        80% | 10.0/12.5 MB [00:00<00:00, 78.71MB/s]:
█████████████████████████████████ 100% | 12.5/12.5 MB [00:00<00:00, 94.36MB/s]:
2024-04-05 12:22:12,772 - clearml.storage - INFO - Downloaded 12.50 MB successfully from s3://storage.yandexcloud.net/napoleonit-clearml/retail/models/retail-yolov8-onnx.b90731540cb94742af4fe1f85fae4f2b/artifacts/yolov8_onnx/best.onnx , save
24.04.05 12:22:12.905 (D) sor.http.w0.python.logger Started
24.04.05 12:22:12.905 (I)                 processor Starting event timeout watcher {"timeout": "30s"}
24.04.05 12:22:12.905 (D) .webadmin.server.triggers Registered custom route {"routeName": "triggers", "stream": false, "pattern": "/{id}/stats", "method": "GET"}
24.04.05 12:22:12.905 (D) processor.webadmin.server Registered resource {"name": "triggers"}
24.04.05 12:22:12.905 (W)                 processor No metric sinks configured, metrics will not be published
24.04.05 12:22:12.905 (D) sor.http.w0.python.logger Received control message {"messageKind": "wrapperInitialized"}
24.04.05 12:22:12.905 (D) sor.http.w0.python.logger Sending data on control socket {"data_length": 2, "worker_id": "0"}
24.04.05 12:22:12.905 (D)                 processor Starting triggers {"triggersError": "json: unsupported value: encountered a cycle via *http.http"}
24.04.05 12:22:12.907 (I)            processor.http Starting {"listenAddress": ":8080", "readBufferSize": 16384, "maxRequestBodySize": 33554432, "reduceMemoryUsage": false, "cors": null}
24.04.05 12:22:12.907 (I) processor.webadmin.server Listening {"listenAddress": ":8081"}
24.04.05 12:22:12.907 (D)                 processor Processor started
bsekachev commented 6 months ago

Hello, make sure that the model was actually deployed and that its container is healthy.
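A quick way to confirm this from a workstation is to hit the same HTTP endpoints the kubelet probes. The pod description further down in this thread lists them: readiness at `http://:8080/__internal/health` and liveness at `http://:8082/live`. The Python sketch below uses a hypothetical helper (`probe` is not part of CVAT or nuclio) and assumes both ports have been forwarded to localhost, e.g. with `kubectl port-forward -n cvat <pod-name> 8080:8080 8082:8082`. It is simplified: Kubernetes counts any status from 200 to 399 as success, while this helper lets urllib follow redirects and only accepts a final 2xx.

```python
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with a final HTTP 2xx within `timeout` seconds.

    Hypothetical helper: a simplified stand-in for a Kubernetes HTTP probe.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError (raised for 4xx/5xx) are OSError subclasses,
        # as are connection-refused errors and socket timeouts.
        return False


if __name__ == "__main__":
    # Assumes: kubectl port-forward -n cvat <pod-name> 8080:8080 8082:8082
    endpoints = {
        "readiness": "http://127.0.0.1:8080/__internal/health",
        "liveness": "http://127.0.0.1:8082/live",
    }
    for name, url in endpoints.items():
        print(f"{name:10} {url}: {'OK' if probe(url) else 'FAIL'}")
```

If both endpoints report OK, the function container itself is serving requests, and the problem is more likely on the CVAT discovery side.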

didumin commented 6 months ago

Hello @bsekachev, the model status in nuclio is "Running":

[Screenshot 2024-04-08 at 10 15 35]

The container details also show "Status: Running":

% kubectl describe pods -n cvat nuclio-pod-name-7f46bdfbfb-4pqml
Name:             nuclio-pod-name-7f46bdfbfb-4pqml
Namespace:        cvat
Priority:         0
Service Account:  default
Node:             cl1751vimeosmurmvbkr-eqid/10.2.0.11
Start Time:       Fri, 05 Apr 2024 15:22:07 +0300
Labels:           nuclio.io/app=functionres
                  nuclio.io/class=function
                  nuclio.io/function-name=pod-name
                  nuclio.io/function-version=latest
                  nuclio.io/project-name=cvat
                  pod-template-hash=7f46bdfbfb
Annotations:      framework: onnx
                  name: cvat-ui-model-name
                  nuclio.io/image-hash: 1712319722245667772
                  spec: [ { "id": 0, "name": "pricetag" }, { "id": 1, "name": "price" } ]
Status:           Running
IP:               10.112.133.151
IPs:
  IP:           10.112.133.151
Controlled By:  ReplicaSet/nuclio-pod-name-7f46bdfbfb
Containers:
  nuclio:
    Container ID:   containerd://b2d440f3b5281b942f959bfc379ef517e7d50bf1c86e65825252f75e31b94ab5
    Image:          reg.gitlab.itnap.ru/machine-learning/infrastructure_and_tools/cvat_autolabeling/cvat.onnx:latest
    Image ID:       reg.gitlab.itnap.ru/machine-learning/infrastructure_and_tools/cvat_autolabeling/cvat.onnx@sha256:046d44fdf48f22b0acf1256bcfe1af6a4e3b6ad7b0aa584dc0f09030c81edafe
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 05 Apr 2024 15:22:08 +0300
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        25m
      memory:     1Mi
    Liveness:     http-get http://:8082/live delay=10s timeout=3s period=5s #success=1 #failure=3
    Readiness:    http-get http://:8080/__internal/health delay=5s timeout=1s period=1s #success=1 #failure=10
    Environment:
      FUNCTION_TASK_ID:          b90731540cb94742af4fe1f85fae4f2b
      FUNCTION_LABELS:           [ { "id": 0, "name": "pricetag" }, { "id": 1, "name": "price" } ]
      NUCLIO_FUNCTION_NAME:      pod-name
      NUCLIO_FUNCTION_VERSION:   latest
      NUCLIO_FUNCTION_INSTANCE:  nuclio-pod-name-7f46bdfbfb-4pqml (v1:metadata.name)
    Mounts:
      /etc/nuclio/config/platform from platform-config-volume (rw)
      /etc/nuclio/config/processor from processor-config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wjkxq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  platform-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nuclio-platform-config
    Optional:  true
  processor-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nuclio-pod-name
    Optional:  false
  kube-api-access-wjkxq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
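One thing worth comparing is the pod's annotations against the `function.yaml` files shipped in CVAT's `serverless/` directory. Those samples declare `name`, `type` (for example `detector`), `framework` and `spec` under `metadata.annotations`, while the annotations in the description above show `framework`, `name` and `spec` but no `type`. The Python sketch below is a hypothetical checker, not CVAT code; it assumes that CVAT needs the same four keys as its samples and that `spec` must be a JSON list of objects with `id` and `name`.

```python
import json

# Keys used under metadata.annotations in the sample function.yaml files of
# CVAT's serverless/ directory. Treating all four as required is an assumption.
EXPECTED_KEYS = ("name", "type", "framework", "spec")


def check_annotations(annotations: dict) -> list[str]:
    """Return human-readable problems found in a function's annotations."""
    problems = [f"missing annotation: {key!r}"
                for key in EXPECTED_KEYS if key not in annotations]

    raw_spec = annotations.get("spec")
    if raw_spec is not None:
        try:
            labels = json.loads(raw_spec)
        except json.JSONDecodeError as exc:
            problems.append(f"spec is not valid JSON: {exc}")
        else:
            # Each label entry is expected to carry at least "id" and "name".
            if not isinstance(labels, list) or not all(
                isinstance(item, dict) and "id" in item and "name" in item
                for item in labels
            ):
                problems.append('spec should be a list of {"id", "name"} objects')
    return problems


# Annotations copied from the `kubectl describe` output in this thread.
pod_annotations = {
    "framework": "onnx",
    "name": "cvat-ui-model-name",
    "nuclio.io/image-hash": "1712319722245667772",
    "spec": '[ { "id": 0, "name": "pricetag" }, { "id": 1, "name": "price" } ]',
}

for problem in check_annotations(pod_annotations):
    print(problem)
```

On these annotations it reports only `missing annotation: 'type'`. Whether that is what hides the model from the Models tab is a hypothesis to test, for example by adding `type: detector` to the function's annotations and redeploying.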

Are there any additional troubleshooting steps, or some documentation on how CVAT discovers nuclio models?
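On the discovery side, one place to look is the nuclio dashboard API, which CVAT's lambda manager talks to over HTTP. The Python sketch below assumes the dashboard is reachable at `http://localhost:8070` (nuclio's default dashboard port, e.g. through a port-forward to the dashboard service of the Helm release), that `GET /api/functions` returns a JSON object keyed by function name with `metadata` and `status` in each entry, and that the `x-nuclio-function-namespace` header selects the namespace; verify these against your nuclio version. `fetch_functions` and `summarize` are hypothetical helpers. The summary prints, per function, its state and its `type` annotation, two fields worth comparing with a function that does appear in CVAT.

```python
import json
import urllib.request

# Assumption: nuclio dashboard port-forwarded to localhost on its default port.
DASHBOARD_URL = "http://localhost:8070"


def fetch_functions(dashboard_url: str, namespace: str = "cvat") -> dict:
    """GET /api/functions from the nuclio dashboard (header name assumed)."""
    request = urllib.request.Request(
        f"{dashboard_url}/api/functions",
        headers={"x-nuclio-function-namespace": namespace},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def summarize(functions: dict) -> list[tuple[str, str, str]]:
    """Return (function name, state, `type` annotation) for each function."""
    rows = []
    for name, function in sorted(functions.items()):
        metadata = function.get("metadata") or {}
        annotations = metadata.get("annotations") or {}
        state = (function.get("status") or {}).get("state", "<unknown>")
        rows.append((name, state, annotations.get("type", "<missing>")))
    return rows


if __name__ == "__main__":
    try:
        functions = fetch_functions(DASHBOARD_URL)
    except OSError as exc:  # dashboard not reachable from this machine
        print(f"could not reach {DASHBOARD_URL}: {exc}")
    else:
        for row in summarize(functions):
            print("\t".join(row))
```

The namespace default `cvat` comes from the pod description above; a function deployed into a different namespace or project than the one CVAT is configured for would be another difference worth ruling out.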

didumin commented 6 months ago

@bsekachev, could you suggest any further investigation steps or point me to the related docs?