cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

503 Server Error: Service Unavailable for url: http://host.docker.internal:32770/ #8200

Closed GabrielGosden closed 1 month ago

GabrielGosden commented 1 month ago

Steps to Reproduce

Follow the Serverless tutorial

  1. git clone https://github.com/cvat-ai/cvat
  2. docker compose -f docker-compose.yml -f docker-compose.dev.yml -f components/serverless/docker-compose.serverless.yml up -d --build

Follow the Semi-automatic and Automatic Annotation tutorial

  1. wget https://github.com/nuclio/nuclio/releases/download/1.13.0/nuctl-1.13.0-linux-amd64 (the same nuclio version as in docker-compose.serverless.yml)
  2. sudo chmod +x nuctl-1.13.0-linux-amd64
  3. sudo ln -sf $(pwd)/nuctl-1.13.0-linux-amd64 /usr/local/bin/nuctl
  4. Deploy the model: ./serverless/deploy_cpu.sh serverless/pytorch/facebookresearch/sam/

Expected Behavior

After installing CVAT and deploying the function, I would expect to be able to use it for automatic segmentation within CVAT. Instead, when I try to use the tool I get the following error: 503 Server Error: Service Unavailable for url: http://host.docker.internal:32770/

The output of nuctl get function looks good:

 NAMESPACE | NAME                           | PROJECT | STATE | REPLICAS | NODE PORT 
 nuclio    | pth-facebookresearch-sam-vit-h | cvat    | ready | 1/1      | 32770 
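
For reference, the NODE PORT column is the port that appears in the failing URL. Here is a minimal Python sketch of that relationship. It assumes the pipe-separated layout shown above, and it assumes the client addresses the function as host.docker.internal plus the node port, which is what the error message suggests. parse_nuctl_table and function_url are hypothetical helpers, not part of nuctl or CVAT.

```python
def parse_nuctl_table(text):
    """Parse pipe-separated `nuctl get function` output into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.split("|")]
        for line in text.splitlines()
        if "|" in line
    ]
    if not rows:
        return []
    header, *body = rows
    # "NODE PORT" becomes "node_port", "NAME" becomes "name", and so on.
    keys = [h.lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, row)) for row in body]


def function_url(row, host="host.docker.internal"):
    """Build the URL a client would call, assuming host + NODE PORT addressing."""
    return f"http://{host}:{row['node_port']}/"


# Sample copied from the output above.
SAMPLE = """\
 NAMESPACE | NAME                           | PROJECT | STATE | REPLICAS | NODE PORT
 nuclio    | pth-facebookresearch-sam-vit-h | cvat    | ready | 1/1      | 32770
"""

functions = parse_nuctl_table(SAMPLE)
for fn in functions:
    print(fn["name"], fn["state"], function_url(fn))
```

If the printed URL matches the one in the 503 message, the function is registered where the caller expects it, so the problem is more likely on the network path between the caller and that port.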

According to the Nuclio dashboard at http://localhost:8070/projects/cvat/functions, the function is deployed with zero errors and is running (screenshot).

The docker logs nuclio-nuclio-pth-facebookresearch-sam-vit-h also has zero errors:

24.07.19 06:48:09.104 (I) cessor.healthcheck.server Listening {"listenAddress": ":8082"}
24.07.19 06:48:09.104 (D) processor.http Creating worker pool {"num": 2}
24.07.19 06:48:09.104 (D) sor.http.w1.python.logger Creating listener socket {"path": "/tmp/nuclio-rpc-cqd0oab06nb000sb2lm0.sock"}
24.07.19 06:48:09.104 (D) sor.http.w0.python.logger Creating listener socket {"path": "/tmp/nuclio-rpc-cqd0oab06nb000sb2lmg.sock"}
24.07.19 06:48:09.105 (D) sor.http.w1.python.logger Creating listener socket {"path": "/tmp/nuclio-rpc-cqd0oab06nb000sb2ln0.sock"}
24.07.19 06:48:09.105 (D) sor.http.w0.python.logger Creating listener socket {"path": "/tmp/nuclio-rpc-cqd0oab06nb000sb2lng.sock"}
24.07.19 06:48:09.105 (D) sor.http.w1.python.logger Using Python wrapper script path {"path": "/opt/nuclio/_nuclio_wrapper.py"}
24.07.19 06:48:09.105 (D) sor.http.w0.python.logger Using Python wrapper script path {"path": "/opt/nuclio/_nuclio_wrapper.py"}
24.07.19 06:48:09.105 (D) sor.http.w1.python.logger Using Python handler {"handler": "main:handler"}
24.07.19 06:48:09.105 (D) sor.http.w0.python.logger Using Python handler {"handler": "main:handler"}
24.07.19 06:48:09.105 (D) sor.http.w0.python.logger Using Python executable {"path": "/usr/bin/python3"}
24.07.19 06:48:09.105 (D) sor.http.w1.python.logger Using Python executable {"path": "/usr/bin/python3"}
24.07.19 06:48:09.105 (D) sor.http.w0.python.logger Setting PYTHONPATH {"value": "PYTHONPATH=/opt/nuclio:/opt/nuclio/sam"}
24.07.19 06:48:09.105 (D) sor.http.w1.python.logger Setting PYTHONPATH {"value": "PYTHONPATH=/opt/nuclio:/opt/nuclio/sam"}
24.07.19 06:48:09.105 (D) sor.http.w0.python.logger Running wrapper {"command": "/usr/bin/python3 -u /opt/nuclio/_nuclio_wrapper.py --handler main:handler --event-socket-path /tmp/nuclio-rpc-cqd0oab06nb000sb2lmg.sock --control-socket-path /tmp/nuclio-rpc-cqd0oab06nb000sb2lng.sock --platform-kind local --namespace nuclio --worker-id 0 --trigger-kind http --trigger-name myHttpTrigger --decode-event-strings"}
24.07.19 06:48:09.105 (D) sor.http.w1.python.logger Running wrapper {"command": "/usr/bin/python3 -u /opt/nuclio/_nuclio_wrapper.py --handler main:handler --event-socket-path /tmp/nuclio-rpc-cqd0oab06nb000sb2lm0.sock --control-socket-path /tmp/nuclio-rpc-cqd0oab06nb000sb2ln0.sock --platform-kind local --namespace nuclio --worker-id 1 --trigger-kind http --trigger-name myHttpTrigger --decode-event-strings"}
24.07.19 06:48:11.572 (I) sor.http.w0.python.logger Wrapper connected {"wid": 0, "pid": 24}
24.07.19 06:48:11.572 (D) sor.http.w0.python.logger Creating control connection {"wid": 0}
24.07.19 06:48:11.572 (D) sor.http.w0.python.logger Control connection created {"wid": 0}
24.07.19 06:48:11.572 (D) sor.http.w0.python.logger Waiting for start
24.07.19 06:48:11.572 (I) sor.http.w0.python.logger Init context... 0% {"worker_id": "0"}
24.07.19 06:48:11.689 (I) sor.http.w1.python.logger Wrapper connected {"wid": 1, "pid": 23}
24.07.19 06:48:11.689 (D) sor.http.w1.python.logger Creating control connection {"wid": 1}
24.07.19 06:48:11.689 (D) sor.http.w1.python.logger Control connection created {"wid": 1}
24.07.19 06:48:11.689 (D) sor.http.w1.python.logger Waiting for start
24.07.19 06:48:11.689 (I) sor.http.w1.python.logger Init context... 0% {"worker_id": "1"}
24.07.19 06:48:20.204 (I) sor.http.w1.python.logger Init context...100% {"worker_id": "1"}
24.07.19 06:48:20.205 (D) sor.http.w1.python.logger Started
24.07.19 06:48:20.205 (D) sor.http.w1.python.logger Sending data on control socket {"data_length": 2, "worker_id": "1"}
24.07.19 06:48:20.205 (D) sor.http.w1.python.logger Received control message {"messageKind": "wrapperInitialized"}
24.07.19 06:48:20.213 (I) sor.http.w0.python.logger Init context...100% {"worker_id": "0"}
24.07.19 06:48:20.213 (D) sor.http.w0.python.logger Started
24.07.19 06:48:20.213 (I) processor Starting event timeout watcher {"timeout": "30s"}
24.07.19 06:48:20.213 (D) sor.http.w0.python.logger Sending data on control socket {"data_length": 2, "worker_id": "0"}
24.07.19 06:48:20.213 (D) .webadmin.server.triggers Registered custom route {"routeName": "triggers", "stream": false, "pattern": "/{id}/stats", "method": "GET"}
24.07.19 06:48:20.213 (D) sor.http.w0.python.logger Received control message {"messageKind": "wrapperInitialized"}
24.07.19 06:48:20.213 (D) processor.webadmin.server Registered resource {"name": "triggers"}
24.07.19 06:48:20.213 (W) processor No metric sinks configured, metrics will not be published
24.07.19 06:48:20.213 (D) processor Starting triggers {"triggersError": "json: unsupported value: encountered a cycle via *http.http"}
24.07.19 06:48:20.215 (I) processor.http Starting {"listenAddress": ":8080", "readBufferSize": 16384, "maxRequestBodySize": 33554432, "reduceMemoryUsage": false, "cors": null}
24.07.19 06:48:20.215 (I) processor.webadmin.server Listening {"listenAddress": ":8081"}
24.07.19 06:48:20.215 (D) processor Processor started
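
The log levels can also be counted mechanically instead of by eye. This is a rough Python sketch, assuming every entry starts with a "YY.MM.DD HH:MM:SS.mmm (L)" prefix as in the output above, where L is the level letter. split_entries and level_counts are hypothetical helpers, and the sample is three entries copied from the log.

```python
import re
from collections import Counter

# Assumed entry prefix: "YY.MM.DD HH:MM:SS.mmm (L) ", L being the level letter.
ENTRY_START = re.compile(
    r"(?=\d{2}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \([A-Z]\) )"
)
LEVEL = re.compile(r"^\S+ \S+ \(([A-Z])\) ")


def split_entries(text):
    """Split raw log text into entries, even if several share one line."""
    joined = " ".join(text.split())  # collapse line breaks and repeated spaces
    return [e.strip() for e in ENTRY_START.split(joined) if e.strip()]


def level_counts(entries):
    """Count entries per level letter (D, I, W, E, ...)."""
    counts = Counter()
    for entry in entries:
        match = LEVEL.match(entry)
        if match:  # skip anything that does not start with the assumed prefix
            counts[match.group(1)] += 1
    return counts


SAMPLE = (
    '24.07.19 06:48:20.213 (I) processor Starting event timeout watcher {"timeout": "30s"} '
    '24.07.19 06:48:20.213 (W) processor No metric sinks configured, metrics will not be published '
    '24.07.19 06:48:20.213 (D) processor Starting triggers '
    '{"triggersError": "json: unsupported value: encountered a cycle via *http.http"}'
)

entries = split_entries(SAMPLE)
counts = level_counts(entries)
print(dict(counts))
print("error-level entries:", counts["E"])
```

Counting by level rather than searching for the word "error" matters here: the only mention of an error in the log, triggersError, sits in a debug-level entry, and the single warning is about metric sinks.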

Output of docker compose -f docker-compose.yml -f docker-compose.dev.yml -f components/serverless/docker-compose.serverless.yml ps

NAME | IMAGE | COMMAND | SERVICE | CREATED | STATUS | PORTS
cvat_clickhouse | clickhouse/clickhouse-server:23.11-alpine | "/entrypoint.sh" | cvat_clickhouse | 5 hours ago | Up 5 hours | 9000/tcp, 0.0.0.0:8123->8123/tcp, :::8123->8123/tcp, 9009/tcp
cvat_db | postgres:15-alpine | "docker-entrypoint.s…" | cvat_db | 5 hours ago | Up 5 hours | 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp
cvat_opa | openpolicyagent/opa:0.63.0 | "/opa run --server -…" | cvat_opa | 5 hours ago | Up 5 hours | 0.0.0.0:8181->8181/tcp, :::8181->8181/tcp
cvat_redis_inmem | redis:7.2.3-alpine | "docker-entrypoint.s…" | cvat_redis_inmem | 5 hours ago | Up 5 hours | 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp
cvat_redis_ondisk | apache/kvrocks:2.7.0 | "kvrocks -c /var/lib…" | cvat_redis_ondisk | 5 hours ago | Up 5 hours (healthy) | 0.0.0.0:6666->6666/tcp, :::6666->6666/tcp
cvat_server | cvat/server:dev | "./backend_entrypoin…" | cvat_server | 5 hours ago | Up 5 hours | 8080/tcp, 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp
cvat_ui | cvat/ui:dev | "/docker-entrypoint.…" | cvat_ui | 5 hours ago | Up 5 hours | 80/tcp
cvat_utils | cvat/server:dev | "./backend_entrypoin…" | cvat_utils | 5 hours ago | Up 5 hours | 8080/tcp
cvat_vector | timberio/vector:0.26.0-alpine | "/usr/local/bin/vect…" | cvat_vector | 5 hours ago | Up 5 hours | 0.0.0.0:8282->80/tcp, :::8282->80/tcp
cvat_worker_analytics_reports | cvat/server:dev | "./backend_entrypoin…" | cvat_worker_analytics_reports | 5 hours ago | Up 5 hours | 8080/tcp, 0.0.0.0:9095->9095/tcp, :::9095->9095/tcp
cvat_worker_annotation | cvat/server:dev | "./backend_entrypoin…" | cvat_worker_annotation | 5 hours ago | Up 5 hours | 8080/tcp, 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp
cvat_worker_export | cvat/server:dev | "./backend_entrypoin…" | cvat_worker_export | 5 hours ago | Up 5 hours | 8080/tcp, 0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
cvat_worker_import | cvat/server:dev | "./backend_entrypoin…" | cvat_worker_import | 5 hours ago | Up 5 hours | 8080/tcp, 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
cvat_worker_quality_reports | cvat/server:dev | "./backend_entrypoin…" | cvat_worker_quality_reports | 5 hours ago | Up 5 hours | 8080/tcp, 0.0.0.0:9094->9094/tcp, :::9094->9094/tcp
cvat_worker_webhooks | cvat/server:dev | "./backend_entrypoin…" | cvat_worker_webhooks | 5 hours ago | Up 5 hours | 8080/tcp
nuclio | quay.io/nuclio/dashboard:1.13.0-amd64 | "/docker-entrypoint.…" | nuclio | 5 hours ago | Up 5 hours (healthy) | 80/tcp, 0.0.0.0:8070->8070/tcp, :::8070->8070/tcp
traefik | traefik:v2.10 | "/entrypoint.sh trae…" | traefik | 5 hours ago | Up 5 hours | 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 80/tcp, 0.0.0.0:8090->8090/tcp, :::8090->8090/tcp

Possible Solution

I have found the following possibly related issues. I have tried to implement the suggestions from them, without success:

https://github.com/cvat-ai/cvat/issues/6582
https://github.com/cvat-ai/cvat/issues/4904
https://github.com/cvat-ai/cvat/issues/5205

Context

No response

Environment

Git hash commit: f1726c45b43903c337bde952990c3230fe98bede
Docker version: 25.0.4
Are you using Docker Swarm or Kubernetes: No
Operating system: Ubuntu 20.04 LTS

GabrielGosden commented 1 month ago

To further debug the issue I've now followed the same installation procedure on another machine. I have the same issues on this machine.

We are also running behind a corporate proxy. To be able to install CVAT and deploy the functions, I have modified the Dockerfile and function.yaml to use our company proxy for apt-get, pip and curl.

The function.yaml file now looks like this:

# Copyright (C) 2023-2024 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

metadata:
  name: pth-facebookresearch-sam-vit-h
  namespace: cvat
  annotations:
    name: Segment Anything
    version: 2
    type: interactor
    spec:
    min_pos_points: 1
    min_neg_points: 0
    animated_gif: https://raw.githubusercontent.com/cvat-ai/cvat/develop/site/content/en/images/hrnet_example.gif
    help_message: The interactor allows to get a mask of an object using at least one positive, and any negative points inside it

spec:
  description: Interactive object segmentation with Segment-Anything
  runtime: 'python:3.8'
  handler: main:handler
  eventTimeout: 30s
  env:
    - name: PYTHONPATH
      value: /opt/nuclio/sam

  build:
    image: cvat.pth.facebookresearch.sam.vit_h
    baseImage: ubuntu:22.04

    directives:
      preCopy:
      # disable interactive frontend
        - kind: ENV
          value: DEBIAN_FRONTEND=noninteractive
      # set workdir
        - kind: WORKDIR
          value: /opt/nuclio/sam
      # Setup proxy
        - kind: ENV
          value: http_proxy="http://<username>:<password>@<ip>:<port>"
        - kind: ENV
          value: https_proxy="http://<username>:<password>@<ip>:<port>"
        - kind: ENV
          value: HTTP_PROXY="http://<username>:<password>@<ip>:<port>"
        - kind: ENV
          value: HTTPS_PROXY="http://<username>:<password>@<ip>:<port>"
      # install basic deps
        - kind: RUN
          value: apt-get update && apt-get -y install curl git python3 python3-pip ffmpeg libsm6 libxext6
      # install sam deps
        - kind: RUN
          value: pip3 install --trusted-host pypi.org --trusted-host files.pythonhosted.org torch torchvision torchaudio pycocotools matplotlib onnxruntime onnx
      # install sam code
        - kind: RUN
          value: pip3 install --trusted-host pypi.org --trusted-host files.pythonhosted.org git+https://github.com/facebookresearch/segment-anything.git
      # download sam weights
        - kind: RUN
          value: curl --insecure -O https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
      # map pip3 and python3 to pip and python
        - kind: RUN
          value: ln -s /usr/bin/pip3 /usr/local/bin/pip && ln -s /usr/bin/python3 /usr/bin/python
  triggers:
    myHttpTrigger:
      maxWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume

GabrielGosden commented 1 month ago

When I try to use SAM from the UI, the browser's network analyzer shows the following error (screenshot). But a request to http://localhost:8080/api/lambda/functions/pth-facebookresearch-sam-vit-h?org= works without any issues (screenshot).

bsekachev commented 1 month ago

> It works without any issues

These requests are different. The first one is POST, the second one is GET.

The POST request tries to reach the container/host running the nuclio function and returns 503 because that host is unavailable. There may be some infrastructure issue, but without knowing the details I can only guess that it happens because HTTP_PROXY is set in the CVAT container. As a result, instead of contacting the host with the nuclio function, CVAT looks for the function on your proxy server and fails.

You may try to set up NO_PROXY properly by adding localhost and host.docker.internal to the variable.
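
To illustrate the effect of NO_PROXY, here is a small Python sketch using the standard library's environment-style proxy matching (urllib.request.proxy_bypass_environment). It is only an approximation: other HTTP clients may apply slightly different NO_PROXY rules, and which client CVAT uses is not stated in this thread. goes_through_proxy is a hypothetical helper and the proxy address is a placeholder.

```python
from urllib.request import proxy_bypass_environment


def goes_through_proxy(host, http_proxy, no_proxy=""):
    """Return True if a request to `host` would be sent via the proxy.

    Uses the standard library's matching rules; other HTTP clients may
    interpret NO_PROXY slightly differently.
    """
    if not http_proxy:
        return False  # no proxy configured, so the request goes direct
    proxies = {"http": http_proxy}
    if no_proxy:
        proxies["no"] = no_proxy  # key used for the NO_PROXY list
    return not proxy_bypass_environment(host, proxies)


PROXY = "http://proxy.example:3128"    # placeholder proxy address
TARGET = "host.docker.internal:32770"  # host and port from the error message

before = goes_through_proxy(TARGET, PROXY)
after = goes_through_proxy(TARGET, PROXY, "localhost,host.docker.internal")
print(f"without NO_PROXY -> via proxy: {before}")
print(f"with NO_PROXY    -> via proxy: {after}")
```

With only HTTP_PROXY set, the call to host.docker.internal is routed to the proxy, which would explain a 503 coming back from somewhere other than the function itself. Adding the host to NO_PROXY makes the request go direct.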

GabrielGosden commented 1 month ago

Thanks @bsekachev for the good idea. After a bit of experimenting I was able to fix the issue by adding the NO_PROXY setting you suggested to serverless/openvino/base/Dockerfile. This is the base image for openvino on which nuclio deploys its functions.
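
For anyone applying the same fix, a short Python sketch of building the NO_PROXY value without dropping entries a corporate setup may already define. The exact line added to the Dockerfile is not shown in this thread, so the ENV lines rendered here are an assumption, and merge_no_proxy and dockerfile_env_lines are hypothetical helpers.

```python
def merge_no_proxy(existing, required=("localhost", "host.docker.internal")):
    """Append required hosts to an existing NO_PROXY value without duplicates."""
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    seen = {e.lower() for e in entries}
    for host in required:
        if host.lower() not in seen:
            entries.append(host)
            seen.add(host.lower())
    return ",".join(entries)


def dockerfile_env_lines(no_proxy):
    """Render ENV lines for both spellings, since tools differ in which they read."""
    return [f"ENV NO_PROXY={no_proxy}", f"ENV no_proxy={no_proxy}"]


# Example: an existing value that already lists some hosts.
value = merge_no_proxy("127.0.0.1, localhost")
for line in dockerfile_env_lines(value):
    print(line)
```

Setting both the upper-case and lower-case variable is a defensive choice, not something confirmed by the fix described above.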