cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Unable to deploy custom model #5106

Closed: bagherig closed this issue 1 year ago

bagherig commented 1 year ago

I am trying to deploy a custom model on CVAT. The Docker image builds and the function deploys successfully:

22.10.12 21:25:50.312                     nuctl (I) Project created {"Name": "cvat", "Namespace": "nuclio"}
Deploying models function...
22.10.12 21:25:50.457                     nuctl (I) Deploying function {"name": ""}
22.10.12 21:25:50.457                     nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.8.14, Git commit: cbb0774230996a3eb4621c1a2079e2317578005b, OS: linux, Arch: amd64, Go version: go1.17.8", "name": ""}
22.10.12 21:25:50.555                     nuctl (I) Staging files and preparing base images
22.10.12 21:25:50.577                     nuctl (I) Building processor image {"registryURL": "", "imageName": "cvat/pt.wheat.drone:latest"}
22.10.12 21:25:50.577     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.8.14-amd64"}
22.10.12 21:25:52.245     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
22.10.12 21:25:54.055            nuctl.platform (I) Building docker image {"image": "cvat/pt.wheat.drone:latest"}
22.10.12 21:25:54.356            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pt.wheat.drone:latest", "registry": ""}
22.10.12 21:25:54.356            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pt.wheat.drone:latest"}
22.10.12 21:25:54.356                     nuctl (I) Build complete {"result": {"Image":"cvat/pt.wheat.drone:latest","UpdatedFunctionConfig":{"metadata":{"name":"pt-wheat-drone","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","name":"Wheat Detector for Drone","spec":"[\n  { \"id\": 0, \"name\": \"wheat\" },\n]\n","type":"detector"}},"spec":{"description":"Detecting Wheat Heads in Drone Images via PyTorch","handler":"main:handler","runtime":"python:3.8","resources":{"requests":{"cpu":"25m","memory":"1Mi"}},"image":"cvat/pt.wheat.drone:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/root/webapp_scvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"image":"cvat/pt.wheat.drone","baseImage":"ubuntu:20.04","directives":{"postCopy":[{"kind":"RUN","value":"pip install Pillow numpy"}],"preCopy":[{"kind":"ENV","value":"DEBIAN_FRONTEND=noninteractive"},{"kind":"RUN","value":"apt-get update && apt-get -y install python3 python3-pip"},{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"pip3 install torch==1.12.1+cpu -f https://download.pytorch.org/whl/torch/"},{"kind":"RUN","value":"pip3 install torchvision==0.12.0+cpu -f https://download.pytorch.org/whl/torchvision/"},{"kind":"RUN","value":"ln -s /usr/bin/pip3 /usr/local/bin/pip"},{"kind":"RUN","value":"ln -s /usr/bin/python3 /usr/local/bin/python"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":120,"securityContext":{},"eventTimeout":"30s"}}}}
22.10.12 21:25:54.365                     nuctl (I) Cleaning up before deployment {"functionName": "pt-wheat-drone"}
22.10.12 21:25:54.390                     nuctl (I) Function already exists, deleting function containers {"functionName": "pt-wheat-drone"}
22.10.12 21:25:55.170            nuctl.platform (I) Waiting for function to be ready {"timeout": 120}
22.10.12 21:25:57.021                     nuctl (I) Function deploy complete {"functionName": "pt-wheat-drone", "httpPort": 49324, "internalInvocationURLs": ["172.17.0.4:8080"], "externalInvocationURLs": []}
  NAMESPACE |      NAME      | PROJECT | STATE | REPLICAS | NODE PORT
  nuclio    | pt-wheat-drone | cvat    | ready | 1/1      |     49324

However, I am unable to load the model in the web UI. Error: Request failed with status code 500. "\n<!doctype html>\n<html lang=\"en\">\n<head>\n <title>Server Error (500)</title>\n</head>\n<body>\n <h1>Server Error (500)</h1><p></p>\n</body>\n</html>\n".

Here are my docker logs:

22.10.12 21:25:55.185            processor.http (D) Creating worker pool {"num": 2}
22.10.12 21:25:55.186 sor.http.w0.python.logger (D) Creating listener socket {"path": "/tmp/nuclio-rpc-cd3j1oo5hfirhisf6p80.sock"}
22.10.12 21:25:55.186 sor.http.w1.python.logger (D) Creating listener socket {"path": "/tmp/nuclio-rpc-cd3j1oo5hfirhisf6p7g.sock"}
22.10.12 21:25:55.186 sor.http.w0.python.logger (D) Using Python wrapper script path {"path": "/opt/nuclio/_nuclio_wrapper.py"}
22.10.12 21:25:55.186 sor.http.w0.python.logger (D) Using Python handler {"handler": "main:handler"}
22.10.12 21:25:55.186 sor.http.w1.python.logger (D) Using Python wrapper script path {"path": "/opt/nuclio/_nuclio_wrapper.py"}
22.10.12 21:25:55.186 sor.http.w0.python.logger (D) Using Python executable {"path": "/usr/bin/python3"}
22.10.12 21:25:55.186 sor.http.w1.python.logger (D) Using Python handler {"handler": "main:handler"}
22.10.12 21:25:55.186 sor.http.w0.python.logger (D) Setting PYTHONPATH {"value": "PYTHONPATH=/opt/nuclio"}
22.10.12 21:25:55.186 sor.http.w1.python.logger (D) Using Python executable {"path": "/usr/bin/python3"}
22.10.12 21:25:55.186 sor.http.w0.python.logger (D) Running wrapper {"command": "/usr/bin/python3 -u /opt/nuclio/_nuclio_wrapper.py --handler main:handler --socket-path /tmp/nuclio-rpc-cd3j1oo5hfirhisf6p80.sock --platform-kind local --namespace nuclio --worker-id 0 --trigger-kind http --trigger-name myHttpTrigger --decode-event-strings"}
22.10.12 21:25:55.186 sor.http.w1.python.logger (D) Setting PYTHONPATH {"value": "PYTHONPATH=/opt/nuclio"}
22.10.12 21:25:55.186 sor.http.w1.python.logger (D) Running wrapper {"command": "/usr/bin/python3 -u /opt/nuclio/_nuclio_wrapper.py --handler main:handler --socket-path /tmp/nuclio-rpc-cd3j1oo5hfirhisf6p7g.sock --platform-kind local --namespace nuclio --worker-id 1 --trigger-kind http --trigger-name myHttpTrigger --decode-event-strings"}
22.10.12 21:25:55.746 sor.http.w1.python.logger (I) Wrapper connected {"wid": 1, "pid": 20}
22.10.12 21:25:55.746 sor.http.w1.python.logger (D) Waiting for start
22.10.12 21:25:55.746 sor.http.w0.python.logger (I) Wrapper connected {"wid": 0, "pid": 19}
22.10.12 21:25:55.746 sor.http.w0.python.logger (D) Waiting for start
{"datetime": "2022-10-12 21:25:55,746", "level": "info", "message": "Replacing logger output", "with": {"handler_name": "default", "worker_id": "0"}}
{"datetime": "2022-10-12 21:25:55,746", "level": "info", "message": "Replacing logger output", "with": {"handler_name": "default", "worker_id": "1"}}
22.10.12 21:25:55.747 sor.http.w1.python.logger (I) Init context...  0% {"worker_id": "1"}
22.10.12 21:25:55.747 sor.http.w0.python.logger (I) Init context...  0% {"worker_id": "0"}
22.10.12 21:25:55.902 sor.http.w1.python.logger (I) Init context...100% {"worker_id": "1"}
22.10.12 21:25:55.903 sor.http.w1.python.logger (D) Started
22.10.12 21:25:55.903 sor.http.w0.python.logger (I) Init context...100% {"worker_id": "0"}
22.10.12 21:25:55.903 sor.http.w0.python.logger (D) Started
22.10.12 21:25:55.903                 processor (I) Starting event timeout watcher {"timeout": "30s"}
22.10.12 21:25:55.903 .webadmin.server.triggers (D) Registered custom route {"routeName": "triggers", "stream": false, "pattern": "/{id}/stats", "method": "GET"}
22.10.12 21:25:55.903 processor.webadmin.server (D) Registered resource {"name": "triggers"}
22.10.12 21:25:55.903                 processor (W) No metric sinks configured, metrics will not be published
22.10.12 21:25:55.903                 processor (D) Starting triggers {"triggersError": "json: unsupported value: encountered a cycle via *http.http"}
22.10.12 21:25:55.904            processor.http (I) Starting {"listenAddress": ":8080", "readBufferSize": 16384, "maxRequestBodySize": 33554432, "reduceMemoryUsage": false, "cors": null}
22.10.12 21:25:55.904 processor.webadmin.server (I) Listening {"listenAddress": ":8081"}
22.10.12 21:25:55.904                 processor (D) Processor started

Here is my .yaml file:

metadata:
  name: pt-wheat-drone
  namespace: cvat
  annotations:
    name: Wheat Detector for Drone
    type: detector
    framework: pytorch
    spec: |
      [
        { "id": 0, "name": "wheat" },
      ]

spec:
  description: Detecting Wheat Heads in Drone Images via PyTorch
  runtime: 'python:3.8' 
  handler: main:handler
  eventTimeout: 30s # the global event timeout

  build:
    image: cvat/pt.wheat.drone # the name of your docker image
    baseImage: ubuntu:20.04 # the name of a base container image from which to build the function

    directives: # commands to build your docker image
      preCopy:
        - kind: ENV
          value: DEBIAN_FRONTEND=noninteractive
        - kind: RUN
          value: apt-get update && apt-get -y install python3 python3-pip # Linux command
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: pip3 install torch==1.12.1+cpu -f https://download.pytorch.org/whl/torch/
        - kind: RUN
          value: pip3 install torchvision==0.12.0+cpu -f https://download.pytorch.org/whl/torchvision/
        - kind: RUN
          value: ln -s /usr/bin/pip3 /usr/local/bin/pip
        - kind: RUN
          value: ln -s /usr/bin/python3 /usr/local/bin/python

      postCopy:
        - kind: RUN
          value: pip install Pillow numpy

  triggers:
    myHttpTrigger:
      maxWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume

And here is my main.py file:

"""
Created on Tue Oct 11 22:12:38 2022

@author: mmzhang
"""

import torch, torchvision 
import json 
import io, base64
from PIL import Image
import numpy as np

MODEL_PATH = "wheat-drone.pt" # NTC

def init_context(context):
    context.logger.info("Init context...  0%")  
    model = torch.jit.load(MODEL_PATH)
    context.user_data.model_handler = model
    context.logger.info("Init context...100%")

def handler(context, event):
    context.logger.info("Run wheat head detector")

    # load image: numpy.ndarray, dtype=int
    # shape=(height, width, 3) where 3=(B,G,R) 
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    image = Image.open(buf)
    image = image.convert("RGB") # "RGB"
    image = np.asarray(image) # to numpy
    image = image[:, :, ::-1] # "BGR"   
    image = torch.from_numpy(image)

    # model
    # predictions: torch.Tensor, dtype=float32
    # shape=(number_of_detected_wheat_heads, 6) where 
    # 6=(xleft, ytop, xright, ybottom, confidence, class)
    predictions = context.user_data.model_handler(image)

    # results
    results = []
    for pred in predictions:
        box = pred[:4].tolist()
        score = str(float(pred[4]))
        results.append({
            "confidence": score,
            "label": 'wheat',
            "points": box,
            "type": "rectangle",
        })

    return context.Response(
        body=json.dumps(results),
        headers={},
        content_type='application/json',
        status_code=200
    )
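
The post-processing step in the handler above can be checked without Nuclio or PyTorch. The following is a minimal sketch, assuming predictions arrive as plain rows of (xleft, ytop, xright, ybottom, confidence, class) as described in the comments above; the helper name `to_cvat_detections` is hypothetical and not part of CVAT or Nuclio.

```python
import json


def to_cvat_detections(predictions, label="wheat"):
    """Convert rows of (xleft, ytop, xright, ybottom, confidence, class)
    into the list of dicts that the handler above serializes.

    `predictions` is any iterable of sequences of numbers; for a
    torch.Tensor, call `.tolist()` on it first.
    """
    results = []
    for row in predictions:
        xleft, ytop, xright, ybottom, confidence = (float(v) for v in row[:5])
        results.append({
            "confidence": str(confidence),  # the handler above sends a string
            "label": label,
            "points": [xleft, ytop, xright, ybottom],
            "type": "rectangle",
        })
    return results


# One fake detection, serialized the same way the handler builds its body.
detections = to_cvat_detections([[10, 20, 110, 220, 0.87, 0]])
body = json.dumps(detections)
```

Keeping this step in its own function makes it possible to test the response format separately from model loading and image decoding.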

Errin890 commented 1 year ago

issue self-fixed, was another comma problem in the spec field (removed it)
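
A quick way to catch this kind of problem before deploying is to run the contents of the `spec` annotation through a strict JSON parser. The sketch below uses only Python's standard `json` module; the helper name `check_spec` is hypothetical, and the two strings mirror the `spec` block from the function.yaml above with and without the trailing comma.

```python
import json


def check_spec(spec_text):
    """Return (labels, None) if spec_text is valid JSON,
    otherwise (None, a short description of the parse error)."""
    try:
        return json.loads(spec_text), None
    except json.JSONDecodeError as err:
        # The exact message and position vary between Python versions.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


# The spec as posted, with a trailing comma after the only entry.
broken_spec = """[
  { "id": 0, "name": "wheat" },
]
"""

# The same spec with the trailing comma removed.
fixed_spec = """[
  { "id": 0, "name": "wheat" }
]
"""

broken_labels, broken_error = check_spec(broken_spec)
fixed_labels, fixed_error = check_spec(fixed_spec)
```

YAML accepts the block scalar either way, so the function can build and deploy even though the JSON inside `spec` is invalid; the error only surfaces when the text is parsed as JSON.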

nfrvnikita commented 1 year ago

issue self-fixed, was another comma problem in the spec field (removed it)

@Errin890 Hi, can you tell me where you put your custom weights and how? I want to deploy my YOLOv5 model, but I can't get it to work.

function.yaml

metadata:
  name: ultralytics-yolov5
  namespace: cvat
  annotations:
    name: YOLO v5
    type: detector
    framework: pytorch
    spec: |
      [
        { "id": 0, "name": "person" },
        { "id": 1, "name": "Kitty" }
      ]
spec:
  description: YOLO v5 via pytorch hub
  runtime: 'python:3.6'
  handler: main:handler
  eventTimeout: 30s
  build:
    image: cvat.ultralytics-yolov5
    baseImage: ultralytics/yolov5:latest-cpu

    directives:
      preCopy:
        - kind: USER
          value: root
        - kind: RUN
          value: apt update && apt install --no-install-recommends -y libglib2.0-0
        - kind: WORKDIR
          value: /opt/nuclio

  triggers:
    myHttpTrigger:
      maxWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume

main.py

import json
import base64
from PIL import Image
import io
import torch

def init_context(context):
    context.logger.info("Init context...  0%")

    # Read the DL model: load custom weights via the YOLOv5 hub 'custom' entry point
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='/opt/nuclio/weights.pt')
    context.user_data.model = model

    context.logger.info("Init context...100%")

def handler(context, event):
    context.logger.info("Run yolo-v5 model")
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    threshold = float(data.get("threshold", 0.5))
    context.user_data.model.conf = threshold
    image = Image.open(buf)
    yolo_results_json = context.user_data.model(image).pandas().xyxy[0].to_dict(orient='records')

    encoded_results = []
    for result in yolo_results_json:
        encoded_results.append({
            'confidence': result['confidence'],
            'label': result['name'],
            'points': [
                result['xmin'],
                result['ymin'],
                result['xmax'],
                result['ymax']
            ],
            'type': 'rectangle'
        })

    return context.Response(body=json.dumps(encoded_results), headers={},
        content_type='application/json', status_code=200)
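
For reference, custom YOLOv5 weights are normally loaded through the hub's `custom` entry point with a `path` argument, and the repository name is `ultralytics/yolov5`. The sketch below wraps that call with a file check so that a missing weights file fails with a clear message. It assumes the weights end up at `/opt/nuclio/weights.pt` inside the container, as in the snippet above; the helper name `load_custom_yolov5` and the injectable `hub_load` parameter are hypothetical and exist only so the path check can be exercised without PyTorch installed.

```python
from pathlib import Path

HUB_REPO = "ultralytics/yolov5"


def load_custom_yolov5(weights_path, hub_load=None):
    """Load custom YOLOv5 weights, failing early if the file is missing.

    hub_load defaults to torch.hub.load; it is a parameter only so that
    the path check can be tested without importing torch.
    """
    weights = Path(weights_path)
    if not weights.is_file():
        raise FileNotFoundError(f"weights file not found: {weights}")
    if hub_load is None:
        import torch  # imported lazily, only when the real loader is needed
        hub_load = torch.hub.load
    # 'custom' is the YOLOv5 hub entry point for user-trained weights.
    return hub_load(HUB_REPO, "custom", path=str(weights))
```

Inside `init_context`, this could be called as `context.user_data.model = load_custom_yolov5("/opt/nuclio/weights.pt")`, provided the weights file is actually present at that path in the built image.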

Crescent-Saturn commented 1 year ago

@Errin890 Hi, I would also like to know how and where you put your model weights (.pt) file. Your function.yaml doesn't include this. Thanks!

@nfrvnikita Have you figured out where and how? Thanks!

iaverypadberg commented 5 months ago

issue self-fixed, was another comma problem in the spec field (removed it)

This error in the cvat_server Docker container logs was caused by a trailing comma:

json.decoder.JSONDecodeError: Expecting value: line 5 column 1 (char 106)

Thank goodness for your note on this, @Errin890. Here is the troublesome code snippet:

metadata:
  name: 3c-yolov8
  namespace: cvat
  annotations:
    name: 3c-yolov8
    type: detector
    framework: pytorch
    spec: |
      [
        { "id": 0, "name": "c1"},
        { "id": 1, "name": "c2"},
        { "id": 2, "name": "c3"},
      ]

Fixed by removing the trailing comma after the last entry, { "id": 2, "name": "c3"}