cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Trying to implement custom yolov8 segmentation model for automatic labeling on serverless. #7503

Closed UffeLauge closed 8 months ago

UffeLauge commented 8 months ago

Steps to Reproduce

  1. Installed CVAT serverless on Windows 10 with WSL.
  2. Installed Nuclio v1.11.24.
  3. Created a custom YOLOv8 model inspired by: https://github.com/opencv/cvat/pull/6491
  4. Deployed the model; it is running in Docker.
  5. After deploying the model, this error appears in CVAT: (screenshot: Screenshot 2024-02-21 085957)
  6. When testing the model at localhost:8070, I get this error: "error": "Failed to invoke function: Post 'http://0.0.0.0:59561': dial tcp 0.0.0.0:59561: connect: connection refused"

Expected Behavior

No response

Possible Solution

No response

Context

I am trying to use my custom-trained model to auto-annotate in CVAT Serverless. I have placed my model files in cvat/serverless/pytorch/ultralytics/yolov8. This folder contains function.yaml, function_cpu.yaml, main.py and best.pt (the YOLOv8 weight file).

main.py:

import json
import base64
from PIL import Image
import io
import torch
import numpy as np
from ultralytics import YOLO
import supervision as sv
from skimage.measure import approximate_polygon, find_contours

def to_cvat_mask(box: list, mask):
    xtl, ytl, xbr, ybr = box
    flattened = mask[ytl:ybr + 1, xtl:xbr + 1].flat[:].tolist()
    flattened.extend([xtl, ytl, xbr, ybr])
    return flattened

def init_context(context):
    context.logger.info("Init context...  0%")

    model_path = "/opt/nuclio/best.pt"

    model = YOLO(model_path, task="segment")

    # Read the DL model
    context.user_data.model = model

    context.logger.info("Init context...100%")

def handler(context, event):
    context.logger.info("Run yolo-v8 model")
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    threshold = float(data.get("threshold", 0.5))
    context.user_data.model.conf = threshold
    image = Image.open(buf)

    yolo_results = context.user_data.model(image, conf=threshold)[0]
    labels = yolo_results.names
    detections = sv.Detections.from_yolov8(yolo_results)
    detections = detections[detections.confidence > threshold]

    results = []
    if len(detections) > 0:

        for xyxy, mask, confidence, class_id, _ in detections:
            mask = mask.astype(np.uint8)

            xtl = int(xyxy[0])
            ytl = int(xyxy[1])
            xbr = int(xyxy[2])
            ybr = int(xyxy[3])

            label = int(class_id)
            cvat_mask = to_cvat_mask((xtl, ytl, xbr, ybr), mask)

            contours = find_contours(mask, 0.5)
            contour = contours[0]
            contour = np.flip(contour, axis=1)
            polygons = approximate_polygon(contour, tolerance=2.5)

            results.append({
                "confidence": str(confidence),
                "label": labels.get(class_id, "unknown"),
                "type": "mask",
                "points": polygons.ravel().tolist(),
                "mask": cvat_mask,
            })

    return context.Response(body=json.dumps(results), headers={},
                            content_type='application/json', status_code=200)
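
The `to_cvat_mask` helper above encodes a mask as the row-major pixels inside the bounding box (both corners inclusive), followed by the four box coordinates. A minimal, dependency-free sketch of that layout on a nested list may make it easier to see. `to_cvat_mask_plain` is a hypothetical name used only for illustration and is not part of CVAT:

```python
def to_cvat_mask_plain(box, mask):
    """Pure-Python stand-in for the NumPy-based to_cvat_mask in main.py.

    box  -- (xtl, ytl, xbr, ybr), both corners inclusive
    mask -- list of rows, each a list of 0/1 ints (full-image binary mask)
    """
    xtl, ytl, xbr, ybr = box
    flattened = []
    # Walk the crop row by row, matching mask[ytl:ybr + 1, xtl:xbr + 1].flat
    for row in mask[ytl:ybr + 1]:
        flattened.extend(row[xtl:xbr + 1])
    # The box coordinates are appended after the pixel values
    flattened.extend([xtl, ytl, xbr, ybr])
    return flattened


# 4x4 image with a small blob inside columns 1-2, rows 1-2
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
encoded = to_cvat_mask_plain((1, 1, 2, 2), mask)
print(encoded)
```

The encoded list therefore always has (xbr - xtl + 1) * (ybr - ytl + 1) + 4 entries.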

function.yaml:

metadata:
  name: pth-qr-code-and-objects-yolov8-segment
  namespace: cvat
  annotations:
    name: YOLO v8 Segment
    type: detector
    framework: pytorch
    spec: |
      [
        { "id": 0, "name": "QR" },
        { "id": 1, "name": "object" },
      ]
spec:
  description: YOLO v8 Segment via pytorch
  runtime: 'python:3.9'
  handler: main:handler
  eventTimeout: 30s
  build:
    image: cvat.pth.ultralytics.yolov8.segment
    baseImage: ultralytics/ultralytics:latest-cpu

    directives:
      preCopy:
        - kind: USER
          value: root
        - kind: RUN
          value: apt update && apt install --no-install-recommends -y libglib2.0-0
        - kind: RUN
          value: pip install supervision ultralytics scikit-image 
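
One thing worth checking in a function.yaml like this is the `spec` annotation, which is a JSON string. Python's `json` module follows strict JSON and rejects a trailing comma after the last array element, as written in the spec above. The sketch below shows that check, under the assumption that whatever consumes the annotation parses it as strict JSON. `check_spec` is a hypothetical helper, not a CVAT or Nuclio function:

```python
import json


def check_spec(spec_text):
    """Return (labels, error): the parsed label list, or None plus the parse error."""
    try:
        return json.loads(spec_text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# Spec as written in the function.yaml above (trailing comma after the last item)
spec_with_trailing_comma = """
[
  { "id": 0, "name": "QR" },
  { "id": 1, "name": "object" },
]
"""

# Same spec without the trailing comma
spec_strict = """
[
  { "id": 0, "name": "QR" },
  { "id": 1, "name": "object" }
]
"""

labels_bad, error_bad = check_spec(spec_with_trailing_comma)
labels_ok, error_ok = check_spec(spec_strict)
print("trailing comma:", error_bad)
print("strict JSON:   ", [label["name"] for label in labels_ok])
```

Whether this is related to the error reported in this issue is not established by the thread; it is only a cheap check to rule out.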

Environment

No response

bsekachev commented 8 months ago

It does not look like your model was deployed in a healthy state. I would suggest checking the Docker container and its logs.

UffeLauge commented 8 months ago

> It does not look like your model was deployed in a healthy state. I would suggest checking the Docker container and its logs.

Okay, I have no errors in the yolov8 Docker logs. However, I get the following errors in the CVAT Docker logs. Do you have suggestions on what to change?

2024-02-21 10:38:52 traefik                        | {"level":"error","msg":"no valid entryPoint for this router","routerName":"grafana_https@file","time":"2024-02-21T09:38:52Z"}
2024-02-21 10:38:52 traefik                        | {"entryPointName":"websecure","level":"error","msg":"entryPoint \"websecure\" doesn't exist","routerName":"grafana_https@file","time":"2024-02-21T09:38:52Z"}
2024-02-21 10:38:52 traefik                        | {"level":"error","msg":"no valid entryPoint for this router","routerName":"grafana_https@file","time":"2024-02-21T09:38:52Z"}
2024-02-21 10:38:54 traefik                        | {"entryPointName":"websecure","level":"error","msg":"entryPoint \"websecure\" doesn't exist","routerName":"grafana_https@file","time":"2024-02-21T09:38:54Z"}
2024-02-21 10:38:54 traefik                        | {"level":"error","msg":"no valid entryPoint for this router","routerName":"grafana_https@file","time":"2024-02-21T09:38:54Z"}
2024-02-21 10:38:54 traefik                        | {"entryPointName":"websecure","level":"error","msg":"entryPoint \"websecure\" doesn't exist","routerName":"grafana_https@file","time":"2024-02-21T09:38:54Z"}
2024-02-21 10:38:54 traefik                        | {"level":"error","msg":"no valid entryPoint for this router","routerName":"grafana_https@file","time":"2024-02-21T09:38:54Z"}
2024-02-21 10:38:56 traefik                        | {"entryPointName":"websecure","level":"error","msg":"entryPoint \"websecure\" doesn't exist","routerName":"grafana_https@file","time":"2024-02-21T09:38:56Z"}
2024-02-21 10:38:56 traefik                        | {"level":"error","msg":"no valid entryPoint for this router","routerName":"grafana_https@file","time":"2024-02-21T09:38:56Z"}
2024-02-21 10:38:56 traefik                        | {"entryPointName":"websecure","level":"error","msg":"entryPoint \"websecure\" doesn't exist","routerName":"grafana_https@file","time":"2024-02-21T09:38:56Z"}
2024-02-21 10:38:52 cvat_vector                    | 2024-02-21T09:38:52.819587Z  WARN http: vector::internal_events::http_client: HTTP error. error=error trying to connect: tcp connect error: Connection refused (os error 111) error_type="request_failed" stage="processing" internal_log_rate_limit=true
2024-02-21 10:38:52 cvat_vector                    | 2024-02-21T09:38:52.819739Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=Failed to make HTTP(S) request: error trying to connect: tcp connect error: Connection refused (os error 111) component_kind="sink" component_type="clickhouse" component_id=clickhouse component_name=clickho
2024-02-21 10:39:08 cvat_server                    | nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (13: Permission denied)
2024-02-21 10:39:08 cvat_server                    | 
2024-02-21 10:39:08 cvat_server                    | 2024-02-21 09:39:08,754 DEBG 'smokescreen' stderr output:
2024-02-21 10:39:08 cvat_server                    | {"level":"info","msg":"starting","time":"2024-02-21T09:39:08Z"}
2024-02-21 10:39:08 cvat_server                    | 
2024-02-21 10:39:08 cvat_server                    | 2024-02-21 09:39:08,757 DEBG 'uvicorn-0' stderr output:
2024-02-21 10:39:08 cvat_server                    | wait-for-it.sh: waiting for cvat_db:5432 without a timeout
1:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-02-21T09:38:55Z"}
2024-02-21 10:38:55 cvat_opa                       | {"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.23.0.11:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-02-21T09:38:55Z"}
2024-02-21 10:38:55 cvat_opa                       | {"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.23.0.11:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-02-21T09:38:55Z"}
2024-02-21 10:38:55 cvat_opa                       | {"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.23.0.11:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-02-21T09:38:55Z"}
2024-02-21 10:38:55 cvat_opa                       | {"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.23.0.11:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-02-21T09:38:55Z"}
2024-02-21 10:49:08 cvat_grafana                   | logger=cleanup t=2024-02-21T09:49:08.361818152Z level=info msg="Completed cleanup jobs" duration=15.293741ms
2024-02-21 10:49:08 cvat_grafana                   | logger=sqlstore.transactions t=2024-02-21T09:49:08.369970581Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=1 code="database is locked"
2024-02-21 10:49:08 cvat_grafana                   | logger=grafana.update.checker t=2024-02-21T09:49:08.436252928Z level=info msg="Update check succeeded" duration=37.765296ms
2024-02-21 10:49:08 cvat_grafana                   | logger=plugins.update.checker t=2024-02-21T09:49:08.520970967Z level=info msg="Update check succeeded" duration=81.64539ms
2024-02-21 10:54:06 cvat_grafana                   | logger=sqlstore.transactions t=2024-02-21T09:54:06.399300934Z level=info msg="Database locked, sleeping then retrying" error="database is locked" retry=0 code="database is locked"

bsekachev commented 8 months ago

You should start by checking this error:

nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (13: Permission denied)

Go into the cvat_server container and check the permissions for that file. You can log in as root inside the container: docker exec -u root -it cvat_server bash
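
Once inside the container, the owner, group and mode of the log file can be read with `ls -l`, or with a few lines of Python if that is more convenient. The following is a minimal sketch that assumes a Unix-like container where the `pwd` and `grp` modules are available. `describe_path` is a hypothetical helper name:

```python
import grp
import os
import pwd
import stat


def describe_path(path):
    """Return owner, group, and an `ls -l` style mode string for a path."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # gid has no group entry
    return {
        "owner": owner,
        "group": group,
        "mode": stat.filemode(info.st_mode),
        # Whether the current user could open the path for writing
        "writable_by_me": os.access(path, os.W_OK),
    }


# Demo on the current directory; inside the container one would pass
# "/var/log/nginx/error.log" (the path from the nginx alert) instead.
print(describe_path("."))
```

Running it as the user that nginx runs under, rather than as root, shows whether that user can actually write to the file.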

bsekachev commented 8 months ago

But I am not sure that is the reason. For some reason, cvat_opa can't get the bundle from the cvat_server container. However, I do not see anything else suspicious in the provided cvat_server logs.

2024-02-21 10:38:55 cvat_opa | {"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.23.0.11:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-02-21T09:38:55Z"}

"Connection refused" usually means that the server is not up on the specified port.
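
A quick way to confirm whether anything is listening on a given port is to attempt a TCP connection. The sketch below is a generic check, not something taken from CVAT. `port_is_open` is a hypothetical helper, and the demo uses a throwaway local listener so that it runs anywhere:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and name resolution failures
        return False


# Demo against a temporary local listener so the example is self-contained
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", demo_port)

print(open_while_listening, open_after_close)
```

From another container on the same Docker network, the equivalent call would be `port_is_open("cvat-server", 8080)`, using the host and port from the cvat_opa log line.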

bsekachev commented 8 months ago

And finally, I do not see any logs corresponding to this error: (image)

But with a 500 status code there should be some exceptions in the Docker logs.

UffeLauge commented 8 months ago

> You should start by checking this error:
>
> nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (13: Permission denied)
>
> Go into the cvat_server container and check the permissions for that file. You can log in as root inside the container: docker exec -u root -it cvat_server bash

I have investigated the permissions, with the following result: image

UffeLauge commented 8 months ago

> And finally, I do not see any logs corresponding to this error: (image)
>
> But with a 500 status code there should be some exceptions in the Docker logs.

Sadly, I have not been able to find any exceptions in the logs.

bsekachev commented 8 months ago

> I have investigated the permissions, with the following result:

Try removing this file; it should be re-created with the correct permissions.

UffeLauge commented 8 months ago

> > I have investigated the permissions, with the following result:
>
> Try removing this file; it should be re-created with the correct permissions.

I tried deleting the file and restarting the CVAT Docker containers, but the file was just recreated with the same permissions.

bsekachev commented 8 months ago

And you still do not see any errors in docker logs cvat_server?

bsekachev commented 8 months ago

www-data and adm don't correspond to the django user and django group that we use by default. So I conclude that you are using a modified version of CVAT.

UffeLauge commented 8 months ago

> www-data and adm don't correspond to the django user and django group that we use by default. So I conclude that you are using a modified version of CVAT.

That sounds weird to me. All I have done is follow your installation guide at https://opencv.github.io/cvat/docs/administration/basics/installation/ and then this guide: https://opencv.github.io/cvat/docs/administration/advanced/installation_automatic_annotation/

UffeLauge commented 8 months ago

> And you still do not see any errors in docker logs cvat_server?

I still get the same errors as in my initial comment. Removing /var/log/nginx/error.log didn't seem to have any effect.

bsekachev commented 8 months ago

Okay, never mind.

To suggest something, I need the full log from cvat_server, not just a short fragment.

khursani8 commented 8 months ago

Not sure this will help. For me, I just ran chmod 777 on that file (bad practice). After that I could see the error better, and it was related to my serverless container. After fixing the serverless code, there was no issue anymore.

This is just my assumption: the new error log probably got a different username because the server that returned the 500 error runs as www-data (maybe).

felixkarevo commented 5 months ago

@bsekachev You marked this thread as completed. @UffeLauge Did you manage to solve the issue?

I am trying to use a custom yolov8n-seg.pt model for auto-annotation in CVAT. I installed Nuclio and the serverless functions for CVAT. Segment Anything (SAM) already works fine and I can use it in CVAT. My custom yolov8n-seg.pt is recognized by CVAT and I can select it for auto-annotation. When I click on task -> auto annotation, the auto-annotation process starts, but I get an error message. The model runs and detects something during inference, but somehow the output format does not work.

Error message (docker logs -f 1dcb9483a72a):

0: 2112x2496 39 Potatos, 1427.3ms
Speed: 24.6ms preprocess, 1427.3ms inference, 931.1ms postprocess per image at shape (1, 3, 2112, 2496)
24.05.21 11:49:40.483 (E) sor.http.w0.python.logger Exception caught in handler {"worker_id": "0", "exc": "'tuple' object has no attribute 'xyxy'", "traceback": "Traceback (most recent call last):\n File \"/opt/nuclio/_nuclio_wrapper.py\", line 151, in serve_requests\n await self._handle_event(event)\n File \"/opt/nuclio/_nuclio_wrapper.py\", line 439, in _handle_event\n entrypoint_output = self._entrypoint(self._context, event)\n File \"/opt/nuclio/main.py\", line 50, in handler\n xyxy = detection.xyxy\nAttributeError: 'tuple' object has no attribute 'xyxy'\n"}
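
The traceback in this log line is embedded in a JSON object with escaped newlines, which makes it hard to read. A small sketch for pulling it out is shown below. It assumes that the JSON payload is the tail of the line and that the prefix contains no `{` character. `extract_exception` is a hypothetical helper, not part of Nuclio:

```python
import json


def extract_exception(log_line):
    """Pull the JSON payload out of a log line and return (message, traceback)."""
    start = log_line.find("{")
    if start == -1:
        return None, None
    payload = json.loads(log_line[start:])
    return payload.get("exc"), payload.get("traceback")


# Shortened copy of the log line above (wrapper frames omitted)
line = (
    '24.05.21 11:49:40.483 (E) sor.http.w0.python.logger Exception caught in handler '
    '{"worker_id": "0", "exc": "\'tuple\' object has no attribute \'xyxy\'", '
    '"traceback": "Traceback (most recent call last):\\n'
    ' File \\"/opt/nuclio/main.py\\", line 50, in handler\\n'
    ' xyxy = detection.xyxy\\n'
    'AttributeError: \'tuple\' object has no attribute \'xyxy\'\\n"}'
)

message, traceback_text = extract_exception(line)
print(message)
print(traceback_text)
```

Printed this way, the last frame points at `xyxy = detection.xyxy` in main.py.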

This is my "main.py" script:

import json
import base64
from PIL import Image
import io

import numpy as np
from ultralytics import YOLO
import supervision as sv
from skimage.measure import approximate_polygon, find_contours

def to_cvat_mask(box: list, mask):
    xtl, ytl, xbr, ybr = box
    flattened = mask[ytl:ybr + 1, xtl:xbr + 1].flat[:].tolist()
    flattened.extend([xtl, ytl, xbr, ybr])
    return flattened

def init_context(context):
    context.logger.info("Init context... 0%")

    model_path = "yolov8n-seg.pt"

    model = YOLO(model_path, task="segment")

    # Read the DL model
    context.user_data.model = model

    context.logger.info("Init context...100%")

def handler(context, event):
    context.logger.info("Run yolo-v8 model")
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    threshold = float(data.get("threshold", 0.5))
    context.user_data.model.conf = threshold
    image = Image.open(buf)

    yolo_results = context.user_data.model(image, conf=threshold)[0]
    labels = yolo_results.names
    detections:sv.Detections = sv.Detections.from_ultralytics(yolo_results)

    detections = detections[detections.confidence > threshold]

    results = []
    if len(detections) > 0:

        for detection in detections:
            xyxy = detection.xyxy
            mask = detection.mask
            confidence = detection.confidence
            class_id = detection.class_ide

            mask = mask.astype(np.uint8)

            xtl = int(xyxy[0])
            ytl = int(xyxy[1])
            xbr = int(xyxy[2])
            ybr = int(xyxy[3])

            label = int(class_id)
            cvat_mask = to_cvat_mask((xtl, ytl, xbr, ybr), mask)

            contours = find_contours(mask, 0.5)
            contour = contours[0]
            contour = np.flip(contour, axis=1)
            polygons = approximate_polygon(contour, tolerance=2.5)

            results.append({
                "confidence": str(confidence),
                "label": labels.get(class_id, "unknown"),
                "type": "mask",
                "points": polygons.ravel().tolist(),
                "mask": cvat_mask,
            })

    return context.Response(body=json.dumps(results), headers={},
                            content_type='application/json', status_code=200)

felixkarevo commented 5 months ago

Solved it myself. The issue was that I did not handle the Detections object correctly. I was treating detection as if it has the xyxy attribute directly, but detection is actually a tuple.

This is the corrected handler function:

def handler(context, event):
    context.logger.info("Run yolo-v8 model")
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    threshold = float(data.get("threshold", 0.5))
    context.user_data.model.conf = threshold
    image = Image.open(buf)

    yolo_results = context.user_data.model(image, conf=threshold)[0]
    labels = yolo_results.names
    detections = sv.Detections.from_ultralytics(yolo_results)
    detections = detections[detections.confidence > threshold]

    results = []
    if len(detections) > 0:
        for i in range(len(detections)):
            xyxy = detections.xyxy[i]
            mask = detections.mask[i]
            confidence = detections.confidence[i]
            class_id = detections.class_id[i]

            mask = mask.astype(np.uint8)

            xtl = int(xyxy[0])
            ytl = int(xyxy[1])
            xbr = int(xyxy[2])
            ybr = int(xyxy[3])

            label = int(class_id)
            cvat_mask = to_cvat_mask((xtl, ytl, xbr, ybr), mask)

            contours = find_contours(mask, 0.5)
            contour = contours[0]
            contour = np.flip(contour, axis=1)
            polygons = approximate_polygon(contour, tolerance=2.5)

            results.append({
                "confidence": str(confidence),
                "label": labels.get(class_id, "unknown"),
                "type": "mask",
                "points": polygons.ravel().tolist(),
                "mask": cvat_mask,
            })

    return context.Response(body=json.dumps(results), headers={},
                            content_type='application/json', status_code=200)
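
The difference between the failing loop and the corrected one can be reproduced without installing supervision. The sketch below uses a hypothetical stand-in class, `FakeDetections`, that mimics only the behaviour described in this thread: the fields live as parallel sequences on the container, and iterating over it yields plain tuples. It is not the supervision API.

```python
class FakeDetections:
    """Hypothetical stand-in: parallel per-field lists, iteration yields tuples."""

    def __init__(self, xyxy, confidence, class_id):
        self.xyxy = xyxy
        self.confidence = confidence
        self.class_id = class_id

    def __len__(self):
        return len(self.xyxy)

    def __iter__(self):
        # Each item is a plain tuple, so it has no .xyxy attribute
        return iter(zip(self.xyxy, self.confidence, self.class_id))


detections = FakeDetections(
    xyxy=[[10, 20, 30, 40], [5, 5, 15, 25]],
    confidence=[0.9, 0.6],
    class_id=[0, 1],
)

# Attribute access on an iterated item fails, as in the traceback above
try:
    for detection in detections:
        detection.xyxy
    attribute_error = None
except AttributeError as exc:
    attribute_error = str(exc)

# Indexing the container's fields works, as in the corrected handler
boxes = [detections.xyxy[i] for i in range(len(detections))]

# Unpacking the tuple during iteration is another option
unpacked = [(box, conf, cls) for box, conf, cls in detections]

print(attribute_error)
print(boxes)
```

With the real library, the number of elements in each yielded tuple depends on the installed supervision version, so indexing the fields as in the corrected handler is the less fragile of the two options.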

felixkarevo commented 5 months ago

I created a repository: https://github.com/felixkarevo/CVAT-custom-yolov8-segmentation-auto-annotation