cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Proper serverless deployment of human pose estimation model #3756

Closed kevkid closed 1 year ago

kevkid commented 3 years ago

Hi, I am trying to deploy a human pose estimation model, specifically MediaPipe, as it is easy to use. I followed the guide here but I have some lingering questions. In particular, is my annotations.type correct for this task? I don't know where the list of types is. Similarly, I don't know if my main.py script is correct; I'm pretty sure it is, but I haven't tested it, because right now the deployment is stuck here:

$ nuctl deploy --project-name cvat   --path serverless/custom/nuclio   --volume `pwd`/serverless/common:/opt/nuclio/common   --platform local
21.09.30 15:55:03.682                     nuctl (I) Deploying function {"name": ""}
21.09.30 15:55:03.682                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.09.30 15:55:03.947                     nuctl (I) Cleaning up before deployment {"functionName": "mediapipe_pose_estimation"}
21.09.30 15:55:03.974                     nuctl (I) Staging files and preparing base images
21.09.30 15:55:03.974                     nuctl (I) Building processor image {"imageName": "cvat/mediapipe_pose_estimation:latest"}
21.09.30 15:55:03.974     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.09.30 15:55:05.010     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.09.30 15:55:06.558            nuctl.platform (I) Building docker image {"image": "cvat/mediapipe_pose_estimation:latest"}

Here is my function.yaml

# CONFIG
metadata:
  name: mediapipe_pose_estimation
  namespace: cvat
  annotations:
    name: mediapipe_pose_estimation
    type: detector
    framework: tensorflow
    spec: |
      [
        { "id":  0, "name": "nose" },
        { "id":  1, "name": "left_eye_inner" },
        { "id":  2, "name": "left_eye" },
        { "id":  3, "name": "left_eye_outer" },
        { "id":  4, "name": "right_eye_inner" },
        { "id":  5, "name": "right_eye" },
        { "id":  6, "name": "right_eye_outer" },
        { "id":  7, "name": "left_ear" },
        { "id":  8, "name": "right_ear" },
        { "id":  9, "name": "mouth_left" },
        { "id":  10, "name": "mouth_right" },
        { "id":  11, "name": "left_shoulder" },
        { "id":  12, "name": "right_shoulder" },
        { "id":  13, "name": "left_elbow" },
        { "id":  14, "name": "right_elbow" },
        { "id":  15, "name": "left_wrist" },
        { "id":  16, "name": "right_wrist" },
        { "id":  17, "name": "left_pinky" },
        { "id":  18, "name": "right_pinky" },
        { "id":  19, "name": "left_index" },
        { "id":  20, "name": "right_index" },
        { "id":  21, "name": "left_thumb" },
        { "id":  22, "name": "right_thumb" },
        { "id":  23, "name": "left_hip" },
        { "id":  24, "name": "right_hip" },
        { "id":  25, "name": "left_knee" },
        { "id":  26, "name": "right_knee" },
        { "id":  27, "name": "left_ankle" },
        { "id":  28, "name": "right_ankle" },
        { "id":  29, "name": "left_heel" },
        { "id":  30, "name": "right_heel" },
        { "id":  31, "name": "left_foot_index" },
        { "id":  32, "name": "right_foot_index" }
      ]

spec:
  description: mediapipe_pose_estimation
  runtime: 'python:3.9'
  handler: main:handler
  eventTimeout: 30s

  build:
    image: cvat/mediapipe_pose_estimation
    baseImage: ubuntu:21.04

    directives:
      preCopy:
        - kind: RUN
          value: apt-get update && apt-get -y install curl python3 python3-pip
        - kind: RUN
          value: pip3 install mediapipe opencv-python numpy matplotlib
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: ln -s /usr/bin/pip3 /usr/local/bin/pip

  triggers:
    myHttpTrigger:
      maxWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume
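For the annotations block above: CVAT reads the type and spec annotations (the serverless functions bundled with CVAT use the types detector, interactor, tracker and reid, so detector looks reasonable here), and spec is a JSON list of labels embedded as a string. A quick local check that the embedded JSON actually parses, a minimal sketch assuming the file sits at the path passed to nuctl and that PyYAML is installed:

import json

import yaml

# Load the nuclio function config and parse the label spec, which is a JSON
# string embedded in metadata.annotations.spec.
with open("serverless/custom/nuclio/function.yaml") as f:
    config = yaml.safe_load(f)

annotations = config["metadata"]["annotations"]
labels = json.loads(annotations["spec"])
print(annotations["type"], len(labels), "labels")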

main.py:

import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
def init_context(context):
    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_pose = mp.solutions.pose
    predictor = mp_pose.Pose(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5)

    setattr(context.user_data, 'model_handler', predictor)
    functionconfig = yaml.safe_load(open("/opt/nuclio/function.yaml"))
    labels_spec = functionconfig['metadata']['annotations']['spec']
    labels = {item['id']: item['name'] for item in json.loads(labels_spec)}
    setattr(context.user_data, "labels", labels)

def handler(context, event):
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"].encode('utf-8')))
    threshold = float(data.get("threshold", 0.5))
    image = convert_PIL_to_numpy(Image.open(buf), format="BGR")

    #results.pose_landmarks.landmark
    predictions = context.user_data.model_handler(image)

    keypoints = ['nose', 'left_eye_inner', 'left_eye', 'left_eye_outer', 'right_eye_inner',
             'right_eye', 'right_eye_outer', 'left_ear', 'right_ear', 'mouth_left',
             'mouth_right', 'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
             'left_wrist', 'right_wrist', 'left_pinky', 'right_pinky', 'left_index', 'right_index',
             'left_thumb', 'right_thumb', 'left_hip', 'right_hip', 'left_knee', 'right_knee',
             'left_ankle', 'right_ankle', 'left_heel', 'right_heel', 'left_foot_index', 'right_foot_index']

    results = []
    results.append({

                "label": keypoints,
                "points": [[x.x, x.y, x.z, x.visibility] for x in results.pose_landmarks.landmark],
                "type": "point",
            })

    return context.Response(body=json.dumps(results), headers={},
        content_type='application/json', status_code=200)
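As posted, this first version is missing the imports it uses, and convert_PIL_to_numpy is not defined in the script; it appears to come from detectron2 (detectron2.data.detection_utils), the framework used in the example this was adapted from. A sketch of the imports it would need (the detectron2 import is an assumption about where that helper lives; the revised script further down swaps it for numpy instead):

import base64
import io
import json

import yaml
from PIL import Image

# convert_PIL_to_numpy is a detectron2 utility; a dependency-free alternative
# is numpy.asarray(Image.open(buf)), which the revised script below uses.
from detectron2.data.detection_utils import convert_PIL_to_numpy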
kevkid commented 3 years ago

Okay, I got it to deploy, but now I am having an issue when trying to annotate. I click on auto-annotate and it gives me: Error: Inference status for the task 1 is failed. requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://nuclio:8070/api/function_invocations

Cryptic, but I checked the nuclio dashboard and can interact with the function. When I try to test it (the test panel is on the right), it lets me enter a string. Entering anything (including a base64-encoded string) gives me:

Exception caught in handler - "byte indices must be integers or slices, not str": Traceback (most recent call last):
  File "/opt/nuclio/_nuclio_wrapper.py", line 114, in serve_requests
    self._handle_event(event)
  File "/opt/nuclio/_nuclio_wrapper.py", line 262, in _handle_event
    entrypoint_output = self._entrypoint(self._context, event)
  File "/opt/nuclio/main.py", line 25, in handler
    print('hello')
TypeError: byte indices must be integers or slices, not str
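That TypeError is consistent with event.body arriving as raw bytes (or a plain string) when the function is invoked from the dashboard's test panel, so indexing it with data["image"] fails before the handler gets anywhere. A minimal defensive-parsing sketch (the parse_body helper is not part of CVAT or nuclio, just an illustration):

import json

def parse_body(body):
    # The dashboard test panel can deliver the request body as raw bytes or a
    # JSON string, while CVAT sends JSON; normalise everything to a dict.
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    if isinstance(body, str):
        body = json.loads(body)
    return body

# In handler(): data = parse_body(event.body)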

Any ideas on how to test this?

EDIT

Here is how to test:

import base64
import json                    

import requests
api = 'http://<URL_TO_FUNCTION>:49273/' # this is found on the nuclio dashboard
image_file = 'img.jpg'

with open(image_file, "rb") as f:
    im_bytes = f.read()        
im_b64 = base64.b64encode(im_bytes).decode("utf8")

headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

payload = json.dumps({"image": im_b64, "other_key": "value"})
response = requests.post(api, data=payload, headers=headers)
try:
    data = response.json()
    print(data)
except ValueError:  # the response body was not valid JSON
    print(response.text)

Here is my new code:

import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
import json
import base64
import io
from PIL import Image
import yaml
import numpy as np
def init_context(context):
    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_pose = mp.solutions.pose
    predictor = mp_pose.Pose(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5)

    setattr(context.user_data, 'model_handler', predictor.process)
    functionconfig = yaml.safe_load(open("/opt/nuclio/function.yaml"))
    labels_spec = functionconfig['metadata']['annotations']['spec']
    labels = {item['id']: item['name'] for item in json.loads(labels_spec)}
    setattr(context.user_data, "labels", labels)

def handler(context, event):
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"].encode('utf-8')))
    image = np.asarray(Image.open(buf))

    #results.pose_landmarks.landmark
    predictions = context.user_data.model_handler(image)

    keypoints = ['nose', 'left_eye_inner', 'left_eye', 'left_eye_outer', 'right_eye_inner',
             'right_eye', 'right_eye_outer', 'left_ear', 'right_ear', 'mouth_left',
             'mouth_right', 'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
             'left_wrist', 'right_wrist', 'left_pinky', 'right_pinky', 'left_index', 'right_index',
             'left_thumb', 'right_thumb', 'left_hip', 'right_hip', 'left_knee', 'right_knee',
             'left_ankle', 'right_ankle', 'left_heel', 'right_heel', 'left_foot_index', 'right_foot_index']

    results = []
    results.append({

                "label": keypoints,
                "points": [[x.x, x.y, x.z, x.visibility] for x in predictions.pose_landmarks.landmark],
                "type": "point",
            })

    return context.Response(body=json.dumps(results), headers={},
        content_type='application/json', status_code=200)

EDIT 2:

I am shooting in the dark here; I have no idea what the results are supposed to look like for keypoints.

Here is what I have:

results.append({

                "label": keypoints,
                "points": [[x.x, x.y, x.z, x.visibility] for x in predictions.pose_landmarks.landmark],
                "type": "point",
            })

When trying to auto-annotate, it gives: Error: Inference status for the task 1 is failed. TypeError: unhashable type: 'list'
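For reference, CVAT's bundled detector functions return a list with one dictionary per detected object, where "label" is a single string (CVAT appears to use it as a key when mapping results onto the task's labels, which would explain the unhashable-list error) and "points" is a flat list of pixel coordinates. A hedged sketch that emits each landmark as its own point shape, assuming the shape type is spelled "points" and that MediaPipe's normalized coordinates must be scaled to pixels:

def landmarks_to_cvat(predictions, keypoints, width, height):
    # One CVAT shape per MediaPipe landmark. "label" must be a single string
    # matching a label name from the spec, not the whole list of names.
    shapes = []
    if predictions.pose_landmarks is None:
        return shapes
    for idx, landmark in enumerate(predictions.pose_landmarks.landmark):
        shapes.append({
            "label": keypoints[idx],
            # MediaPipe coordinates are normalized to [0, 1]; scale to pixels.
            "points": [landmark.x * width, landmark.y * height],
            "type": "points",  # assumed spelling of CVAT's point shape type
        })
    return shapes

In the handler this would replace the results construction, e.g. results = landmarks_to_cvat(predictions, keypoints, image.shape[1], image.shape[0]).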

@nmanovic, any ideas on how I can get this working? I saw you have tagged a similar enhancement.

nmanovic commented 3 years ago

@kevkid, I have to dig into the issue; for now I can't say anything. If you want to help, try to debug the lambda app inside the CVAT server. We don't have DL models for keypoints, so your path isn't smooth and you are running into all these problems.
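One way to narrow this down from outside the CVAT server is to reproduce the call it makes to the nuclio dashboard (the failing URL above is http://nuclio:8070/api/function_invocations). A hedged sketch, assuming the dashboard port 8070 is published on localhost and that the x-nuclio-function-name header is how the dashboard's invocation API selects the function:

import base64
import json

import requests

# Send the same kind of request CVAT's lambda app sends, but via the nuclio
# dashboard so the whole routing path is exercised, not just the function port.
with open("img.jpg", "rb") as f:
    im_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:8070/api/function_invocations",
    headers={
        "x-nuclio-function-name": "mediapipe_pose_estimation",
        "Content-Type": "application/json",
    },
    data=json.dumps({"image": im_b64, "threshold": 0.5}),
)
print(response.status_code)
print(response.text)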

kevkid commented 3 years ago

@nmanovic, thank you for the reply. So this lambda app is what handles importing the points/labels? I'm curious whether I can find where CVAT processes the outputs (i.e., accepts the response from the model's Docker container) of another model, such as one of the object detection models.