api/ml/{id}/train does not trigger training

themantalope commented 2 years ago

Describe the bug Per the documentation, the api/ml/{id}/train endpoint should trigger the training process on the ML backend, which in turn should call the fit method of the LabelStudioMLBase model. However, when I call this endpoint, even via curl, there is no response from the label-studio server, and the fit method does not get triggered.

Here is the current version of the docker-compose.yml file for my project:

version: "3.8"

services:
  redis:
    image: redis:alpine
    container_name: redis
    hostname: redis
    volumes:
      - "./data/redis:/data"
    expose:
      - 6379
  labeling:
    container_name: labeling_container
    image: heartexlabs/label-studio:v1.5.0
    ports: 
      - 8080:8080
    depends_on:
      - modeling
    volumes: 
      - ./data:/label-studio/data
    environment:
      - LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true 
      - LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data/media
    command: > 
      bash -c "
      label-studio start  
      --log-level DEBUG
      --sampling prediction-score-min 
      --ml-backends http://modeling_container:9090"
    restart: always
  modeling:
    container_name: modeling_container
    build: 
      context: ./modeling
    command: >
      bash -c "
      label-studio-ml init modeling_backend 
      --script tools/${MODEL:-model.py}
      --force true
      &&
      label-studio-ml start ./modeling_backend 
      --port 9090
      --debug "
    restart: always
    volumes: 
      - ./data/media:/data/
    environment:
      - MODEL_DIR=/data/models
      - RQ_QUEUE_NAME=default
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - USE_REDIS=true
    ports:
      - 9090:9090
    depends_on:
      - redis
    links:
      - redis

Here is my model.py file for the ML backend.

import torch
import torch.nn as nn
import torch.optim as optim
import time
import os
import numpy as np
import requests
import hashlib
import urllib
import cv2
import pathlib
import urllib.parse as urlparse
# skimage's io is the module used below (io.imread); the stdlib io import
# it previously shadowed has been dropped, along with an unused import.
from skimage import io, color

from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms

from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.utils import get_single_tag_keys, get_choice, is_skipped

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

import layoutparser as lp

image_cache_dir = os.path.join(os.path.dirname(__file__), 'image-cache')
os.makedirs(image_cache_dir, exist_ok=True)

def load_image_from_url(url):
    # is_local_file = url.startswith('http://localhost:') and '/data/' in url
    # purl = pathlib.Path(url)
    pres = urlparse.urlparse(url)
    if pres.scheme == '':
        purl = pathlib.Path(url)
        url = purl.as_uri()

    im = io.imread(url)
    if len(im.shape) < 3:
        # needs to be converted to rgb
        im = color.gray2rgb(im)
    return im

def convert_block_to_value(block, image_height, image_width):
    # Label Studio expects x, y, width and height as percentages (0-100)
    # of the image dimensions, so convert from pixel coordinates.
    return {
        "height": block.height / image_height * 100,
        "choices": [str(block.type)],
        "rotation": 0,
        "width": block.width / image_width * 100,
        "x": block.coordinates[0] / image_width * 100,
        "y": block.coordinates[1] / image_height * 100,
        "score": block.score,
    }

class ObjectDetectionAPI(LabelStudioMLBase):

    def __init__(self, freeze_extractor=False, **kwargs):

        super(ObjectDetectionAPI, self).__init__(**kwargs)

        # label_map_list = os.environ['LABEL_MAP'].split()
        # {int(label_map_list[i]): str(label_map_list[i+1]) for i in range(0, len(label_map_list), 2)}

        print('parsed label config:\n ')
        print(self.parsed_label_config)

        self.from_name, self.to_name, self.value, self.classes =\
            get_single_tag_keys(self.parsed_label_config, 'RectangleLabels', 'Image')
        self.freeze_extractor = freeze_extractor

        self.model = lp.Detectron2LayoutModel(
            config_path = 'lp://detectron2/PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
            # model_path  = 'https://www.dropbox.com/s/bitxe8occzb865u/model_final.pth?dl=1',
            ### PLEASE REMEMBER TO CHANGE `dl=0` INTO `dl=1` IN THE END 
            ### OF DROPBOX LINKS 
            extra_config=["MODEL.ROI_HEADS.NMS_THRESH_TEST", 0.2,
                          "MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
            label_map={0: "text"}
        )

    def reset_model(self):
        # self.model = ImageClassifier(len(self.classes), self.freeze_extractor)
        pass

    def predict(self, tasks, **kwargs):

        # print('tasks: ', tasks)
        print(kwargs)
        print('self.value: ', self.value)

        image_urls = [task['data'][self.value] for task in tasks]
        print('image urls: ', image_urls)
        images = [load_image_from_url(url) for url in image_urls]
        print('im sizes: ', [im.shape for im in images])
        layouts = [self.model.detect(image) for image in images]  
        print('label config: ', self.parsed_label_config)
        print('layouts: ', layouts)
        predictions = []
        for image, layout in zip(images, layouts):
            height, width = image.shape[:2]

            result = [
                {
                'from_name': self.from_name,
                'to_name': self.to_name,
                "original_height": height,
                "original_width": width,
                "source": "$image",
                'type': 'rectanglelabels',
                "value": convert_block_to_value(block, height, width),
                } for block in layout
            ]

            predictions.append({'result': result})

        return predictions

    def fit(self, tasks, workdir=None, 
            batch_size=32, num_epochs=10, **kwargs):
        print("now running the fit function....")
        image_urls, image_classes = [], []
        # print('Collecting completions...')
        # for completion in completions:
        #     if is_skipped(completion):
        #         continue
        #     image_urls.append(completion['data'][self.value])
        #     image_classes.append(get_choice(completion))
        print('tasks: ', tasks)

        print('image urls: ', image_urls)
        print('image classes: ', image_classes)

        # print('Creating dataset...')
        # dataset = ImageClassifierDataset(image_urls, image_classes)
        # dataloader = DataLoader(dataset, shuffle=True, batch_size=batch_size)

        # print('Train model...')
        # # self.reset_model()
        # self.model.train(dataloader, num_epochs=num_epochs)

        # print('Save model...')
        # model_path = os.path.join(workdir, 'model.pt')
        # self.model.save(model_path)

        return {'model_path': None, 'classes': None}

Right now there isn't much in the fit function; I just wanted to make sure it was working. However, nothing gets printed to the logs of the modeling_container.

To Reproduce Steps to reproduce the behavior:

1. Log in to http://localhost:8080
2. Create a new project (test)
3. Add data and configuration. In my case I'm using rectangular bounding boxes.
4. Add the ML backend in settings. You will need to use http://modeling_container:9090 since all containers are on the same docker-compose network.
5. Add data/annotations
6. The auto-predictions in this case do indeed work, triggering the predict function specified in model.py
7. Go to Settings->Machine Learning and click Start Training on the connected ML backend
8. curl -X POST http://localhost:8080/api/ml/{id}/train -H 'Authorization: Token <token>' also does nothing.
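
For anyone reproducing this from Python rather than curl, the same request can be sketched as below. This is only an illustration of the curl command above: the helper name build_train_request and the backend id 1 are made up for the example, and the request is built but not sent, so it can be inspected first.

```python
import urllib.request


def build_train_request(base_url, backend_id, token):
    """Build (but do not send) the POST request for api/ml/{id}/train.

    base_url, backend_id and token are placeholders; substitute the values
    for your own Label Studio instance.
    """
    url = f"{base_url.rstrip('/')}/api/ml/{backend_id}/train"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Token {token}"},
    )


# Hypothetical values mirroring the curl command: localhost:8080, backend id 1.
req = build_train_request("http://localhost:8080", 1, "<token>")
print(req.get_method(), req.full_url)

# To actually send it (requires a running Label Studio instance):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.read())
```

Printing the status code and body of the response makes it easier to tell "no response at all" apart from an empty but successful reply.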

Expected behavior Code in the fit function should trigger when the curl command is launched or "Start Training" button is clicked.

Screenshots Can provide if needed.

Environment (please complete the following information):

OS: Ubuntu 18.04 running docker 20.10.17, build 100c701 and docker-compose v 1.29.1, build c34c88b
Label Studio Version 1.5.0

Additional context It's entirely possible that I'm not configuring the project correctly, so please let me know.

KonstantinKorotaev commented 2 years ago

Hi @themantalope Could you please tell me your label-studio-ml-backend version?

themantalope commented 2 years ago

@KonstantinKorotaev

Thanks for getting back to me. I'm using the current version, installed via pip install git+https://github.com/heartexlabs/label-studio-ml-backend. The current version used is 1.0.7.

I should also clarify: when using the curl command in step 8, I do not get any response from the server either.

KonstantinKorotaev commented 2 years ago

Do you have any logs from Label Studio and from ML backend?

themantalope commented 2 years ago

@KonstantinKorotaev

Here are the logs after clicking the "Start Training" button 3 times.

Congratulations! ML Backend has been successfully initialized in ./modeling_backend

Now start it by using:

label-studio-ml start ./modeling_backend

 * Serving Flask app "label_studio_ml.api" (lazy loading)

 * Environment: production

   WARNING: This is a development server. Do not use it in a production deployment.

   Use a production WSGI server instead.

 * Debug mode: on

[2022-06-29 01:54:08,277] [WARNING] [werkzeug::_log::225]  * Running on all addresses.

   WARNING: This is a development server. Do not use it in a production deployment.

[2022-06-29 01:54:08,277] [INFO] [werkzeug::_log::225]  * Running on http://192.168.128.2:9090/ (Press CTRL+C to quit)

[2022-06-29 01:54:08,278] [INFO] [werkzeug::_log::225]  * Restarting with stat

[2022-06-29 01:54:09,476] [WARNING] [werkzeug::_log::225]  * Debugger is active!

[2022-06-29 01:54:09,477] [INFO] [werkzeug::_log::225]  * Debugger PIN: 133-986-258

Congratulations! ML Backend has been successfully initialized in ./modeling_backend

Now start it by using:

label-studio-ml start ./modeling_backend

 * Serving Flask app "label_studio_ml.api" (lazy loading)

 * Environment: production

   WARNING: This is a development server. Do not use it in a production deployment.

   Use a production WSGI server instead.

 * Debug mode: on

[2022-06-29 15:00:29,793] [WARNING] [werkzeug::_log::225]  * Running on all addresses.

   WARNING: This is a development server. Do not use it in a production deployment.

[2022-06-29 15:00:29,794] [INFO] [werkzeug::_log::225]  * Running on http://192.168.128.2:9090/ (Press CTRL+C to quit)

[2022-06-29 15:00:29,795] [INFO] [werkzeug::_log::225]  * Restarting with stat

[2022-06-29 15:00:31,030] [WARNING] [werkzeug::_log::225]  * Debugger is active!

[2022-06-29 15:00:31,032] [INFO] [werkzeug::_log::225]  * Debugger PIN: 206-914-877

[2022-06-29 15:17:58,455] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:17:58] "GET /health HTTP/1.1" 200 -

parsed label config:

{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}

[2022-06-29 15:17:59,751] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:17:59] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:09,622] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:09,652] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:09,721] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:09,731] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:09,805] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:09,822] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:09,907] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:09,916] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:09,980] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:09,987] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:21,867] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:21] "POST /webhook HTTP/1.1" 201 -

[2022-06-29 15:18:21,895] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:21] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:21,903] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:21] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:23,139] [ERROR] [label_studio_ml.model::get_result::58] 

Traceback (most recent call last):

  File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 56, in get_result

    job_result = self.get_result_from_job_id(model_version)

  File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 110, in get_result_from_job_id

    assert isinstance(result, dict)

AssertionError

parsed label config:

{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}

now running the fit function....

tasks:  ()

image urls:  []

image classes:  []

[2022-06-29 15:18:36,306] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:36] "POST /webhook HTTP/1.1" 201 -

[2022-06-29 15:18:36,335] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:36] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:18:36,342] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:36] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:18:37,494] [ERROR] [label_studio_ml.model::get_result::58] 

Traceback (most recent call last):

  File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 56, in get_result

    job_result = self.get_result_from_job_id(model_version)

  File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 110, in get_result_from_job_id

    assert isinstance(result, dict)

AssertionError

parsed label config:

{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}

now running the fit function....

tasks:  ()

image urls:  []

image classes:  []

Sorry about that! I saw the error but forgot to include it in the initial description of the issue.

themantalope commented 2 years ago

Also, looks like the fit function is actually getting triggered but it's not getting any tasks...

themantalope commented 2 years ago

This could be due to the way I've set up the label-studio and label-studio-ml containers. I'm running all of them from a single docker-compose.yml file (specified in the problem description). When I restart the stack using docker-compose down; docker-compose up --build, the logs for the modeling_container are now showing a different output after clicking the "Start Training" button:

Congratulations! ML Backend has been successfully initialized in ./modeling_backend

Now start it by using:

label-studio-ml start ./modeling_backend

 * Serving Flask app "label_studio_ml.api" (lazy loading)

 * Environment: production

   WARNING: This is a development server. Do not use it in a production deployment.

   Use a production WSGI server instead.

 * Debug mode: on

[2022-06-29 15:32:33,354] [WARNING] [werkzeug::_log::225]  * Running on all addresses.

   WARNING: This is a development server. Do not use it in a production deployment.

[2022-06-29 15:32:33,355] [INFO] [werkzeug::_log::225]  * Running on http://172.19.0.3:9090/ (Press CTRL+C to quit)

[2022-06-29 15:32:33,355] [INFO] [werkzeug::_log::225]  * Restarting with stat

[2022-06-29 15:32:34,563] [WARNING] [werkzeug::_log::225]  * Debugger is active!

[2022-06-29 15:32:34,564] [INFO] [werkzeug::_log::225]  * Debugger PIN: 941-357-599

[2022-06-29 15:32:50,161] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:50] "GET /health HTTP/1.1" 200 -

parsed label config:

{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}

config.yaml?dl=1: 0.00B [00:00, ?B/s]
config.yaml?dl=1: 8.19kB [00:01, 5.55kB/s]
config.yaml?dl=1: 8.19kB [00:01, 5.54kB/s]

[2022-06-29 15:32:53,543] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:53] "GET /health HTTP/1.1" 200 -

parsed label config:

{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}

[2022-06-29 15:32:59,047] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:59] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:32:59,595] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:59] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:33:55,655] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:33:55,664] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:33:55,681] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:33:55,689] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:33:55,697] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:33:55,702] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:33:55,735] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:33:55,747] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:33:58,813] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:58] "POST /webhook HTTP/1.1" 201 -

[2022-06-29 15:33:58,843] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:58] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:33:58,853] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:58] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:34:19,886] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:19] "POST /webhook HTTP/1.1" 201 -

[2022-06-29 15:34:19,911] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:19] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:34:19,920] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:19] "POST /setup HTTP/1.1" 200 -

[2022-06-29 15:34:22,335] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:22] "POST /webhook HTTP/1.1" 201 -

[2022-06-29 15:34:22,361] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:22] "GET /health HTTP/1.1" 200 -

[2022-06-29 15:34:22,372] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:22] "POST /setup HTTP/1.1" 200 -

EDIT:

The logs here do not show the fit function getting triggered. I wonder if this is because of label-studio-ml using its own docker-compose network?

KonstantinKorotaev commented 2 years ago

The logs here do not show the fit function getting triggered. I wonder if this is because of label-studio-ml using its own docker-compose network?

The log does show webhook calls; please check this guide.

Also, looks like the fit function is actually getting triggered but it's not getting any tasks...

Check the guide about training with webhooks. Here is an example of how you can get the annotated dataset.
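
As a rough illustration of that approach (not the code from the linked guide): when training is triggered by a webhook, fit may be called with an empty tasks argument, as in the logs above, so the backend has to fetch the annotations itself. The sketch below assumes the webhook payload is passed to fit as kwargs['data'] with a project id inside it, and that annotations can be downloaded from Label Studio's project export endpoint. The helper names and the labeling_container hostname (taken from the docker-compose file above) are illustrative assumptions; check them against your label-studio-ml version.

```python
def project_id_from_webhook(kwargs):
    """Return the project id from a webhook-triggered fit call, or None.

    Assumes the payload shape kwargs['data']['project']['id'].
    """
    data = kwargs.get("data") or {}
    project = data.get("project") or {}
    return project.get("id")


def export_url(hostname, project_id):
    """Build the URL of the project export endpoint (JSON export)."""
    return f"{hostname.rstrip('/')}/api/projects/{project_id}/export?exportType=JSON"


def resolve_training_source(tasks, hostname="http://labeling_container:8080", **kwargs):
    """Decide where fit should take its training data from.

    Returns ("tasks", tasks) if tasks were passed directly,
    ("export", url) if only a webhook payload is available,
    or ("none", None) if neither is present.
    """
    if tasks:
        return ("tasks", list(tasks))
    project_id = project_id_from_webhook(kwargs)
    if project_id is None:
        return ("none", None)
    return ("export", export_url(hostname, project_id))


# Example: what the fit call in the logs above (tasks == ()) would resolve to
# if the webhook payload carried project id 1 (a made-up id).
print(resolve_training_source((), data={"project": {"id": 1}}))
```

Inside fit, the "export" case would then be followed by an authenticated GET to that URL (with an Authorization: Token header, as in the curl command) to download the annotated tasks.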

themantalope commented 2 years ago

@KonstantinKorotaev

Thank you for the clarification. A webhook is also called when the user submits a POST to the api/ml/{id}/train endpoint. Is there a way to modify the webhook that is triggered when the user clicks the "Start Training" button, or when they submit a POST request to the api/ml/{id}/train endpoint? The options on the Label Studio webhook editing page are limited to events related to the creation, update, or deletion of tasks, annotations, etc.

KonstantinKorotaev commented 2 years ago

Is there a way to modify the webhook that is triggered when the user clicks the "Start Training" button, or when they submit a POST request to the api/ml/{id}/train endpoint?

What do you want to add there?

HumanSignal / label-studio

api/ml/{id}/train does not trigger training #2578