HumanSignal / label-studio-ml-backend

Configs and boilerplates for Label Studio's Machine Learning backend
Apache License 2.0

Incorrect Label Assignment for Multiple Objects in Label Studio ML Backend #664

Open Buckler89 opened 1 week ago

Buckler89 commented 1 week ago

Description:

I ran into this problem while using Label Studio ML Backend to label a video with multiple objects to track. It occurs consistently with every video that contains multiple objects, regardless of complexity or duration: all objects in a task receive the same label in the user interface, whatever distinct labels the model assigned.

Key Issue: When annotations are updated from the model's response, the Label Studio interface should display a distinct label for each object. Instead, it displays the same label for all objects, even when the labels in the response differ. This happens both when creating new annotations and when updating existing ones.

Steps to Reproduce:

  1. Start by using a label config template for video object tracking: https://labelstud.io/templates/video_object_detector.

  2. Configure Label Studio ML Backend to return the following dummy prediction, a sample output that simulates what the backend would return for multiple tracked objects. Each object carries its own label, so the Label Studio UI is expected to show a distinct label per object:

    def predict(...):
       import json
       prediction = json.dumps({
           "model_version": None,
           "score": 0.0,
           "result": [
               {
                   "value": {
                       "framesCount": 43,
                       "duration": 4.3,
                       "sequence": [
                           {
                               "frame": 1,
                               "x": 51.42,
                               "y": 50.35,
                               "width": 3.41,
                               "height": 1.39,
                               "enabled": True,
                               "rotation": 0,
                               "time": 0.0
                           },
                           {
                               "frame": 2,
                               "x": 50.43,
                               "y": 50.35,
                               "width": 3.41,
                               "height": 1.39,
                               "enabled": True,
                               "rotation": 0,
                               "time": 0.1
                           }
                       ],
                       "labels": [
                           "Woman"
                       ]
                   },
                   "from_name": "box",
                   "to_name": "video",
                   "type": "videorectangle",
                   "origin": "manual",
                   "id": "mGjkBhvZpv"
               },
               {
                   "value": {
                       "framesCount": 43,
                       "duration": 4.3,
                       "sequence": [
                           {
                               "frame": 1,
                               "x": 0.0,
                               "y": 50.35,
                               "width": 78.98,
                               "height": 49.31,
                               "enabled": True,
                               "rotation": 0,
                               "time": 0.0
                           },
                           {
                               "frame": 2,
                               "x": 0.0,
                               "y": 50.87,
                               "width": 78.55,
                               "height": 48.78,
                               "enabled": True,
                               "rotation": 0,
                               "time": 0.1
                           }
                       ],
                       "labels": [
                           "Man"
                       ]
                   },
                   "from_name": "box",
                   "to_name": "video",
                   "type": "videorectangle",
                   "origin": "manual",
                   "id": "dssMyWHsv_"
               }
           ]
       })
       return ModelResponse(predictions=[json.loads(prediction)])

  3. Observe that every object in the UI is assigned the label of the first object in the predictions, instead of its own label as provided by the model.
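
To rule out the backend as the source of the duplicated label, it can help to summarize which labels each region carries before the response is returned. The snippet below is a minimal sketch: `labels_by_region` is a hypothetical helper (not part of `label_studio_ml`), and `dummy_prediction` is a trimmed copy of the prediction from step 2 with the `sequence` entries omitted for brevity.

```python
def labels_by_region(prediction):
    """Map each region id in a prediction dict to the labels it carries."""
    return {
        region["id"]: list(region["value"].get("labels", []))
        for region in prediction.get("result", [])
    }


# Trimmed copy of the dummy prediction from step 2 (sequences omitted).
dummy_prediction = {
    "model_version": None,
    "score": 0.0,
    "result": [
        {
            "id": "mGjkBhvZpv",
            "from_name": "box",
            "to_name": "video",
            "type": "videorectangle",
            "value": {"labels": ["Woman"]},
        },
        {
            "id": "dssMyWHsv_",
            "from_name": "box",
            "to_name": "video",
            "type": "videorectangle",
            "value": {"labels": ["Man"]},
        },
    ],
}

summary = labels_by_region(dummy_prediction)
print(summary)  # {'mGjkBhvZpv': ['Woman'], 'dssMyWHsv_': ['Man']}
```

If the summary shows distinct labels per region id, as it does here, the payload leaving the backend is as intended and the labels are being collapsed somewhere after that point.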

makseq commented 1 week ago

Sounds a bit weird. You can check the YOLO video detection example as a reference: https://github.com/HumanSignal/label-studio-ml-backend/blob/master/label_studio_ml/examples/yolo/control_models/video_rectangle.py#L88-L126

Last time I tested it, this worked as expected.

Buckler89 commented 1 week ago

Hi Max,

I tested YOLO based on your suggestion. I can confirm the following:

This seems to work for models like YOLO that ignore user input, but it does not work at all for models like SAM2, which require one or more prompts as initial input.

Additionally, even with YOLO, the problem recurs when the user adds boxes that the model cannot detect on its own.

To reproduce with YOLO:

  1. Disable auto-annotation
  2. Delete all video annotations if present
  3. Enable auto-annotation
  4. Draw a box and assign a label
  5. Wait for the model to finish prediction
  6. Verify that all labels are identical
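
The cause is not established in this thread. One thing worth ruling out in a prompt-driven backend is whether each returned region keeps the id and labels of the user-drawn region it came from. The sketch below illustrates that idea only. It assumes the user-drawn regions reach `predict` as `context["result"]`, and both `build_regions` and the `tracked_sequences` mapping are hypothetical names, not part of the YOLO or SAM2 examples.

```python
def build_regions(prompts, tracked_sequences):
    """Build one region per user prompt, keeping each prompt's own id and labels.

    prompts: user-drawn regions (assumed to arrive as context["result"]).
    tracked_sequences: hypothetical dict mapping region id -> list of frame
        boxes produced by the tracker for that region.
    """
    regions = []
    for prompt in prompts:
        # Shallow-copy the value so the incoming prompt is not mutated;
        # this carries over the prompt's own "labels" unchanged.
        value = dict(prompt["value"])
        # Use the tracker output if there is one, else keep the drawn boxes.
        value["sequence"] = tracked_sequences.get(
            prompt["id"], value.get("sequence", [])
        )
        regions.append({
            "id": prompt["id"],
            "from_name": prompt["from_name"],
            "to_name": prompt["to_name"],
            "type": prompt["type"],
            "value": value,
        })
    return regions


def make_box(frame, x, y):
    """Small helper to keep the example data short."""
    return {"frame": frame, "x": x, "y": y, "width": 3.41, "height": 1.39,
            "enabled": True, "rotation": 0, "time": (frame - 1) / 10}


prompts = [
    {"id": "mGjkBhvZpv", "from_name": "box", "to_name": "video",
     "type": "videorectangle",
     "value": {"labels": ["Woman"], "sequence": [make_box(1, 51.42, 50.35)]}},
    {"id": "dssMyWHsv_", "from_name": "box", "to_name": "video",
     "type": "videorectangle",
     "value": {"labels": ["Man"], "sequence": [make_box(1, 0.0, 50.35)]}},
]

# Pretend the tracker only produced output for the first region.
tracked = {"mGjkBhvZpv": [make_box(1, 51.42, 50.35), make_box(2, 50.43, 50.35)]}

regions = build_regions(prompts, tracked)
for region in regions:
    print(region["id"], region["value"]["labels"], len(region["value"]["sequence"]))
```

If a backend built this way still shows identical labels in the UI, that would point away from the backend's region assembly and toward how the response is applied to existing regions.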