Incorrect Label Assignment for Multiple Objects in Label Studio ML Backend

Buckler89 commented 1 week ago

Description:

I ran into this problem while using Label Studio ML Backend to label a video with multiple objects to track. It occurs consistently with every video that contains multiple objects, regardless of complexity or duration: all objects in a task receive the same label in the user interface, whatever distinct labels the model assigned.

Key Issue: When annotations are updated from the model's response, the Label Studio interface should display a distinct label for each object. Instead, it displays the same label for all objects, even when the labels in the response differ. This happens both when creating new annotations and when updating existing ones.

Request:

Ensure that when multiple objects are present in the predictions, all of them are correctly displayed in the UI with their respective labels.

Steps to Reproduce:

Start by using a label config template for video object tracking: https://labelstud.io/templates/video_object_detector.

Configure Label Studio ML Backend and return the following dummy prediction, a sample output that simulates what the ML backend would return for multiple tracked objects. Each object carries its own label, so the Label Studio UI is expected to show a distinct label for each:

def predict(...):
   import json
   prediction = json.dumps({
       "model_version": None,
       "score": 0.0,
       "result": [
           {
               "value": {
                   "framesCount": 43,
                   "duration": 4.3,
                   "sequence": [
                       {
                           "frame": 1,
                           "x": 51.42,
                           "y": 50.35,
                           "width": 3.41,
                           "height": 1.39,
                           "enabled": True,
                           "rotation": 0,
                           "time": 0.0
                       },
                       {
                           "frame": 2,
                           "x": 50.43,
                           "y": 50.35,
                           "width": 3.41,
                           "height": 1.39,
                           "enabled": True,
                           "rotation": 0,
                           "time": 0.1
                       }
                   ],
                   "labels": [
                       "Woman"
                   ]
               },
               "from_name": "box",
               "to_name": "video",
               "type": "videorectangle",
               "origin": "manual",
               "id": "mGjkBhvZpv"
           },
           {
               "value": {
                   "framesCount": 43,
                   "duration": 4.3,
                   "sequence": [
                       {
                           "frame": 1,
                           "x": 0.0,
                           "y": 50.35,
                           "width": 78.98,
                           "height": 49.31,
                           "enabled": True,
                           "rotation": 0,
                           "time": 0.0
                       },
                       {
                           "frame": 2,
                           "x": 0.0,
                           "y": 50.87,
                           "width": 78.55,
                           "height": 48.78,
                           "enabled": True,
                           "rotation": 0,
                           "time": 0.1
                       }
                   ],
                   "labels": [
                       "Man"
                   ]
               },
               "from_name": "box",
               "to_name": "video",
               "type": "videorectangle",
               "origin": "manual",
               "id": "dssMyWHsv_"
           }
       ]
   })
   return ModelResponse(predictions=[json.loads(prediction)])

Observe that all objects in the UI are incorrectly assigned the label of the first object from the predictions, instead of displaying their respective unique labels as provided by the model.
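
To rule out the backend as the source of the identical labels, the response can be checked before it is returned. The following is only a minimal sketch: `labels_by_region` and `regions_share_one_label` are hypothetical helper names, not part of Label Studio or the ML backend, and the sample is a trimmed copy of the dummy prediction above with the sequences omitted.

```python
def labels_by_region(prediction):
    """Return {region id: labels} for every region in one prediction dict."""
    return {
        region["id"]: list(region["value"].get("labels", []))
        for region in prediction.get("result", [])
    }


def regions_share_one_label(prediction):
    """True when two or more regions all carry the same label list."""
    label_sets = [tuple(labels) for labels in labels_by_region(prediction).values()]
    return len(label_sets) > 1 and len(set(label_sets)) == 1


# Trimmed version of the dummy prediction from the report (sequences omitted).
sample = {
    "result": [
        {"id": "mGjkBhvZpv", "value": {"labels": ["Woman"]}},
        {"id": "dssMyWHsv_", "value": {"labels": ["Man"]}},
    ]
}

print(labels_by_region(sample))
print(regions_share_one_label(sample))
```

If `regions_share_one_label` is false for the payload the backend sends while the UI still shows one label everywhere, the labels are being collapsed after the response leaves the backend.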

makseq commented 1 week ago

Sounds a bit weird. You can check this YOLO video detection example as a reference: https://github.com/HumanSignal/label-studio-ml-backend/blob/master/label_studio_ml/examples/yolo/control_models/video_rectangle.py#L88-L126

Last time I tested it, this worked as expected.

Buckler89 commented 1 week ago

Hi Max,

I tested YOLO based on your suggestion. I can confirm the following:

It works only if no boxes are present before triggering the model's prediction. If the model is called on task loading (when you select the task, it opens, and the prediction starts automatically without any intervention), it works as expected.
However, if you manually label an object and then click on the class label to trigger the prediction, when it returns, all objects are displayed with the same label, even though the ML backend is functioning correctly.

This seems to work for models like YOLO that ignore user input, but it does not work at all for models like SAM2, which require one or more prompts as initial input.

Additionally, even for YOLO, the problem recurs if the user adds boxes that the model cannot detect on its own.
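
For context on the prompt-driven case, here is a rough sketch of how a backend might pull user-drawn boxes out of an interactive request before running a model like SAM2. It assumes the request context is a dict with a "result" list of regions in the same format as the prediction above; that shape is an assumption on my part, and `prompts_from_context` is a hypothetical helper, not an ML backend API.

```python
def prompts_from_context(context):
    """Collect user-drawn video boxes from an interactive-request context.

    Assumes `context` is a dict with a "result" list of regions shaped like
    the prediction regions; returns [] when there is nothing usable.
    """
    prompts = []
    for region in (context or {}).get("result", []):
        if region.get("type") != "videorectangle":
            continue  # skip anything that is not a video box
        value = region.get("value", {})
        sequence = value.get("sequence", [])
        if not sequence:
            continue  # a box without keyframes cannot serve as a prompt
        prompts.append({
            "id": region.get("id"),                  # keep the id so labels stay tied to the object
            "labels": list(value.get("labels", [])),
            "first_keyframe": sequence[0],
        })
    return prompts
```

Keeping each prompt's `id` and `labels` together is the point of the sketch: every object the user drew should come back from the model with its own label, which is exactly what the UI fails to show here.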

To reproduce with YOLO:

Disable auto-annotation
Delete all video annotations if present
Enable auto-annotation
Draw a box and assign a label
Wait for the model to finish prediction
Verify that all labels are identical
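
The last step can also be checked programmatically by comparing the "result" list the backend returned with the "result" list of the annotation as it appears in the UI (for example, from an export). This is a sketch under the assumption that region ids are preserved between the two; `label_mismatches` is a hypothetical helper name.

```python
def label_mismatches(sent_result, shown_result):
    """List regions whose labels differ between the backend response and the UI.

    Both arguments are "result" lists of regions; regions are matched by id.
    Returns tuples of (region id, labels sent, labels shown).
    """
    shown = {r["id"]: r["value"].get("labels", []) for r in shown_result}
    mismatches = []
    for region in sent_result:
        region_id = region["id"]
        sent_labels = region["value"].get("labels", [])
        if region_id in shown and shown[region_id] != sent_labels:
            mismatches.append((region_id, sent_labels, shown[region_id]))
    return mismatches
```

An empty list means the UI matches the backend; with the behavior described above, every region except the one whose label was propagated would be reported.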

HumanSignal / label-studio-ml-backend

Incorrect Label Assignment for Multiple Objects in Label Studio ML Backend #664