HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0
18.1k stars 2.26k forks

Coordinates of bounding boxes are multiplied by 100 #2702

Closed sebastianohl closed 1 year ago

sebastianohl commented 2 years ago

Describe the bug After labeling images with bounding boxes, the coordinates of some tasks are multiplied by 100. The initial bbox proposals are created by our own ML backend service. These proposals are approved by a human and possibly adjusted. However, about 5% of the tasks have bbox coordinates (including dimensions) multiplied by 100.

To Reproduce Steps to reproduce the behavior are unknown. I am not aware of doing anything different on the bad labeling tasks compared to the good ones. The only thing I can think of is that I skipped loading the actual image (my home office connection is not that fast) by pressing Submit before the image was displayed, if the proposed bboxes from the backend looked OK.

Expected behavior All tasks should store their bboxes in the same format so the export produces valid data.

Screenshots A good label (copied from labelmaker's internal data):

{
  "original_width": 1952,
  "original_height": 1232,
  "image_rotation": 0,
  "value": {
    "x": 25.614754098360653,
    "y": 61.282467532467535,
    "width": 5.174180327868853,
    "height": 8.928571428571429,
    "rotation": 0,
    "rectanglelabels": ["marker_08"]
  },
  "id": "77LgMBPpEz",
  "from_name": "label",
  "to_name": "image",
  "type": "rectanglelabels",
  "origin": "prediction",
  "score": 1
}

A bad label:

{
  "original_width": 1,
  "original_height": 1,
  "image_rotation": 0,
  "value": {
    "x": 2556.3524590163934,
    "y": 6136.363636363637,
    "width": 517.4180327868853,
    "height": 884.7402597402597,
    "rotation": 0,
    "rectanglelabels": ["marker_08"]
  },
  "id": "HIxRIzDccX",
  "from_name": "label",
  "to_name": "image",
  "type": "rectanglelabels",
  "origin": "prediction",
  "score": 1
}
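For reference: Label Studio stores rectangle coordinates as percentages (0-100) of the image size, so dividing the bad label's values by 100 brings them back into the valid range. A minimal sketch using the numbers from the bad label above (the exact values differ slightly from the good label because the annotation was adjusted by hand):

```python
# The four bbox fields from the "bad label": percentages multiplied by 100,
# paired with original_width/original_height collapsed to 1.
bad = {"x": 2556.3524590163934, "y": 6136.363636363637,
       "width": 517.4180327868853, "height": 884.7402597402597}

# A plain divide-by-100 restores values in the 0-100 percent range.
fixed = {k: v / 100 for k, v in bad.items()}

for v in fixed.values():
    assert 0 <= v <= 100, "still outside the percent range"
print(fixed["x"])  # ~25.56, in line with the good label's x of ~25.61
```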

Environment (please complete the following information):

hlomzik commented 2 years ago

Hi, Sebastian! Yes, skipping the image load is the problem: currently, all region initialization requires a fully loaded image. But the multiplied dimensions are a weird bug anyway; we'll check it and fix it.

makseq commented 1 year ago

This issue was fixed in the latest releases (>1.6). This script can fix problems with old annotations:

cd label-studio/label_studio 
source env/bin/activate
python fix_outside_bbox.py

fix_outside_bbox.py:

import os

import django

# Bootstrap Django so the Label Studio ORM models can be imported.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "htx.settings.label_studio")
django.setup()

from projects.models import Project
from tasks.models import Annotation, Task

def check_range(v, mini, maxi):
  # Count how many of the four bbox fields fall into (mini, maxi].
  score = mini < v['x'] <= maxi
  score += mini < v['y'] <= maxi
  score += mini < v['width'] <= maxi
  score += mini < v['height'] <= maxi

  # Two or more out-of-range fields is strong evidence of the bug.
  return score >= 2

def detect_range(v):
  # Find the factor the coordinates were multiplied by: 100, 10000, ...
  mini, maxi = 100, 10000
  while maxi <= 1e+100:
    if check_range(v, mini, maxi):
      return mini
    mini *= 100
    maxi *= 100
  return None  # coordinates look normal

def fix_project(project_id):
  # Re-check every task in the project.
  p = Project.objects.get(id=project_id)
  for t in p.tasks.all():
    fix_task(t.id)

def fix_task(task_id):
  t = Task.objects.get(id=task_id)
  for a in t.annotations.all():
    if fix_annotation(a):
      print('\n==> Task', task_id, 'Annotation', a.id, 'fixed!')

def fix_annotation(a):
  result = a.result
  fixed = False

  for r in result:
    if 'value' not in r:
      continue
    v = r['value']
    if 'x' not in v:  # only rectangle-like regions have x/y/width/height
      continue

    c = detect_range(v)
    if c is None:
      continue  # coordinates are already in the 0-100 range

    # Scale all four fields back down by the detected factor.
    v['x'] /= c
    v['y'] /= c
    v['width'] /= c
    v['height'] /= c
    fixed = True

  if fixed:
    # .update() writes the JSON directly, bypassing model save() signals.
    Annotation.objects.filter(id=a.id).update(result=result)

  return fixed

if __name__ == '__main__':
  fix_project(1555)  # <= project id
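As a sanity check, the range-detection heuristic can be exercised without a Django setup. This is a standalone re-implementation of check_range/detect_range from the script above, fed with the bad label's values from this issue:

```python
def check_range(v, mini, maxi):
    # Count how many of the four bbox fields fall into (mini, maxi].
    score = mini < v['x'] <= maxi
    score += mini < v['y'] <= maxi
    score += mini < v['width'] <= maxi
    score += mini < v['height'] <= maxi
    return score >= 2

def detect_range(v):
    # Probe growing windows (100, 1e4], (1e4, 1e6], ... for the factor.
    mini, maxi = 100, 10000
    while maxi <= 1e+100:
        if check_range(v, mini, maxi):
            return mini
        mini *= 100
        maxi *= 100
    return None

# The "bad label" from this issue: every field is 100x too large.
bad = {'x': 2556.3524590163934, 'y': 6136.363636363637,
       'width': 517.4180327868853, 'height': 884.7402597402597}
print(detect_range(bad))  # 100 -> divide each field by 100

# A healthy annotation is left untouched.
good = {'x': 25.61, 'y': 61.28, 'width': 5.17, 'height': 8.93}
print(detect_range(good))  # None
```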