HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

LTR and RTL annotation for text #2653

Open Deisss opened 2 years ago

Deisss commented 2 years ago

Hi,

The annotation tool works great, except that the export never states whether the start/end offsets are counted from the left or from the right of the text.

Example:

[
  {
    "text": "أجرت وزارة الصحة الإماراتية 123,037 فحصا ضمن خططها لتوسيع نطاق الفحوصات وتكشف عن 1,554 إصابة جديدة بفيروس ",
    "id": 3,
    "label": [
      {
        "start": 81,
        "end": 86,
        "text": "1,554",
        "labels": [
          "MISC"
        ]
      },
      {
        "start": 28,
        "end": 35,
        "text": "123,037",
        "labels": [
          "MISC"
        ]
      }
    ],
    "annotator": 1,
    "annotation_id": 4,
    "created_at": "2022-07-12T15:42:19.445022Z",
    "updated_at": "2022-07-12T16:05:29.738373Z",
    "lead_time": 104.119
  },
  {
    "text": "Donald Trump enters the white house",
    "id": 1,
    "label": [
      {
        "start": 0,
        "end": 12,
        "text": "Donald Trump",
        "labels": [
          "PER"
        ]
      },
      {
        "start": 24,
        "end": 35,
        "text": "white house",
        "labels": [
          "LOC"
        ]
      }
    ],
    "annotator": 1,
    "annotation_id": 3,
    "created_at": "2022-07-12T15:32:52.517225Z",
    "updated_at": "2022-07-12T15:32:52.517383Z",
    "lead_time": 5.122
  }
]

The first assumes RTL, while the second is LTR (and each counts accordingly), and the export never states in which direction you're supposed to read the start/end positions of entities.
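One way to check how the offsets are counted is to slice the raw string and compare the result with each span's `text`. The sketch below assumes `start`/`end` are 0-based, end-exclusive character indices into the stored string; the export does not document this, so treat it as an assumption. `EXPORT` and `check_offsets` are hypothetical names, and the tasks are trimmed to the fields that matter here.

```python
# The two tasks from the export above, reduced to text and spans.
EXPORT = [
    {
        "text": "أجرت وزارة الصحة الإماراتية 123,037 فحصا ضمن خططها لتوسيع نطاق الفحوصات وتكشف عن 1,554 إصابة جديدة بفيروس ",
        "label": [
            {"start": 81, "end": 86, "text": "1,554"},
            {"start": 28, "end": 35, "text": "123,037"},
        ],
    },
    {
        "text": "Donald Trump enters the white house",
        "label": [
            {"start": 0, "end": 12, "text": "Donald Trump"},
            {"start": 24, "end": 35, "text": "white house"},
        ],
    },
]


def check_offsets(tasks):
    """Return (task_index, start, end, matches) for every span.

    `matches` is True when slicing the stored string with the exported
    offsets reproduces the exported span text.
    """
    results = []
    for i, task in enumerate(tasks):
        for span in task["label"]:
            sliced = task["text"][span["start"]:span["end"]]
            results.append((i, span["start"], span["end"], sliced == span["text"]))
    return results


for row in check_offsets(EXPORT):
    print(row)
```

If every row reports a match, the offsets index the string in storage (logical) order, from its first character, for both the Arabic and the English task. The right-to-left layout would then only affect how the text is rendered, not how positions are counted.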

makseq commented 2 years ago

Similar problems: https://github.com/heartexlabs/label-studio/issues/1888