HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0
19.56k stars, 2.42k forks

Per-instance subset of label options for varying label requirements #3846

Open e-tornike opened 1 year ago

e-tornike commented 1 year ago

Is your feature request related to a problem? Please describe.

Extreme multi-label classification tasks require annotators to choose from a large set of possible labels. This set of labels may vary from instance to instance in the data. Currently, per-instance label assignments are not supported.

Describe the solution you'd like

In the project configuration, you could upload a large set of labels. Each label l could be associated with a list of instance IDs D for which the label could be used during annotation. For L labels, the best case is D associations, while the worst case is D × L associations.

Describe alternatives you've considered

Associate each data instance d with a list of label IDs L. For D instances, the best case is L associations, while the worst case is D × L associations.
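To make the two options concrete, here is a minimal Python sketch of how a label-to-instances mapping could be inverted into an instance-to-labels mapping. The function name `invert_associations` and the sample data are hypothetical and not part of Label Studio; the point is only that both representations carry the same associations.

```python
from collections import defaultdict


def invert_associations(label_to_instances):
    """Turn {label: [instance_id, ...]} into {instance_id: [label, ...]}."""
    instance_to_labels = defaultdict(list)
    for label, instance_ids in label_to_instances.items():
        for instance_id in instance_ids:
            instance_to_labels[instance_id].append(label)
    return dict(instance_to_labels)


# Hypothetical example data, for illustration only.
label_to_instances = {
    "label_1": ["instance_1", "instance_2"],
    "label_2": ["instance_1"],
}
instance_to_labels = invert_associations(label_to_instances)
```

Either direction stores one entry per allowed (instance, label) pair, so the choice mostly affects which lookup is cheap at annotation time.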

makseq commented 1 year ago

@e-tornike Could you please provide examples of what this could look like? Maybe even screenshots.

e-tornike commented 1 year ago

It would be nice if a single data instance could be mapped to multiple sets of labels. For example:

{
  "instance_1": [
    "label_set_1",
    "label_set_2"
  ],
  "instance_2": [
    "label_set_1"
  ]
}

Each set of labels could be defined somewhere else. For example:

{
  "label_set_1": [
    "label_1",
    "label_2",
    ...
    "label_50"
  ],
  "label_set_2": [
    "label_1",
    "label_2",
    ...
    "label_100"
  ]
}

Then, when loading the labels to be used for annotating an instance, the labels could be aggregated.
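The aggregation step could be sketched roughly as follows in Python. This is only an illustration of the idea, not an existing Label Studio feature: the function `labels_for_instance` is hypothetical, and the label sets are shortened to a few made-up entries rather than the full lists elided above.

```python
def labels_for_instance(instance_id, instance_to_sets, label_sets):
    """Merge all label sets assigned to an instance, dropping duplicates
    while keeping the order in which labels first appear."""
    seen = set()
    merged = []
    for set_name in instance_to_sets.get(instance_id, []):
        for label in label_sets[set_name]:
            if label not in seen:
                seen.add(label)
                merged.append(label)
    return merged


# Hypothetical, shortened data for illustration only.
instance_to_sets = {
    "instance_1": ["label_set_1", "label_set_2"],
    "instance_2": ["label_set_1"],
}
label_sets = {
    "label_set_1": ["label_1", "label_2"],
    "label_set_2": ["label_2", "label_3"],
}

print(labels_for_instance("instance_1", instance_to_sets, label_sets))
print(labels_for_instance("instance_2", instance_to_sets, label_sets))
```

Labels shared between sets are shown once, and an instance with no assigned sets gets an empty list in this sketch.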

makseq commented 1 year ago

What instances do you mean? Bounding boxes, spans in text, or spans in time series? Have you tried using the Taxonomy tag with perRegion="true"?