HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Duplicated prediction results when Auto-Annotation is enabled in the NER template #2641

Open aisensiy opened 2 years ago

aisensiy commented 2 years ago

Describe the bug

I am trying to use Label Studio's Auto-Annotation feature with an NER template, but I found that the same prediction from my ML backend gets triggered again and again.

I am not sure whether it is Label Studio's responsibility or my ML backend's responsibility to avoid this behavior.


Expected behavior

I think Label Studio should also know that the current task has already been annotated by the ML backend, so that the same annotation is not added twice.

Screenshots

Here is a screen recording showing the behavior (no audio).

https://user-images.githubusercontent.com/661860/178113109-f4dbae67-e065-4216-9701-d2a0c201188b.mov

In the video you can see that no matter how many times I annotate the text, the same prediction result from the ML backend is added to it again.


KonstantinKorotaev commented 2 years ago

Hi @aisensiy Could you please check what your ML service returned in the last calls?

aisensiy commented 2 years ago

Could you please check what your ML service returned in the last calls?

Do you mean the predict result? The result looks like this:

[{'result': [{'from_name': 'label',
    'to_name': 'text',
    'type': 'labels',
    'value': {'start': 0, 'end': 3, 'text': '中金在', 'labels': ['MISC']},
    'score': 0.7986028842714906}]}]
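
For completeness: the label-studio-ml-backend examples also allow an optional top-level score and model_version on each prediction, while my backend only sets the per-region score. A sketch of that fuller shape (the values here are made up):

prediction = {
    'result': [{
        'from_name': 'label',
        'to_name': 'text',
        'type': 'labels',
        'value': {'start': 0, 'end': 3, 'text': '中金在', 'labels': ['MISC']},
        'score': 0.7986,
    }],
    'score': 0.7986,        # optional prediction-level score
    'model_version': 'v1',  # optional model version tag
}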

And here is the model.py in my label-studio-ml-backend:

import random

from label_studio_ml.model import LabelStudioMLBase

class DummyModel(LabelStudioMLBase):

    def __init__(self, **kwargs):
        super(DummyModel, self).__init__(**kwargs)

        # pre-initialize your variables here
        from_name, schema = list(self.parsed_label_config.items())[0]
        self.from_name = from_name
        self.to_name = schema['to_name'][0]
        self.labels = schema['labels']

    def predict(self, tasks, **kwargs):
        print(self.labels)
        print(tasks)
        results = []
        for task in tasks:
            text = task['data']['text']

            # Dummy prediction: always label the first three characters
            # of the text as MISC, with a random confidence score.
            results.append({
                'result': [{
                    'from_name': self.from_name,
                    'to_name': self.to_name,
                    'type': 'labels',
                    'value': {
                        'start': 0, 'end': 3,
                        'text': text[:3],
                        'labels': ['MISC']
                    },
                    'score': random.uniform(0, 1)
                }]
            })
        print(results)
        return results

    def fit(self, completions, workdir=None, **kwargs):
        # No real training here; just return a dummy train output.
        return {'random': random.randint(1, 10)}

It is based on the dummy_model example. I just use the first three characters of the text to generate a dummy result and return it.
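
One mitigation I have been thinking about (untested, and it assumes the frontend treats result regions with the same id as the same region, which I have not verified): give each predicted region a deterministic id derived from the task and span, so repeated prediction calls return identical regions instead of fresh ones. A sketch:

import hashlib

def stable_region_id(task_id, start, end, label):
    # Hypothetical helper: derive a deterministic region id from the task
    # and span, so the same prediction carries the same id on every call.
    key = f'{task_id}:{start}:{end}:{label}'.encode()
    return hashlib.md5(key).hexdigest()[:10]

Each result item would then carry 'id': stable_region_id(task['id'], 0, 3, 'MISC') alongside from_name/to_name. Whether Label Studio actually deduplicates regions by id is exactly the part I am unsure about.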

KonstantinKorotaev commented 2 years ago

Do you mean the predict result? The result looks like this:

Yes. Do you have a sequence of calls that leads to the duplicated results?

aisensiy commented 2 years ago

Yes, I do have a sequence of actions: I manually add some more annotations that are not covered by the ML backend. But I think it is not necessary to return the prediction multiple times. Every time I take some action (even when I delete an annotation), the same prediction result is added again.

image
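
To double-check whether the duplicates are actually stored server-side (rather than only rendered twice), the predictions API should show them piling up. A sketch, assuming a local instance; the host, token, and task id are placeholders:

import requests

LS_URL = 'http://localhost:8080'  # placeholder: your Label Studio host
API_TOKEN = 'your-api-token'      # placeholder: token from Account & Settings
TASK_ID = 1                       # placeholder: the affected task id

resp = requests.get(
    f'{LS_URL}/api/predictions',
    params={'task': TASK_ID},
    headers={'Authorization': f'Token {API_TOKEN}'},
)
resp.raise_for_status()
print(f'{len(resp.json())} stored predictions for task {TASK_ID}')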

aisensiy commented 2 years ago

Any progress on this issue?

KonstantinKorotaev commented 2 years ago

Hi @aisensiy I have created a new feature request for it.