argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0
3.91k stars 367 forks

[BUG-UI/UX] Error when annotating other datasets #4430

Closed sdiazlor closed 9 months ago

sdiazlor commented 10 months ago

Describe the bug The UI does not allow annotating. Adding an annotation raises an error, either when clicking on the labels (or predictions) or immediately if the annotations were already added as 'responses'. The examples use TextClassification, but the same happens with TokenClassification when I click on validate or discard, and with Text2Text.

To Reproduce Steps to reproduce the behavior:

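import argilla as rg

from datasets import load_dataset

# take 100 shuffled IMDB training examples
dataset = load_dataset("imdb", split="train").shuffle(seed=42).select(range(100))

# Argilla reads the "annotation" column as the annotated label
dataset = dataset.rename_column("label", "annotation")

rg_dataset = rg.read_datasets(dataset, task="TextClassification")

rg.log(rg_dataset, "prueba")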

or the basic one:

import argilla as rg

settings = rg.TextClassificationSettings(label_schema=["A", "B"])

rg.configure_dataset_settings(name="my_dataset", settings=settings)

rec = rg.TextClassificationRecord(
    text="beautiful accommodations stayed hotel santa... hotels higher ranked website.",
    prediction=[("A", 0.75), ("B", 0.25)],
    annotation="A"
)

rg.log(records=rec, name="my_dataset")

Then I go to the UI and click on the dataset, and an error is raised (either when clicking on the label, or immediately if the record already has annotations).

Expected behavior Allow annotations

Environment (please complete the following information):


dosubot[bot] commented 10 months ago

🤖

Hello @sdiazlor! 👋

I'm Dosu, an AI bot here to help you while we wait for a human maintainer to jump in. I can assist you with bugs, answer questions, and guide you in becoming a contributor. I'm here to ensure you have a smooth experience with Argilla. Let's get started! 😊

Thank you for providing detailed information about the issue you're experiencing. Based on the information you've provided and a similar issue that was previously resolved in the Argilla repository, it seems that the error might be occurring due to one of the following reasons:

  1. None of the records have a prediction AND annotation: This is checked in the find_label_errors function. If none of the records have both a prediction and an annotation, a NoRecordsError is raised.

  2. A prediction is missing for one of the labels: This is checked in the _construct_s_and_psx function. If a prediction is missing for a label, a MissingPredictionError is raised.

To resolve these issues, ensure that all records have both a prediction and an annotation, and that predictions are available for all labels. Here is the relevant code:

# select only records with prediction and annotation
records = [rec for rec in records if rec.prediction and rec.annotation]
if not records:
    raise NoRecordsError("It seems that none of your records have a prediction AND annotation!")

# ...

try:
    psx[i] = [pred[label] for label in labels_mapping]
except KeyError as error:
    raise MissingPredictionError(f"It seems a prediction for {error} is missing in the following record: {rec}")

try:
    s[i] = (
        [labels_mapping[label] for label in rec.annotation]
        if rec.multi_label
        else labels_mapping[rec.annotation]
    )
except KeyError as error:
    raise MissingPredictionError(f"It seems predictions are missing for the label {error}!")

You can find this code in the src/argilla/labeling/text_classification/label_errors.py file.
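
The two checks described above can be tried locally before logging records. The following is only a minimal sketch, not Argilla's implementation: `Record` and `check_records` are hypothetical names introduced for illustration, and the record is modelled as a plain dataclass so that the snippet runs without an Argilla server.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a single-label text classification record."""
    text: str
    prediction: Optional[List[Tuple[str, float]]] = None
    annotation: Optional[str] = None


def check_records(records: List[Record], labels: List[str]) -> List[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    # Check 1: at least one record needs a prediction AND an annotation.
    if not any(r.prediction and r.annotation for r in records):
        problems.append("no record has both a prediction and an annotation")
    for idx, rec in enumerate(records):
        if not (rec.prediction and rec.annotation):
            continue  # such records are filtered out, not treated as errors
        # Check 2: every label must have a predicted score.
        predicted = {label for label, _ in rec.prediction}
        missing = [label for label in labels if label not in predicted]
        if missing:
            problems.append(f"record {idx}: missing prediction for {missing}")
        # The annotation itself must also be one of the known labels.
        if rec.annotation not in labels:
            problems.append(f"record {idx}: annotation {rec.annotation!r} not in labels")
    return problems


# The record from the report, rebuilt with the hypothetical dataclass.
rec = Record(
    text="beautiful accommodations stayed hotel santa... hotels higher ranked website.",
    prediction=[("A", 0.75), ("B", 0.25)],
    annotation="A",
)
print(check_records([rec], ["A", "B"]))  # []
```

The record from the report passes both checks in this sketch, since it carries a prediction for each of the labels "A" and "B" as well as the annotation "A".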

Regarding the similar issue that was previously resolved, it was titled [Annotation] Validate label for single label text classification dataset raise an error and was closed by a pull request: fix(#945): validate label for single label text classification dataset. Unfortunately, I couldn't find the specific changes made in this pull request that resolved the issue.

If the problem persists after ensuring that all records have both a prediction and an annotation, and that predictions are available for all labels, please provide more information about the error, such as the full error message and stack trace, and any additional steps you've taken to debug the issue. This will help us better understand the problem and find a solution.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.