argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0
3.89k stars 365 forks

Accepting several predictions/annotations for the same record #1630

Open frascuchon opened 2 years ago

frascuchon commented 2 years ago

Introduction

Currently, a record's annotation/prediction fields can store information for only one agent. The idea is to support several agents for both annotations and predictions. This change enables several feature enhancements, such as annotation agreement flows, weak label materialization, multi-pipeline monitoring, and more.

We could give more control over annotations/predictions by combining this feature with roles and dataset settings. By defining a set of annotators (or even patterns for expected predictors), we can limit which agents can annotate a dataset.
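A minimal sketch of how such a restriction might look, assuming the dataset settings hold a list of allowed agent names or glob patterns. The names `is_allowed`, `add_entry`, `allowed_annotators`, and `allowed_predictors` are hypothetical and not part of any existing API:

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical dataset settings: exact annotator names and predictor patterns.
allowed_annotators = ["alice", "bob"]
allowed_predictors = ["pipeline-*"]


def is_allowed(agent: str, allowed_patterns: List[str]) -> bool:
    """Return True if the agent matches any allowed name or glob pattern."""
    return any(fnmatchcase(agent, pattern) for pattern in allowed_patterns)


def add_entry(
    entries: Dict[str, dict],
    agent: str,
    labels: List[str],
    allowed_patterns: List[str],
) -> Dict[str, dict]:
    """Store labels under the agent key, rejecting agents not in the settings."""
    if not is_allowed(agent, allowed_patterns):
        raise PermissionError(f"agent {agent!r} is not allowed on this dataset")
    entries[agent] = {"labels": list(labels)}
    return entries


annotations: Dict[str, dict] = {}
add_entry(annotations, "alice", ["A"], allowed_annotators)
```

With these settings, an entry from an agent called "mallory" would be rejected, while any predictor whose name starts with "pipeline-" would be accepted.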

Design keys

The proposed design keeps the existing prediction/annotation fields and adds new predictions/annotations fields: dictionaries where each key identifies the annotation agent and each value holds the annotation information provided by the client.

predictions = { "agent-one": { "labels": ["A"], "score": ["0.3"] } }
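To illustrate the shape, here is a sketch of a record carrying entries from several agents. The `Record` and `AgentEntry` classes are hypothetical stand-ins, not the actual data model, and scores are stored as floats here as an assumption:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentEntry:
    """Hypothetical container for what a single agent provided."""
    labels: List[str]
    scores: List[float] = field(default_factory=list)


@dataclass
class Record:
    """Hypothetical record; both dictionaries are keyed by agent name."""
    text: str
    predictions: Dict[str, AgentEntry] = field(default_factory=dict)
    annotations: Dict[str, AgentEntry] = field(default_factory=dict)


record = Record(text="some input text")
# Two predictors and one human annotator on the same record.
record.predictions["agent-one"] = AgentEntry(labels=["A"], scores=[0.3])
record.predictions["agent-two"] = AgentEntry(labels=["B"], scores=[0.8])
record.annotations["annotator-1"] = AgentEntry(labels=["B"])
```

Keying by agent name means a second submission from the same agent overwrites its earlier entry rather than adding a duplicate.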

This new structure will be searchable, providing a mechanism for narrowing searches to specific annotators/predictors. We can replicate all computed fields per annotation entry, so we could run queries such as annotations.agentA.annotated_as: FALSE or predictions.agent_b.predicted_as: TRUE
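The intent of such per-agent queries can be sketched with an in-memory filter. This is only an illustration of the semantics, not the search backend. It assumes records are plain dictionaries and that "TRUE" and "FALSE" are label names; the helper `labelled_as` is hypothetical:

```python
from typing import Any, Dict, List

records: List[Dict[str, Any]] = [
    {
        "id": 1,
        "predictions": {"agent_b": {"labels": ["TRUE"]}},
        "annotations": {"agentA": {"labels": ["FALSE"]}},
    },
    {
        "id": 2,
        "predictions": {"agent_b": {"labels": ["FALSE"]}},
        "annotations": {"agentA": {"labels": ["FALSE"]}},
    },
    {
        "id": 3,
        "predictions": {"agent_c": {"labels": ["TRUE"]}},
        "annotations": {},
    },
]


def labelled_as(record: Dict[str, Any], field: str, agent: str, label: str) -> bool:
    """True if the given agent assigned the label in the given field."""
    entry = record.get(field, {}).get(agent)
    return entry is not None and label in entry["labels"]


# Rough equivalent of predictions.agent_b.predicted_as: TRUE
predicted_true_by_b = [
    r["id"] for r in records if labelled_as(r, "predictions", "agent_b", "TRUE")
]
# Rough equivalent of annotations.agentA.annotated_as: FALSE
annotated_false_by_a = [
    r["id"] for r in records if labelled_as(r, "annotations", "agentA", "FALSE")
]
```

Record 3 is excluded from the first query even though it carries the label "TRUE", because that label came from a different agent.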

Backward compatibility

The new data model must handle the current record concepts and provide a backward-compatible way for both modes to coexist.

The behavior of current fields such as predicted, predicted_as, and annotated_as could change, since multiple values can now be assigned. The only case where we can keep the old behavior is when a single entry is provided.
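One possible reading of this, sketched below, is to compute the legacy field as the distinct labels across all agents, which reduces to the old value when only one entry exists. This is an assumption about how the fields could be derived, not a settled design, and the function name is illustrative:

```python
from typing import Dict, List


def predicted_as(predictions: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Collect distinct labels across agents, visiting agents in name order."""
    seen: List[str] = []
    for agent in sorted(predictions):
        for label in predictions[agent]["labels"]:
            if label not in seen:
                seen.append(label)
    return seen


# Single entry: the result matches the old single-agent behavior.
single = {"agent-one": {"labels": ["A"]}}
# Several entries: the field becomes multi-valued.
multiple = {
    "agent-one": {"labels": ["A"]},
    "agent-two": {"labels": ["B", "A"]},
}

legacy_value = predicted_as(single)
multi_value = predicted_as(multiple)
```

Under this sketch a query on the legacy field would match a record if any agent assigned the label, which is exactly the behavior change that makes the multi-entry case differ from the old one.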

Complete list of affected fields:

References

See recognai/rubrix-roadmap#59

frascuchon commented 2 years ago

There are some tasks to finish before closing this issue:

cceyda commented 1 year ago

Would this also solve the issue in token classification where searching for a 'word' with 'annotated_as' returns not only results where that 'word' is 'annotated_as' the 'selected tag', but all results that contain that word and the tag (on a different word)?