Accepting several predictions/annotations for the same record

frascuchon commented 2 years ago

Introduction

Currently, a record's annotation/prediction fields store information for only one agent. The idea is to support several agents for both annotations and predictions. This change enables features such as annotation agreement flows, weak label materialization, multi-pipeline monitoring, and more.

We could offer finer annotation/prediction control by combining this feature with roles and dataset settings. By defining a set of allowed annotators (or even expected predictor name patterns), we can restrict which agents can annotate a dataset.

Design keys

The proposed design keeps the existing prediction/annotation fields and adds new predictions/annotations fields. Each one is a dictionary where the key is the agent name and the value holds the annotation information provided by the client.

predictions = { "agent-one": { "labels": ["A"], "score": ["0.3"] } }

This new structure will be searchable, so searches can be narrowed to specific annotators or predictors. We can replicate all computed fields per agent entry, which allows queries such as: annotations.agentA.annotated_as: FALSE or predictions.agent_b.predicted_as: TRUE
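As a rough illustration of replicating a computed field per agent, here is a minimal Python sketch. The helper name flatten_agent_fields, the second agent "agent-two", and the dotted field naming are assumptions made for this example only; they are not part of the current codebase or a committed design.

```python
from typing import Any, Dict, List


def flatten_agent_fields(
    prefix: str,
    by_agent: Dict[str, Dict[str, Any]],
    computed_name: str,
) -> Dict[str, List[str]]:
    """Build one searchable field per agent (hypothetical helper).

    For prefix="predictions" and computed_name="predicted_as", the agent
    "agent-one" yields the key "predictions.agent-one.predicted_as".
    """
    return {
        f"{prefix}.{agent}.{computed_name}": list(info.get("labels", []))
        for agent, info in by_agent.items()
    }


# Same shape as the example above, extended with a second (made-up) agent.
predictions = {
    "agent-one": {"labels": ["A"], "score": ["0.3"]},
    "agent-two": {"labels": ["B"], "score": ["0.9"]},
}

fields = flatten_agent_fields("predictions", predictions, "predicted_as")
for name, labels in fields.items():
    print(name, labels)
```

With fields shaped like this, a query restricted to one agent only has to target that agent's key, which is the kind of fine-tuning described above.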

Backward compatibility

The new data model must account for current record concepts and provide a backward-compatible path so that both modes can coexist.

Current fields such as predicted, predicted_as, and annotated_as could change behavior, since multiple values can now be assigned. The old behavior can be kept only when a single entry is provided.

Complete list of affected fields:

predicted: computed only when one single agent is defined. It will be deprecated and removed in future versions
predicted_as: computed only when one single agent is defined. It will be deprecated and removed in future versions
annotated_as: computed only when one single agent is defined. It will be deprecated and removed in future versions
predicted_by: showing all record agents
annotated_by: showing all record agents
scores: computed only when one single agent is defined (cc: @dvsrepo). It will be deprecated and removed in future versions
prediction: this field will be deprecated and removed in future versions
annotation: this field will be used as the "final/real annotation" (annotation agreement). It may get a better name in future versions.
explanation: (only for text classification) computed only when one single agent is defined. It will be deprecated and removed in future versions. The explanation must be defined at the prediction level.
token classification metrics: there are some metrics defined for annotations and predictions. It may not make sense to build every metric per agent, but these fields will be heavily affected by the new data model.
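The single-agent rule in the list above could be sketched as follows. This is only an illustration under assumptions: the function name legacy_fields is hypothetical, the "ok"/"ko" values for predicted follow the ok/ko wording used in this issue, comparing label sets is one possible definition of a correct prediction, and sorting agent names is an arbitrary choice.

```python
from typing import Any, Dict


def legacy_fields(
    predictions: Dict[str, Dict[str, Any]],
    annotations: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Compute backward-compatible fields (hypothetical helper).

    predicted_by / annotated_by always list every agent; the other
    legacy fields are emitted only when exactly one agent is present.
    """
    out: Dict[str, Any] = {
        "predicted_by": sorted(predictions),
        "annotated_by": sorted(annotations),
    }
    if len(predictions) == 1:
        (info,) = predictions.values()
        out["predicted_as"] = list(info.get("labels", []))
        out["scores"] = list(info.get("score", []))
    if len(annotations) == 1:
        (info,) = annotations.values()
        out["annotated_as"] = list(info.get("labels", []))
    # Assumption: a prediction is "ok" when its labels match the annotation.
    if "predicted_as" in out and "annotated_as" in out:
        same = set(out["predicted_as"]) == set(out["annotated_as"])
        out["predicted"] = "ok" if same else "ko"
    return out


single = legacy_fields(
    {"agent-one": {"labels": ["A"], "score": ["0.3"]}},
    {"user-1": {"labels": ["A"]}},
)
multi = legacy_fields(
    {"agent-one": {"labels": ["A"]}, "agent-two": {"labels": ["B"]}},
    {"user-1": {"labels": ["A"]}},
)
```

In the multi-agent case predicted_as, scores, and predicted are simply absent, so clients that rely on them keep working only for single-agent records.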

References

See recognai/rubrix-roadmap#59

frascuchon commented 2 years ago

There are some tasks to finish before closing this issue:

[ ] Allow logging records with several annotations/predictions
[ ] Handle multiple annotations from the UI (view, select, remove, change, ...)
[ ] Adapt related filters (backend and UI)
[ ] Adapt the definition of prediction ok/ko when multiple values can be present.

cceyda commented 1 year ago

Would this also solve the issue in token classification where searching for a 'word' with 'annotated_as' returns not only the results where that 'word' is annotated with the selected tag, but all results that contain both that word and the tag (even when the tag is on a different word)?
