Open frascuchon opened 2 years ago
There are some tasks to finish before closing this issue:
Would this also solve the issue for token classification, where searching for a 'word' with 'annotated_as' returns not only the results where that 'word' is annotated as the selected tag, but all results that involve that word and that tag (on a different word)?
Introduction
Currently, record annotations/predictions only support storing annotation info for a single annotator agent. The idea is to support several agents for both annotations and predictions. This change will bring several feature enhancements, such as annotation agreement flows, weak label materialization, multi-pipeline monitoring, and more.
We could give more control over annotations/predictions if we combine this feature with roles and dataset settings. By defining a set of annotators (or even expected predictor patterns), we can limit the agents that can annotate a dataset.
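As a rough illustration of that idea, the sketch below checks an agent name against a per-dataset allow-list. The settings layout, the `is_agent_allowed` helper, and the use of shell-style patterns for predictors are all assumptions made for this example; none of them is part of the current Rubrix API.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical dataset settings: exact annotator names plus
# shell-style patterns for the expected predictor agents.
settings = {
    "annotators": ["alice", "bob"],
    "predictors": ["sentiment-model-*"],
}


def is_agent_allowed(agent: str, allowed_patterns: Iterable[str]) -> bool:
    """Return True if the agent name matches any allowed name or pattern."""
    return any(fnmatchcase(agent, pattern) for pattern in allowed_patterns)


# An annotation from an unknown agent would be rejected,
# while any predictor matching the pattern would be accepted.
print(is_agent_allowed("alice", settings["annotators"]))
print(is_agent_allowed("mallory", settings["annotators"]))
print(is_agent_allowed("sentiment-model-v2", settings["predictors"]))
```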
Design keys
The proposed design keeps the prediction/annotation fields and includes a new predictions/annotations one, a data dictionary where the key corresponds to the annotation agent, and the value includes the annotation information provided by the client.
This new structure will be enabled for search, providing a mechanism for fine-tuning the searches based on specific annotators/predictors. We can replicate all computed fields per annotation entry, so we could do things like:
annotations.agentA.annotated_as: FALSE
or
predictions.agent_b.predicted_as: TRUE
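To make the proposed layout more concrete, here is a minimal sketch of a record with per-agent entries and of how the computed fields could be replicated per agent for search. The `labels` key, the record shape, and the `searchable_fields` and `matches` helpers are assumptions for illustration only. The final data model and the search backend mapping may look different.

```python
from typing import Any, Dict, List

# Hypothetical record: each dictionary is keyed by agent name.
record: Dict[str, Any] = {
    "text": "example input",
    "annotations": {"agentA": {"labels": ["FALSE"]}},
    "predictions": {
        "agent_b": {"labels": ["TRUE"]},
        "agent_c": {"labels": ["FALSE"]},
    },
}


def searchable_fields(rec: Dict[str, Any]) -> Dict[str, List[str]]:
    """Replicate the computed label fields once per agent entry."""
    fields: Dict[str, List[str]] = {}
    for agent, info in rec.get("annotations", {}).items():
        fields[f"annotations.{agent}.annotated_as"] = list(info["labels"])
    for agent, info in rec.get("predictions", {}).items():
        fields[f"predictions.{agent}.predicted_as"] = list(info["labels"])
    return fields


def matches(rec: Dict[str, Any], field: str, label: str) -> bool:
    """Emulate a query such as `annotations.agentA.annotated_as: FALSE`."""
    return label in searchable_fields(rec).get(field, [])


print(sorted(searchable_fields(record)))
print(matches(record, "annotations.agentA.annotated_as", "FALSE"))
print(matches(record, "predictions.agent_b.predicted_as", "TRUE"))
```

Because each agent gets its own field path, a query restricted to one agent does not match labels assigned by another.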
Backward compatibility
The new data model must accommodate the current record concepts and provide a backward-compatible path so that both modes can coexist.
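One possible compatibility path, sketched under the assumption that the legacy single-value fields are derived only when a record carries exactly one agent entry, while annotated_by and predicted_by always list every agent. The `legacy_fields` helper and the `labels` key are hypothetical names used for this example.

```python
from typing import Any, Dict


def legacy_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Derive legacy fields from the multi-agent dictionaries.

    annotated_as / predicted_as are only emitted when exactly one
    agent is defined; with several agents they are left out.
    """
    annotations = record.get("annotations", {})
    predictions = record.get("predictions", {})

    legacy: Dict[str, Any] = {
        # These two always list all agents present on the record.
        "annotated_by": sorted(annotations),
        "predicted_by": sorted(predictions),
    }
    if len(annotations) == 1:
        (info,) = annotations.values()
        legacy["annotated_as"] = list(info["labels"])
    if len(predictions) == 1:
        (info,) = predictions.values()
        legacy["predicted_as"] = list(info["labels"])
    return legacy


record = {
    "annotations": {"agentA": {"labels": ["FALSE"]}},
    "predictions": {
        "agent_b": {"labels": ["TRUE"]},
        "agent_c": {"labels": ["FALSE"]},
    },
}
# One annotator, so annotated_as is kept; two predictors, so predicted_as is omitted.
print(legacy_fields(record))
```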
Current fields such as predicted, predicted_as, and annotated_as could change their behavior, since multiple values can be assigned. The only case where we can keep the old behavior is when a single entry is provided.
Complete list of affected fields:
predicted: computed only when one single agent is defined. It will be deprecated and removed in future versions.
predicted_as: computed only when one single agent is defined. It will be deprecated and removed in future versions.
annotated_as: computed only when one single agent is defined. It will be deprecated and removed in future versions.
predicted_by: shows all record agents.
annotated_by: shows all record agents.
scores: computed only when one single agent is defined (cc: @dvsrepo). It will be deprecated and removed in future versions.
prediction: this field will be deprecated and removed in future versions.
annotation: this field will be used as the "final/real annotation" (annotation agreement). It may get a better name in future versions.
explanation: (only for text classification) computed only when one single agent is defined. It will be deprecated and removed in future versions. The explanation must be defined at the prediction level.
References
See recognai/rubrix-roadmap#59