Open · bikash119 opened this issue 3 months ago
@nataliaElv, this seems like an interesting feature, with some overlap with bulk annotation for span questions.
@frascuchon, would this be easier now that advanced queries are included?
This issue is stale because it has been open for 90 days with no activity.
Is your feature request related to a problem? Please describe.
There are situations where the same bigrams, trigrams, etc. appear multiple times in a text being annotated. The annotator has to annotate every occurrence of the n-gram; otherwise its tokens are labelled "O" by default under the IOB scheme.
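To make the problem concrete, here is a minimal plain-Python sketch (not Argilla's own export code) that turns character-level spans into IOB tags using naive whitespace tokenization. The sentence and the `spans_to_iob` helper are made up for illustration. Only the first occurrence of "tympanic membrane" is annotated, so the second one silently falls back to `O O`.

```python
def spans_to_iob(text, spans):
    """Whitespace-tokenize `text` and assign IOB tags from character spans.

    `spans` is a list of (start, end, label) tuples with an exclusive end.
    A token gets a tag only if it lies fully inside a span; otherwise "O".
    """
    tagged = []
    pos = 0
    for tok in text.split():
        # Locate the token's character offsets, scanning left to right.
        start = text.index(tok, pos)
        end = start + len(tok)
        pos = end
        tag = "O"
        for s_start, s_end, label in spans:
            if start >= s_start and end <= s_end:
                tag = ("B-" if start == s_start else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


# Hypothetical example sentence; only the FIRST "tympanic membrane" is labelled.
text = "a probe touching the tympanic membrane so that the tympanic membrane vibrates"
phrase = "tympanic membrane"
first = text.find(phrase)
spans = [(first, first + len(phrase), "product")]

tagged = spans_to_iob(text, spans)
for token, tag in tagged:
    print(f"{token}\t{tag}")
# The first occurrence comes out as B-product / I-product,
# the second as O / O.
```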
Describe the solution you'd like
Currently, the Argilla UI lets us annotate/label tokens in a text through an easy-to-use interface. However, I've identified a use case where an additional feature could improve efficiency:
Sample claim text:
Assume we have labels like: ["method of use", "product", "machine", "system"]. Here, the first occurrence of the bigram `tympanic membrane` is labelled as `product` by the annotator. Since there are multiple instances of `tympanic membrane`, the annotator must annotate each instance; otherwise the system implicitly assigns 'O' to each token of the bigram under the IOB scheme. This makes it harder for the model to learn that "tympanic membrane" is a product and shouldn't be treated as two separate tokens, "tympanic" and "membrane".

Proposed Improvement
When an annotator labels a span (e.g., "tympanic membrane" as "product"), the system could automatically find all exact matches of that span in the text and suggest the same label for them. This would:
- Reduce repetitive labeling actions
- Save significant time, especially in longer texts with recurring terms
- Ensure consistency in labeling across the document
- Prevent accidental omissions that could lead to incorrect 'O' labels in the IOB scheme
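The proposed behaviour could be sketched roughly as follows. This is plain Python under stated assumptions, not Argilla's API: `suggest_matching_spans` is a hypothetical helper, spans are `(start, end, label)` character offsets with an exclusive end, matching is exact and case-sensitive, and a match only counts when it is not glued to neighbouring word characters. Occurrences that overlap an existing annotation (including the one just labelled) are skipped, so the annotator's own labels are never overwritten.

```python
import re


def suggest_matching_spans(text, labeled_span, existing_spans):
    """Hypothetical helper: propose `labeled_span`'s label for every other
    exact occurrence of the same text that is not already annotated."""
    start, end, label = labeled_span
    phrase = text[start:end]
    # Lookarounds stop "membrane" from matching inside "membranes",
    # and work even if the phrase starts or ends with punctuation.
    pattern = re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)")
    suggestions = []
    for match in pattern.finditer(text):
        m_start, m_end = match.span()
        # Skip any occurrence that overlaps an existing annotation.
        overlaps = any(m_start < e and s < m_end for s, e, _ in existing_spans)
        if not overlaps:
            suggestions.append((m_start, m_end, label))
    return suggestions


# Hypothetical example: the annotator labels only the first occurrence.
text = "a probe touching the tympanic membrane so that the tympanic membrane vibrates"
phrase = "tympanic membrane"
first = text.find(phrase)
labeled = (first, first + len(phrase), "product")

suggestions = suggest_matching_spans(text, labeled, existing_spans=[labeled])
print(suggestions)  # [(51, 68, 'product')]
```

Returning suggestions rather than writing labels directly keeps the annotator in control, matching the "suggest" wording of the proposal; a case-insensitive variant would only need `re.IGNORECASE` passed to `re.compile`.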