Understanding trigger files

INK-USC / TriggerNER

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)

Hello,

I have a question regarding the trigger files. I do not completely understand the numeric ids next to the words that are marked as triggers. For instance:

EU       B-ORG
rejects  T-3
German   T-0
call     T-4
to       O
boycott  T-1
British  T-2
lamb     T-2
.        O

I understand that the words tagged with a T are triggers for the entity EU, so each entity has its own triggers. However, what does the number next to the T mean? For a moment, I thought the ids gave the order in which the tokens should be used. I also thought they might group the triggers. A colleague thought they were different levels of triggers. But I have seen that some examples do not contain a T-0 and that, in some cases, the triggers are not numbered in any specific pattern.

So the meaning of the numbers is not completely clear.

Hi, thank you for your interest.

Since a trigger is defined as a "group of words" that can help labeling decisions, each T-n marks one trigger. In the above example, 'British lamb' is one trigger.

The numbers don't follow any pattern. We assign each sentence to three annotators and use the consolidated results as our triggers. During consolidation, the numbers are randomly assigned and some of the triggers are removed. That's why some sentences are missing a particular number (such as T-0).

The number doesn't have a 'meaning' of its own. It marks the words that belong to the same trigger so that they are recognized as one trigger, and it distinguishes that trigger from the other triggers.

The current trigger file was annotated by crowd workers ("turked"), so we agree that it seems a little awkward. We are now annotating more "reasonable" trigger files ourselves with experts' advice. Please stay tuned for the release!
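
For anyone reading the files programmatically, here is a minimal sketch of that grouping applied to the example sentence above. It assumes each line holds a token and a tag separated by whitespace, as in the excerpt. The name `group_triggers` is hypothetical and is not part of the TriggerNER code.

```python
from collections import defaultdict

# Example sentence copied from the question above (token, then tag).
SAMPLE = """\
EU  B-ORG
rejects T-3
German  T-0
call    T-4
to  O
boycott T-1
British T-2
lamb    T-2
.   O
"""


def group_triggers(block):
    """Group tokens that share a T-n tag (hypothetical helper).

    The id after "T-" is used only as a grouping key; its numeric
    value and order carry no meaning.
    """
    triggers = defaultdict(list)
    for line in block.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        token, tag = parts
        if tag.startswith("T-"):
            triggers[tag[2:]].append(token)
    # Join the words of each trigger in sentence order.
    return {tid: " ".join(words) for tid, words in triggers.items()}


triggers = group_triggers(SAMPLE)
print(triggers)
```

With this sketch, the two tokens tagged T-2 end up together as the single trigger 'British lamb', while every other id maps to a one-word trigger. Gaps in the ids (for example, a sentence with no T-0) make no difference, since the ids are never sorted or counted.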

Thank you.

Understanding trigger files #2