THU-BPM / MetaSRE

The source code of the paper "Semi-supervised Relation Extraction via Incremental Meta Self-Training"

How do you locate an entity with multiple words? #4

Closed: ShellingFord221 closed this issue 3 years ago

ShellingFord221 commented 3 years ago

Hi, in train.py, you find the entities' positions by:

# Find e1(id:2487) and e2(id:2475) position
pos1 = (encoded_dict['input_ids'] == 2487).nonzero()[0][1].item()
pos2 = (encoded_dict['input_ids'] == 2475).nonzero()[0][1].item()
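# (nonzero() on the 2-D input_ids tensor returns (row, column) index pairs,
#  so [0][1] is the sequence position of the first match, assuming a batch of 1)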
e1_pos.append(pos1)
e2_pos.append(pos2)

What do 2487 and 2475 mean? If an entity is composed of multiple words (e.g. Gossip Girl), how do you get this entity's encoding? Thanks!

ShellingFord221 commented 3 years ago

Are 2487 and 2475 just the IDs of the symbols <e1> and <e2>?

xuminghu commented 3 years ago

As we pointed out in the paper, we use [E1] and [E2] to represent the two entities, and 2487 and 2475 are the token IDs corresponding to [E1] and [E2] in the BERT tokenizer.
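For reference, this mapping can be checked directly with the Hugging Face transformers tokenizer. A minimal sketch follows; the bert-base-uncased checkpoint and the bracket-style marker strings are assumptions based on this thread, since train.py's exact checkpoint is not shown here.

# Sketch: inspect which tokens the hard-coded IDs map to, and how the
# [E1]/[E2] markers can be given their own IDs. Assumes the Hugging Face
# `transformers` library and the bert-base-uncased checkpoint; train.py
# may use a different checkpoint or marker scheme.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Which surface forms do the two hard-coded IDs correspond to?
print(tokenizer.convert_ids_to_tokens([2487, 2475]))

# If the markers are not already single tokens in the vocabulary, the usual
# approach is to register them as additional special tokens:
tokenizer.add_special_tokens({"additional_special_tokens": ["[E1]", "[E2]"]})
print(tokenizer.convert_tokens_to_ids(["[E1]", "[E2]"]))

Note that if special tokens are added this way, the model's embedding matrix also has to be resized with model.resize_token_embeddings(len(tokenizer)) before those IDs can be used.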

ShellingFord221 commented 3 years ago

So there is no vocabulary entry for the entity's mention itself? Just the indicators [E1] and [E2] for the entities in all sentences?

xuminghu commented 3 years ago

Yes. Since BERT can encode contextualized relational features, [E1] and [E2] can be used to represent entity-level relational features. A similar approach can be found in Figure 3 of the paper we reference.
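To make that concrete, here is a minimal sketch of the general entity-marker technique this answer describes: encode the marked sentence and use the hidden states at the [E1]/[E2] positions as the entity-level features. The sentence, the marker placement, and the final concatenation are illustrative assumptions, not the exact code from train.py.

# Sketch of the entity-marker technique (illustrative, not the repo's exact code).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": ["[E1]", "[E2]"]})

model = BertModel.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))  # make room for the new marker IDs

# A multi-word mention such as "Gossip Girl" needs no special handling:
# the marker token in front of it stands in for the whole span.
sentence = "[E1] Gossip Girl was broadcast by [E2] The CW ."
encoded = tokenizer(sentence, return_tensors="pt")

e1_id, e2_id = tokenizer.convert_tokens_to_ids(["[E1]", "[E2]"])

# Same indexing pattern as the snippet quoted above: first occurrence of each marker.
pos1 = (encoded["input_ids"] == e1_id).nonzero()[0][1].item()
pos2 = (encoded["input_ids"] == e2_id).nonzero()[0][1].item()

with torch.no_grad():
    hidden = model(**encoded).last_hidden_state  # shape (1, seq_len, hidden_size)

# Entity-level relational features: the two marker states, concatenated.
relation_features = torch.cat([hidden[0, pos1], hidden[0, pos2]], dim=-1)
print(relation_features.shape)  # torch.Size([1536]) for bert-base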

ShellingFord221 commented 3 years ago

Got it. Thx.