AI21Labs / sense-bert

This is the code for loading the SenseBERT model, described in our paper from ACL 2020.
Apache License 2.0

Aligning tokens with supersenses? #4

Open victoryhb opened 4 years ago

victoryhb commented 4 years ago

Thank you very much for sharing the code for your excellent paper. Pardon me for asking a newbie question: how does one align the tokens in the input sentence with the supersenses output by the model? For example, the words in the sentence "I went to the store to buy some groceries." do not appear to be aligned with the following senses:

['noun.person']
['verb.communication']
['verb.social']
['verb.communication']
['noun.artifact']
['noun.artifact']
['verb.communication']
['verb.cognition']
['noun.artifact']
['noun.artifact']
['adv.all']
['adv.all']

as printed using the following code:

# prints one predicted supersense per input token (including sub-tokens and special tokens)
for i, id_ in enumerate(input_ids[0]):
    print(sensebert_model.tokenizer.convert_ids_to_senses([np.argmax(supersense_logits[0][i])]))

Could you please provide some example code for how to do this properly? Thanks a lot in advance!

MeMartijn commented 3 years ago

@victoryhb This might be a long shot, but I was wondering whether you figured this out in the end. I also can't seem to work out how to align the tokens.

MeMartijn commented 3 years ago

@oriram Do you have any hints on how to align the predicted senses to words in sentences?

oriram commented 3 years ago

Hi @MeMartijn,
There is no clear "alignment", as out-of-vocabulary words are split into multiple tokens (and can therefore have multiple supersenses). However, you can do one of the following:

Hope this helps, Ori
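For readers landing here later: one common way to get per-word senses from per-sub-token predictions (my own assumption about a reasonable approach, not necessarily what the authors intended) is to merge WordPiece "##"-continuation sub-tokens back into words and keep the prediction of the first sub-token of each word. A minimal self-contained sketch in plain Python, with illustrative token and sense values rather than actual SenseBERT output:

```python
# Sketch: align per-sub-token supersense predictions to whole words by
# grouping WordPiece sub-tokens ("##" continuations) and keeping the
# prediction of each word's first sub-token. The tokens/senses below are
# illustrative examples, not actual SenseBERT output.

def align_senses(tokens, senses):
    """Merge '##'-continuation sub-tokens into words; keep the first sub-token's sense."""
    words, word_senses = [], []
    for tok, sense in zip(tokens, senses):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]          # continuation: extend the current word
        else:
            words.append(tok)             # new word: record its first sub-token's sense
            word_senses.append(sense)
    return list(zip(words, word_senses))

tokens = ["gro", "##cer", "##ies", "store"]
senses = ["noun.food", "noun.food", "noun.artifact", "noun.artifact"]
print(align_senses(tokens, senses))
# → [('groceries', 'noun.food'), ('store', 'noun.artifact')]
```

In practice you would also drop special tokens such as [CLS] and [SEP] before grouping, which is likely why the list of senses printed above is longer than the sentence's word count. Taking the first sub-token's sense is just one heuristic; a majority vote over a word's sub-token predictions is another option.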