facebookresearch / GENRE

Autoregressive Entity Retrieval

Start and end positions of tokens #29

Closed: dinani65 closed this issue 3 years ago

dinani65 commented 3 years ago

Thanks for your interesting work. As far as I know, it is necessary to mark the start and end of the target tokens in each input sentence, and only one tag is possible per sentence at a time. So if we want to use it to annotate the content of a web page, we first have to identify those words ourselves, right? Could you please explain what get_entity_spans does?

Is it responsible for detecting the tags and their start and end positions?

nicola-decao commented 3 years ago

Do you want to do Mention Detection, Entity Disambiguation or Entity Linking?

dinani65 commented 3 years ago

In fact, I am looking for a multilingual entity linking approach that can disambiguate names. GENRE is not multilingual, but it allows more than one tag in the input text, whereas mGENRE is multilingual but has the restrictions mentioned above. Mention detection would also have to be done before applying GENRE/mGENRE. Another question: is there any restriction on the size of the input text?

nicola-decao commented 3 years ago
  1. You can use GENRE to do entity linking in English only (both mention detection and entity disambiguation).
  2. You can use GENRE to do entity disambiguation in English only.
  3. You can use mGENRE to do entity disambiguation in 100 languages.
  4. You can combine an off-the-shelf mention detection model (like FLAIR) with mGENRE if you want a multilingual entity linking system.

Also, the size of the input text is currently limited to 1024 BPE tokens (a limitation that comes from BART).
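
As a rough illustration of option 4, the glue between a mention detector and mGENRE could look like the sketch below. It assumes the detector returns character offsets for each mention and that mGENRE expects the `[START] ... [END]` markers around a single mention per input, as in the examples in the GENRE README. The helper name `mark_mentions` and the hard-coded spans are hypothetical; they stand in for whatever a real detector such as FLAIR would produce.

```python
from typing import List, Tuple


def mark_mentions(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Return one copy of `text` per mention, with that mention wrapped
    in [START] ... [END] (assumed mGENRE disambiguation input format)."""
    marked = []
    for start, end in spans:
        # Reject spans that fall outside the text or are empty.
        if not 0 <= start < end <= len(text):
            raise ValueError(
                f"invalid span ({start}, {end}) for text of length {len(text)}"
            )
        marked.append(
            f"{text[:start]}[START] {text[start:end]} [END]{text[end:]}"
        )
    return marked


# Hypothetical detector output: (start, end) character offsets.
text = "Einstein era un fisico tedesco nato a Ulm."
spans = [(0, 8), (38, 41)]

inputs = mark_mentions(text, spans)
for line in inputs:
    print(line)
```

Each string in `inputs` carries exactly one marked mention, so a sentence with several mentions becomes several disambiguation inputs. The helper does not enforce the 1024 BPE limit, so long documents would still need to be shortened around each mention before being passed to the model.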