MartinoMensio / spacy-dbpedia-spotlight

A spaCy wrapper for DBpedia Spotlight
MIT License

How to relate the entity to its place in the text? #12

Open ali3assi opened 2 years ago

ali3assi commented 2 years ago

Once I have the entity annotations, how can I get the start and end position of each entity in the text? I want to relate each span of text to its corresponding entity.

I do the following:

for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)

But I get the following exception:

Traceback (most recent call last):
  File "C:\Users\Admin\miniconda3\envs\projet1\lib\tkinter\__init__.py", line 1892, in __call__
    return self.func(*args)
  File "C:\Users\Admin\Documents\codePython\dbpedia\index.py", line 71, in <lambda>
    display_annotate = Button(root, height = 2, width = 20, text ="Annotate text", command = lambda:take_input()) 
  File "C:\Users\Admin\Documents\codePython\dbpedia\index.py", line 15, in take_input
    logger.warning(annotate(text_to_annotate))
  File "C:\Users\Admin\Documents\codePython\dbpedia\index.py", line 57, in annotate
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
  File "spacy\tokens\span.pyx", line 429, in spacy.tokens.span.Span.sent.__get__
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: `nlp.add_pipe('sentencizer')`. Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting `doc[i].is_sent_start`.

MartinoMensio commented 2 years ago

Hi @ali3assi, the error you mention happens because blank pipelines don't include a sentencizer by default, so sentence boundaries are unset and ent.sent raises E030. You can do the following:

import spacy
nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')
nlp.add_pipe('dbpedia_spotlight')
doc = nlp("This is an example text. Let's mention Natural Language Processing")
for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
# Natural Language Processing 14 41 DBPEDIA_ENT

Alternatively, load one of the trained pipelines that already sets sentence boundaries (for example through its dependency parser):

import spacy
# this model needs to be installed first, see https://spacy.io/models/en#en_core_web_sm
nlp = spacy.load('en_core_web_sm')

# then the following is the same
nlp.add_pipe('dbpedia_spotlight')
doc = nlp("This is an example text. Let's mention Natural Language Processing")
for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
# Natural Language Processing 14 41 DBPEDIA_ENT
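
Note that the numbers printed above are offsets relative to the start of the sentence, because ent.sent.start_char is subtracted. In spaCy, ent.start_char and ent.end_char on their own are already character offsets into the whole document text, so if you only need the position in the full text you can use them directly and no sentence boundaries are needed. The sketch below checks the arithmetic with plain Python string operations. It does not call spaCy or the Spotlight service, and the variable names are only illustrative stand-ins for the corresponding span attributes:

```python
# Plain-Python check of the offset arithmetic; no spaCy or network access needed.
text = "This is an example text. Let's mention Natural Language Processing"
mention = "Natural Language Processing"

# Document-level offsets (stand-ins for ent.start_char / ent.end_char).
start_char = text.find(mention)
end_char = start_char + len(mention)

# Start of the second sentence (stand-in for ent.sent.start_char).
sent_start = text.find("Let's")

# Sentence-relative offsets, as printed in the snippets above.
rel_start = start_char - sent_start
rel_end = end_char - sent_start

print(start_char, end_char)  # 39 66
print(rel_start, rel_end)    # 14 41

# Slicing the full text with the document-level offsets recovers the mention.
assert text[start_char:end_char] == mention
```

So for this example the entity sits at characters 39 to 66 of the full text, which corresponds to 14 to 41 within its sentence.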