egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License

I want to use the entities extracted by spaCy NER and match them to the entities linked with spaCy-entity-linker #18

Open cphoover opened 1 year ago

cphoover commented 1 year ago

In the example, it loops through sentences (for sent in doc.sents:) instead of taking the entities extracted by spaCy and linking those.

MartinoMensio commented 1 year ago

Hi @cphoover, thank you for using this library and for the request. If I understood correctly, you want to use the NER from spaCy and the NEL from spaCy-entity-linker.

The library cannot currently do this natively; I will need to implement it. In the meantime, you can use the code below to perform the operation:

import spacy
import spacy_entity_linker

# python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
# normal NER from spacy
ents = doc.ents

# work directly with the objects inside spacy_entity_linker

# collect the linked entities in a list
entities = []
# classifier that selects the best candidate for each term
classifier = spacy_entity_linker.EntityClassifier.EntityClassifier()
for ent in ents:
    # build a term candidate (a simple span)
    termCandidate = spacy_entity_linker.TermCandidate.TermCandidate(ent)
    # get all the candidates for the term
    entityCandidates = termCandidate.get_entity_candidates()
    if len(entityCandidates) > 0:
        # select the best candidate
        entity = classifier(entityCandidates)
        # entity.span.sent._.linkedEntities.append(entity) # --> cannot if the attribute is not registered
        entities.append(entity)
    else:
        entity = None
    print(f'SpaCy: {(ent.text + " " + ent.label_).ljust(40)}spaCy-entity-linker: {entity}')
# doc._.linkedEntities = spacy_entity_linker.EntityCollection.EntityCollection(entities) # --> cannot if the attribute is not registered

OUTPUT:

# SpaCy: Apple ORG                               spaCy-entity-linker: Apple Inc.
# SpaCy: U.K. GPE                                spaCy-entity-linker: United Kingdom
# SpaCy: $1 billion MONEY                        spaCy-entity-linker: None
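
The loop above could also be factored into a small reusable helper. The following is only a minimal sketch, not part of the library: link_ner_entities, link_one, skip_labels, FakeSpan and demo_linker are hypothetical names introduced here for illustration, and the stand-in spans and lookup table exist only so the sketch runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple


def link_ner_entities(
    ents: Iterable[Any],
    link_one: Callable[[Any], Optional[Any]],
    skip_labels: Iterable[str] = (),
) -> List[Tuple[Any, Optional[Any]]]:
    """Pair every NER span with its linked entity, or None if nothing was linked.

    `link_one` is a caller-supplied function (hypothetical) that takes a span
    and returns the linked entity or None. `skip_labels` is an optional set of
    NER labels that should not be linked at all (an assumption, not a library
    feature).
    """
    skipped = set(skip_labels)
    pairs = []
    for ent in ents:
        linked = None if ent.label_ in skipped else link_one(ent)
        pairs.append((ent, linked))
    return pairs


@dataclass
class FakeSpan:
    """Stand-in for a spaCy Span: only the two attributes used here."""
    text: str
    label_: str


def demo_linker(ent: FakeSpan) -> Optional[str]:
    """Toy linker backed by a hard-coded table, for demonstration only."""
    lookup = {"Apple": "Apple Inc.", "U.K.": "United Kingdom"}
    return lookup.get(ent.text)


spans = [
    FakeSpan("Apple", "ORG"),
    FakeSpan("U.K.", "GPE"),
    FakeSpan("$1 billion", "MONEY"),
]
pairs = link_ner_entities(spans, demo_linker, skip_labels={"MONEY"})
for ent, linked in pairs:
    print(f"{ent.text} {ent.label_} -> {linked}")
```

With the real library, link_one would wrap the TermCandidate and classifier steps from the snippet above, and ents would be doc.ents.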

I will need to add some configuration code to support this natively in the library, so it may take a while.

I hope the code above is useful to you!

Best, Martino

dennlinger commented 1 year ago

I have just pushed a simple extension in the linked PR that, in principle, enables this functionality by also attaching an EntityElement directly to each Span. Here is the (simplified) use case:

import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("entityLinker", last=True)
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
# normal NER from spacy
ents = doc.ents

for ent in ents:
    print(ent.text, ent.label_, ent._.linkedEntities)

# Output:
# Apple ORG Apple Inc.
# U.K. GPE United Kingdom
# $1 billion MONEY None

When spaCy-entity-linker does not find a linked entity for a span, the linkedEntities property is None by default.

Edit: This is currently still an experimental feature and thus not in the public release.
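
Because the property defaults to None, downstream code has to check for it before using the link. A minimal sketch of one way to do that follows; split_by_link is a hypothetical helper, and the SimpleNamespace objects only imitate the ent._.linkedEntities attribute described above so that the example runs without spaCy or the experimental branch.

```python
from types import SimpleNamespace


def split_by_link(ents):
    """Separate spans that have a linked entity from those that do not.

    Assumes each span exposes `ent._.linkedEntities`, which is None when
    nothing was linked (as described in the comment above).
    """
    linked, unlinked = [], []
    for ent in ents:
        target = getattr(ent._, "linkedEntities", None)
        if target is not None:
            linked.append(ent)
        else:
            unlinked.append(ent)
    return linked, unlinked


def fake_ent(text, label, linked_entity):
    """Build a stand-in span; a real run would use doc.ents instead."""
    return SimpleNamespace(
        text=text,
        label_=label,
        _=SimpleNamespace(linkedEntities=linked_entity),
    )


ents = [
    fake_ent("Apple", "ORG", "Apple Inc."),
    fake_ent("U.K.", "GPE", "United Kingdom"),
    fake_ent("$1 billion", "MONEY", None),
]
linked, unlinked = split_by_link(ents)
print("linked:", [(e.text, e._.linkedEntities) for e in linked])
print("unlinked:", [e.text for e in unlinked])
```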

cphoover commented 1 year ago

Thanks for responding, @dennlinger and @MartinoMensio.

Here is the linked PR for others to reference: https://github.com/egerber/spaCy-entity-linker/pull/20