IBM / zshot

Zero and Few shot named entity & relationships recognition
https://ibm.github.io/zshot
MIT License

Add LinkerEnsemble #52

Closed: marmg closed this 1 year ago

marmg commented 1 year ago

Scenario summary

Add a linker ensemble that combines several linkers and several descriptions per entity to improve performance.

Proposed solution

Implement LinkerEnsemble, which takes as input the list of linkers to use, the strategy for combining their predictions (one of: max, count), and the threshold a prediction must reach for its entity to be kept.
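A minimal sketch of how the two strategies and the threshold could interact. It assumes that max keeps the highest-scoring label proposed for a span and that count keeps the label proposed most often, with ties broken by mean score; the issue does not spell out either rule. Prediction and combine are hypothetical names used for illustration only, not part of zshot.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Span = Tuple[int, int]


class Prediction(NamedTuple):
    """Hypothetical record: one label proposed for a span by one linker."""
    span: Span      # (start, end) character offsets
    label: str
    score: float


def combine(predictions: List[Prediction], strategy: str = "max",
            threshold: float = 0.5) -> Dict[Span, Tuple[str, float]]:
    """Merge predictions from several linkers into one label per span."""
    by_span = defaultdict(list)
    for pred in predictions:
        by_span[pred.span].append(pred)

    result = {}
    for span, preds in by_span.items():
        if strategy == "max":
            # Assumption: keep the single most confident prediction.
            best = max(preds, key=lambda p: p.score)
            label, score = best.label, best.score
        elif strategy == "count":
            # Assumption: majority vote, ties broken by mean score.
            votes = defaultdict(list)
            for pred in preds:
                votes[pred.label].append(pred.score)
            label, scores = max(
                votes.items(),
                key=lambda kv: (len(kv[1]), sum(kv[1]) / len(kv[1])),
            )
            score = sum(scores) / len(scores)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        # Spans whose combined score falls below the threshold are dropped.
        if score >= threshold:
            result[span] = (label, score)
    return result


example = [
    Prediction((0, 5), "fruits", 0.9),
    Prediction((0, 5), "vitamin", 0.4),
    Prediction((0, 5), "vitamin", 0.5),
    Prediction((9, 16), "fruits", 0.2),
]
print(combine(example, strategy="max", threshold=0.25))
print(combine(example, strategy="count", threshold=0.25))
```

On this data the two strategies disagree on the first span: max follows the single confident "fruits" prediction, while count follows the two weaker "vitamin" votes. The second span falls below the threshold under both.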

It groups the entities by name and builds combinations of their descriptions. Each linker then extracts entities for each resulting set, and the results are finally grouped into a single set of predictions.
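The grouping step can be sketched as follows, assuming that a "combination" means picking exactly one description per entity name (a Cartesian product over the groups). That reading is an assumption; the issue does not say how the combinations are built. The Entity class here is a reduced stand-in for zshot's Entity, and description_combinations is a hypothetical helper.

```python
from collections import defaultdict
from itertools import product
from typing import List, NamedTuple


class Entity(NamedTuple):
    """Stand-in for zshot's Entity, reduced to the two fields used here."""
    name: str
    description: str


def description_combinations(entities: List[Entity]) -> List[List[Entity]]:
    """Return every entity set that uses one description per entity name."""
    by_name = defaultdict(list)
    for entity in entities:
        by_name[entity.name].append(entity)
    # One element from each name group per combination.
    return [list(combo) for combo in product(*by_name.values())]


entities = [
    Entity("fruits", "The sweet and fleshy product of a tree or other plant."),
    Entity("fruits", "Names of fruits such as banana, oranges"),
    Entity("vitamin", "A nutrient that the body needs in small amounts."),
    Entity("vitamin", "Substances that our bodies need to develop."),
]
for combo in description_combinations(entities):
    print([(e.name, e.description[:20]) for e in combo])
```

With two descriptions for each of two names this yields four entity sets, and under this reading each linker would be run once per set.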

Example:

import spacy
from zshot import PipelineConfig
from zshot.linker import LinkerSMXM, LinkerTARS
from zshot.linker.linker_ensemble import LinkerEnsemble
from zshot.utils.data_models import Entity
from zshot import displacy

nlp = spacy.blank("en")

config = PipelineConfig(
    entities=[
        Entity(name="fruits", description="The sweet and fleshy product of a tree or other plant."),
        Entity(name="fruits", description="Names of fruits such as banana, oranges"),
        # Adjacent string literals are joined implicitly inside the parentheses
        Entity(name="vitamin", description="A nutrient that the body needs in small amounts to function "
                                           "and stay healthy"),
        Entity(name="vitamin", description="Vitamins are substances that our bodies need to develop and "
                                           "function normally")
    ],
    linker=LinkerEnsemble(
        linkers=[
            LinkerSMXM(),
            LinkerTARS(),
        ],
        threshold=0.25
    )
)

nlp.add_pipe("zshot", config=config, last=True)
# Annotate a piece of text
doc = nlp('Apple or oranges have a lot of vitamin C.')

# Visualize the result
displacy.render(doc, style='ent')