IBM / zshot

Zero and Few shot named entity & relationships recognition
https://ibm.github.io/zshot
MIT License

Add Linker Ensemble #53

Closed marmg closed 1 year ago

marmg commented 1 year ago
| Status | Type | ⚠️ Core Change | Issue |
|---|---|---|---|
| Ready | Feature | No | Link |

Problem

Add a linker ensemble so that several linkers, and several descriptions of the same entity, can be combined to improve performance.

Solution

This PR implements LinkerEnsemble, which takes as input the list of linkers to use, the strategy for combining their predictions (one of: max, count), and the threshold used to decide which entities are kept.
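To make the two strategies concrete, here is a minimal sketch of how max and count could combine the predictions that several linkers produce for one span. It does not use zshot; the function names, the `(label, score)` tuple format, and the way the threshold is applied are assumptions for illustration, not the code in this PR.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical format: one (label, score) pair per linker run for a given span.
Prediction = Tuple[str, float]


def ensemble_max(preds: List[Prediction], threshold: float) -> Optional[Prediction]:
    """Keep the single highest-scoring prediction, if it reaches the threshold."""
    if not preds:
        return None
    label, score = max(preds, key=lambda p: p[1])
    return (label, score) if score >= threshold else None


def ensemble_count(preds: List[Prediction], num_voters: int,
                   threshold: float) -> Optional[Prediction]:
    """Keep the most frequently predicted label; the score is its share of votes."""
    if not preds or num_voters <= 0:
        return None
    votes = Counter(label for label, _ in preds)
    label, n_votes = votes.most_common(1)[0]
    score = n_votes / num_voters
    return (label, score) if score >= threshold else None


# Three linker runs disagree about the same span.
preds = [("fruits", 0.9), ("vitamin", 0.4), ("vitamin", 0.3)]
by_max = ensemble_max(preds, threshold=0.25)
by_count = ensemble_count(preds, num_voters=3, threshold=0.25)
```

In this sketch max favours one confident linker, while count favours agreement between linkers, so the two can disagree on the same input.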

It groups the entities by name and builds combinations of their descriptions, runs each linker on every combination to extract that set of entities, and finally merges the results.
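The grouping step can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: entities are plain `(name, description)` tuples rather than zshot `Entity` objects, the helper names are hypothetical, and the combinations are taken as the Cartesian product of one description per name, which may differ from how LinkerEnsemble actually builds them.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple


def group_by_name(entities: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every description that was given for the same entity name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, description in entities:
        groups[name].append(description)
    return dict(groups)


def entity_combinations(entities: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Build every entity set that uses exactly one description per name."""
    groups = group_by_name(entities)
    names = list(groups)
    return [
        dict(zip(names, choice))
        for choice in product(*(groups[name] for name in names))
    ]


entities = [
    ("fruits", "The sweet and fleshy product of a tree or other plant."),
    ("fruits", "Names of fruits such as banana, oranges"),
    ("vitamin", "A nutrient that the body needs in small amounts to function and stay healthy"),
    ("vitamin", "Vitamins are substances that our bodies need to develop and function normally"),
]
combos = entity_combinations(entities)
```

With two descriptions for each of two names, the sketch yields four entity sets; each linker would then be run once per set before the predictions are merged.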

Example:

import spacy
from zshot import PipelineConfig
from zshot.linker import LinkerSMXM, LinkerTARS
from zshot.linker.linker_ensemble import LinkerEnsemble
from zshot.utils.data_models import Entity
from zshot import displacy

nlp = spacy.blank("en")

config = PipelineConfig(
    entities=[
        Entity(name="fruits", description="The sweet and fleshy product of a tree or other plant."),
        Entity(name="fruits", description="Names of fruits such as banana, oranges"),
        Entity(name="vitamin", description="A nutrient that the body needs in small amounts to function " \
                                           "and stay healthy"),
        Entity(name="vitamin", description="Vitamins are substances that our bodies need to develop and " \
                                           "function normally")
    ],
    linker=LinkerEnsemble(
        linkers=[
            LinkerSMXM(),
            LinkerTARS(),
        ],
        threshold=0.25
    )
)

nlp.add_pipe("zshot", config=config, last=True)
# annotate a piece of text
doc = nlp('Apple or oranges have a lot of vitamin C.')

# Visualize the result
displacy.render(doc, style='ent')

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 92.21% and project coverage change: -0.53 :warning:

Comparison is base (0fc473c) 93.04% compared to head (55e151c) 92.51%.

:exclamation: Current head 55e151c differs from pull request most recent head 1745b5b. Consider uploading reports for the commit 1745b5b to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
- Coverage   93.04%   92.51%   -0.53%
==========================================
  Files          67       73       +6
  Lines        2832     3047     +215
==========================================
+ Hits         2635     2819     +184
- Misses        197      228      +31
```

| [Impacted Files](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM) | Coverage Δ | |
|---|---|---|
| [zshot/linker/linker\_regen/utils.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvbGlua2VyL2xpbmtlcl9yZWdlbi91dGlscy5weQ==) | `60.52% <ø> (-17.53%)` | :arrow_down: |
| [zshot/linker/linker\_ensemble/utils.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvbGlua2VyL2xpbmtlcl9lbnNlbWJsZS91dGlscy5weQ==) | `63.33% <63.33%> (ø)` | |
| [zshot/linker/linker\_ensemble/linker\_ensemble.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvbGlua2VyL2xpbmtlcl9lbnNlbWJsZS9saW5rZXJfZW5zZW1ibGUucHk=) | `87.50% <87.50%> (ø)` | |
| [zshot/utils/ensembler.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvdXRpbHMvZW5zZW1ibGVyLnB5) | `98.33% <98.33%> (ø)` | |
| [zshot/linker/\_\_init\_\_.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvbGlua2VyL19faW5pdF9fLnB5) | `100.00% <100.00%> (ø)` | |
| [zshot/linker/linker\_ensemble/\_\_init\_\_.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvbGlua2VyL2xpbmtlcl9lbnNlbWJsZS9fX2luaXRfXy5weQ==) | `100.00% <100.00%> (ø)` | |
| [zshot/linker/linker\_tars.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvbGlua2VyL2xpbmtlcl90YXJzLnB5) | `97.87% <100.00%> (+4.25%)` | :arrow_up: |
| [zshot/tests/linker/test\_ensemble\_linker.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvdGVzdHMvbGlua2VyL3Rlc3RfZW5zZW1ibGVfbGlua2VyLnB5) | `100.00% <100.00%> (ø)` | |
| [zshot/tests/linker/test\_linker.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvdGVzdHMvbGlua2VyL3Rlc3RfbGlua2VyLnB5) | `96.92% <100.00%> (ø)` | |
| [zshot/tests/linker/test\_regen\_linker.py](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM#diff-enNob3QvdGVzdHMvbGlua2VyL3Rlc3RfcmVnZW5fbGlua2VyLnB5) | `91.78% <100.00%> (-8.22%)` | :arrow_down: |
| ... and [5 more](https://codecov.io/gh/IBM/zshot/pull/53?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=IBM) | | |
