explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Getting error while loading customized nlp from local machine #2388

Closed UtkarshKhare closed 6 years ago

UtkarshKhare commented 6 years ago

Hi @ines @honnibal, I am sharing my code to find out what needs to be added so that the customized nlp object can be loaded back from disk.

import spacy  # needed for spacy.load() below
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
import pandas as pd
nlp = spacy.load('en_core_web_sm')

class EntityMatcher(object):
    name = 'enti1'

    def __init__(self, nlp, terms, label, name):
        self.name = name
        patterns = [nlp(text) for text in terms]
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add(label, None, *patterns)

    def __call__(self, doc):
        matches = self.matcher(doc)
        for match_id, start, end in matches:
            span = Span(doc, start, end, label=match_id)
            doc.ents = list(doc.ents) + [span]
        return doc

df=pd.read_csv('D:\\Projects\\Deep Miner\\entity_vaultedge.csv')

##COMPANY
term=df[df['Label'] == 'COMPANY']
COMP=term['Entity'].tolist()
org=term['Label'].unique().tolist()

##NAME
term1=df[df['Label'] == 'NAME']
PER=term1['Entity'].tolist()
person=term1['Label'].unique().tolist()

##DATE
term2=df[df['Label'] == 'DATE']
DATE=term2['Entity'].tolist()
dte=term2['Label'].unique().tolist()

entity_matcher1 = EntityMatcher(nlp, COMP, (', '.join(org)),'entity1')
entity_matcher2 = EntityMatcher(nlp, PER, (', '.join(person)),'entity2')
entity_matcher3 = EntityMatcher(nlp, DATE, (', '.join(dte)),'entity3')

nlp.add_pipe(entity_matcher1)
nlp.add_pipe(entity_matcher2)
nlp.add_pipe(entity_matcher3)

Saving NLP

nlp.to_disk(output_dir)
print("Saved model to", output_dir)

Loading NLP

print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)

ERROR: While loading it back from the same location, I am getting the following error: KeyError: "Can't find factory for 'entity1'."

Looking forward to your help!

UtkarshKhare commented 6 years ago

This is the entity file used in the above code for NAME, COMPANY and DATE.

entity_vault.xlsx

ines commented 6 years ago

The problem here is that when you save out the model, spaCy will serialize the data and config, but not arbitrary code like your entity matcher component. After saving, the pipeline in the model's meta may look something like this: "pipeline": ["parser", "ner", "entity1"]. When you load the model back, spaCy needs to resolve those strings back to components, so it looks them up in the factories in Language.factories. This works fine for the built-in components, but not for "entity1", because spaCy doesn't know what that is. You can find more about this in the pipeline components documentation.
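To make the lookup concrete, here is a minimal sketch in plain Python. It is a simplified stand-in for the mechanism described above, not spaCy's actual code: the `factories` dict, the `load_pipeline` helper and the string "components" are all hypothetical and only mimic how a pipeline name is resolved to a factory.

```python
# Simplified stand-in for the name -> factory lookup; not spaCy's real code.
# The built-in entries return placeholder strings instead of real components.
factories = {
    "parser": lambda nlp, **cfg: "built-in parser",
    "ner": lambda nlp, **cfg: "built-in ner",
}


def load_pipeline(names, factories):
    """Resolve each pipeline name to a component via its factory."""
    components = []
    for name in names:
        if name not in factories:
            raise KeyError("Can't find factory for '{}'.".format(name))
        components.append(factories[name](None))
    return components


# The saved meta only stores the component names, not the code behind them.
meta = {"pipeline": ["parser", "ner", "entity1"]}

try:
    load_pipeline(meta["pipeline"], factories)
except KeyError as err:
    error_message = err.args[0]

# Registering a factory under the same name makes the lookup succeed.
factories["entity1"] = lambda nlp, **cfg: "custom entity matcher"
components = load_pipeline(meta["pipeline"], factories)
```

The first call fails because "entity1" has no registered factory, which mirrors the KeyError in the original post; the second call works once a factory is registered under that name.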

In the future, spaCy will be solving this problem via entry points, which will let you wrap your component as a Python package and tell spaCy how to resolve the component string names. My PR #2348 will be included in the upcoming nightly release (and v2.1.0) and the description includes a detailed example and some background on entry points. From v2.1.0 on, this will be the recommended best practice for managing models and custom pipeline component dependencies.

For now, here are three main solutions:

1. Remove the component and re-add it later (easiest)

Disable the custom components during serialization:

with nlp.disable_pipes('entity1', 'entity2'):
    nlp.to_disk('/path/to/model')

And add them back when you load in the model:

nlp = spacy.load('/path/to/model')
nlp.add_pipe(entity1)
nlp.add_pipe(entity2)

2. Add a factory before loading the model

A factory is a function that takes the nlp object and optional config parameters and initialises the component. You can find more details here.

from spacy.language import Language
Language.factories['entity1'] = lambda nlp, **cfg: EntityMatcher(nlp, **cfg)
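Note that the EntityMatcher in the original post also expects terms, a label and a name, so the factory needs to get those from somewhere. One possible approach, sketched here under assumptions, is to bind them in a closure. The `make_factory` helper and the example terms are hypothetical, and the class below is only a stand-in with the same constructor signature so the sketch runs without spaCy.

```python
# Stand-in for the EntityMatcher class from the original post, so this
# sketch runs without spaCy; only the constructor signature matters here.
class EntityMatcher(object):
    def __init__(self, nlp, terms, label, name):
        self.nlp = nlp
        self.terms = list(terms)
        self.label = label
        self.name = name


def make_factory(terms, label, name):
    """Return a factory with the (nlp, **cfg) signature described above."""
    def factory(nlp, **cfg):
        # cfg is accepted but ignored; terms, label and name come from the closure.
        return EntityMatcher(nlp, terms, label, name)
    return factory


# Hypothetical terms; in the original post they are read from a CSV file.
company_factory = make_factory(["Acme Corp", "Globex"], "COMPANY", "entity1")

# With spaCy v2 this would be registered before calling spacy.load(), e.g.
# Language.factories["entity1"] = company_factory
component = company_factory(nlp=None)
```

The terms still have to be available in the process that loads the model, since only the component name is stored with the saved model.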

3. Include the custom component in the model's __init__.py (advanced)

Models are Python packages, so when you load an installed model, spaCy will import the package and call its load() method. All code present in the model's __init__.py will be executed, too, so you can ship any custom code with a model. This solution requires you to package your model using the spacy package command and edit the __init__.py to add your component and its factory. Also note the infobox and possible caveats described here.

UtkarshKhare commented 6 years ago

Thank you !!

faizanzaroo commented 6 years ago

@ines Thank you for your reply, but my issue is slightly different. I want the model to persist the data, i.e. I don't want to add the patterns to the PhraseMatcher component every time I load the model. I want to save the model so that I don't need to re-add the patterns the next time I load it. Is there any way around this?
