explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License
30.18k stars 4.4k forks source link

Pickling of PhraseMatcher not yet resolved in 2.1.3 #3494

Closed kushalc closed 5 years ago

kushalc commented 5 years ago

https://github.com/explosion/spaCy/issues/1971 and https://github.com/explosion/spaCy/issues/3248 show that pickling works, but I'm still able to repro issues in 2.1.3.

Minimal Reproducible Example

Building off #3252 unit test:

from spacy.matcher import PhraseMatcher
from spacy.lang.en import English
from spacy.compat import pickle
nlp = English()

matcher = PhraseMatcher(nlp.vocab)
matcher.add("TEST1", None, nlp("a"), nlp("b"), nlp("c"))
matcher.add("TEST2", None, nlp("d"))
data = pickle.dump(matcher, open("matcher.pickle", "wb"))
new_matcher = pickle.load(open("matcher.pickle", "rb"))
print(new_matcher(nlp("a b c")))  # prints [(9789093191506324669, 0, 1), (9789093191506324669, 1, 2), (9789093191506324669, 2, 3)]
print(matcher(nlp("a b c")))  # prints [(9789093191506324669, 0, 1), (9789093191506324669, 1, 2), (9789093191506324669, 2, 3)]

(re-start Python)

from spacy.lang.en import English
from spacy.compat import pickle
nlp = English()
new_matcher = pickle.load(open("matcher.pickle", "rb"))
new_matcher(nlp("a b c"))  # prints []

Info about spaCy

svlandeg commented 5 years ago

I can reproduce your error with the code above. However, I don't think it's a Matcher pickling issue. If you would keep the same nlp object rather than constructing a new one, you wouldn't run into this, right?

kushalc commented 5 years ago

Is there a recommended best practice for pickling nlp/Matcher between different processes? The fundamental use case we have to solve is serialization across different machines so we can't simply re-use the same nlp instance.

Kushal

On Thu, Jul 11, 2019 at 8:34 AM Sofie Van Landeghem < notifications@github.com> wrote:

I can reproduce your error with the code above. However, I don't think it's a Matcher pickling issue. If you would keep the same nlp object rather than constructing a new one, you wouldn't run into this, right?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/explosion/spaCy/issues/3494?email_source=notifications&email_token=AAAMMOUHRGYG5WXGP5W4HFTP65HH5A5CNFSM4HBV3CS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZXCYNA#issuecomment-510536756, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAMMOWGIE266T6BTKJRNBTP65HH5ANCNFSM4HBV3CSQ .

svlandeg commented 5 years ago

Is it an option to serialize the nlp object (https://spacy.io/usage/saving-loading) ?

nlp1.to_disk("mynlp")
nlp2 = spacy.load("mynlp")
no-response[bot] commented 5 years ago

This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.