Closed AlJohri closed 4 years ago
I just want to add that custom attributes are also not retained when using multiple processes. I tried running:
from spacy.tokens import Span

Span.set_extension('entity_id', default=None)
# include this at the end of my custom component
for ent in doc.ents:
    ent._.entity_id = ent.ent_id_
The custom attribute entity_id is also not retained when n_process=2.
Found a temporary workaround:
# include this at the end of my custom pipeline component
doc.user_data['labels'] = [(x.start_char, x.end_char, x.label, x.ent_id_) for x in doc.ents]
doc.user_data gets retained when using multiple processes.
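For anyone using the same workaround, here is a minimal sketch of reading those tuples back in the parent process. It assumes the component stored them under the 'labels' key exactly as above; the helper name entities_from_user_data and the sample values are hypothetical and not part of spaCy. It only needs the text and a dict-like user_data, so it runs without spaCy installed.

```python
from typing import Any, Dict, List, Mapping


def entities_from_user_data(
    text: str, user_data: Mapping[Any, Any], key: str = "labels"
) -> List[Dict[str, Any]]:
    """Rebuild plain entity records from the tuples stored by the workaround.

    Each tuple is expected to be (start_char, end_char, label, ent_id_),
    matching what the custom component wrote into doc.user_data.
    """
    records = []
    for start_char, end_char, label, ent_id in user_data.get(key, []):
        records.append(
            {
                "text": text[start_char:end_char],
                "start_char": start_char,
                "end_char": end_char,
                "label": label,  # integer label ID, as stored by x.label
                "ent_id": ent_id,
            }
        )
    return records


# Stand-ins for doc.text and doc.user_data (placeholder values only).
example_text = "Apple is a company"
example_user_data = {"labels": [(0, 5, 383, "Q312")]}
print(entities_from_user_data(example_text, example_user_data))
```

With real documents this would presumably be called as entities_from_user_data(doc.text, doc.user_data) for each doc yielded by nlp.pipe(texts, n_process=2).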
Hi @AlJohri, thanks for the detailed analysis and report!
The token.ent_id attribute was indeed not being serialized. PR #4852 should fix that.
With respect to the custom attributes I'm a little puzzled though, because we have a serialization test that checks just that, and I even expanded it to test the token level in the same PR, and got no errors. If the PR gets merged and problems persist afterwards, I'd suggest opening a new issue to address that specific problem.
The ent_id and ent_id_ attributes are not retained when using multiple processes during nlp.pipe. Presumably they are not getting serialized properly.
How to reproduce the behaviour
This example should be fully reproducible. The output should look like this:
Code:
Info about spaCy
I'm using master branch as of commit 3431ac42de470a4bb73f1c6852a5ccffc07da7b1.