I am trying to use Doc.to_bytes() after extending Doc with a custom attribute that holds a list of spans. I can serialize and deserialize the custom attribute on its own with my helper functions, but calling Doc.to_bytes() on the same doc fails. Here's a minimal reproducible example:
import spacy
from spacy.tokens import Doc
nlp = spacy.blank('en')
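def serialize_spans(obj, attr):
    # Store each span as a plain (start_char, end_char) pair
    return [(span.start_char, span.end_char) for span in getattr(obj._, attr)]
def deserialize_spans(obj, attr, value):
    # value: list of (start_char, end_char) pairs produced by serialize_spans
    setattr(obj._, attr, [obj.char_span(start, end) for start, end in value])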
Doc.set_extension("special_spans", default = list(), to_bytes = serialize_spans, from_bytes = deserialize_spans)
doc = nlp('The quick brown fox jumped over the lazy dog.')
doc._.special_spans = [doc[0:2], doc[4:6]]
# Works well
serialize_spans(doc, 'special_spans')
# Doesn't work
doc.to_bytes()
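One possible workaround, sketched under the assumption that the failure comes from Span objects being stored in the extension (Doc.to_bytes() has to serialize the extension values along with the rest of the doc): keep only plain character offsets in the attribute and rebuild the spans after loading. The helper names and the `special_span_offsets` extension name below are hypothetical, and `demo()` is an untested illustration rather than confirmed library behaviour.

```python
def spans_to_offsets(spans):
    """Convert Span-like objects to plain (start_char, end_char) tuples."""
    return [(span.start_char, span.end_char) for span in spans]


def offsets_to_spans(doc, offsets):
    """Rebuild spans from character offsets via doc.char_span.

    char_span returns None when offsets do not align with token
    boundaries, so those entries are dropped here.
    """
    spans = (doc.char_span(start, end) for start, end in offsets)
    return [span for span in spans if span is not None]


def demo():
    # Requires spaCy; imported here so the helpers above stay dependency-free.
    import spacy
    from spacy.tokens import Doc

    nlp = spacy.blank("en")
    # "special_span_offsets" is a hypothetical extension name (assumption).
    Doc.set_extension("special_span_offsets", default=None, force=True)

    doc = nlp("The quick brown fox jumped over the lazy dog.")
    # Store integers only, never Span objects.
    doc._.special_span_offsets = spans_to_offsets([doc[0:2], doc[4:6]])

    data = doc.to_bytes()
    restored = Doc(nlp.vocab).from_bytes(data)
    return offsets_to_spans(restored, restored._.special_span_offsets)
```

Storing offsets instead of spans also avoids keeping references to the original doc inside its own extension data. The cost is that spans have to be rebuilt explicitly after loading.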
How to reproduce the behaviour
Your Environment