When I try to follow the pipeline steps laid out in the README exactly, I receive the following error at the preprocessing stage:
```
AttributeError: 'spacy.tokens.span.Span' object has no attribute 'string'
```
Removing `text_splitter` from the pipeline setup makes the error go away, but I would like to be able to initialize the pipeline with the text splitter (e.g., to pass in texts whose tokenization is longer than 512 tokens).
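For what it's worth, the error looks consistent with spaCy v3, which removed the `.string` attribute from `Doc`, `Span` and `Token` (`.text_with_ws` is the direct replacement). Assuming the splitter only needs to pack consecutive sentence spans into chunks under a token budget, a minimal stand-in might look like the sketch below. `split_into_chunks` is a hypothetical helper, not part of this library, and it counts spaCy tokens via `len(span)`, which is only a rough proxy for a model's 512-subword-token limit.

```python
from typing import Any, Iterable, List


def split_into_chunks(sents: Iterable[Any], max_tokens: int = 512) -> List[str]:
    """Greedily pack consecutive sentence spans into text chunks.

    Each item is assumed to behave like a spaCy Span: len(item) gives its
    token count and item.text_with_ws gives its text plus trailing whitespace.
    A single sentence longer than max_tokens becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sent in sents:
        n = len(sent)
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("".join(current))
            current, current_len = [], 0
        # spaCy v3 removed Span.string; text_with_ws is the equivalent.
        current.append(sent.text_with_ws)
        current_len += n
    if current:
        chunks.append("".join(current))
    return chunks
```

With spaCy v3, the sentence spans could come from `doc.sents` after adding a sentence boundary component, e.g. `nlp = spacy.blank("en")` followed by `nlp.add_pipe("sentencizer")`.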
Thank you very much for the help!