dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License

Text length exceeds maximum of 1000000 #59

Open rxlian opened 4 years ago

rxlian commented 4 years ago

Hi, I got the following error while feeding text into the summarizer:

```
ValueError: [E088] Text of length 1519175 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).
```

I tried adding:

```python
nlp = spacy.load("en_core_web_sm")
nlp.max_length = 1519175
```

but it doesn't work.

So I was wondering, is there any way to address this issue? Thanks.
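One workaround that sidesteps the limit entirely is to split the input into pieces under spaCy's default one-million-character cap and summarize each piece separately. A minimal sketch, assuming the package's documented `from summarizer import Summarizer` entry point; the function name `summarize_long_text` and the `chunk_chars` parameter are illustrative, not part of the library:

```python
from summarizer import Summarizer  # documented entry point of bert-extractive-summarizer

def summarize_long_text(text, chunk_chars=900_000):
    """Split text into chunks below spaCy's 1,000,000-character default
    limit, summarize each chunk, and join the partial summaries.
    (Illustrative helper, not part of the library's API.)"""
    model = Summarizer()
    summaries = []
    for start in range(0, len(text), chunk_chars):
        chunk = text[start:start + chunk_chars]
        summaries.append(model(chunk))
    return " ".join(summaries)
```

Note that slicing by character count can cut a sentence in half at chunk boundaries; splitting on paragraph breaks (e.g. `text.split("\n\n")` and regrouping) before summarizing would be cleaner.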

rxlian commented 4 years ago

By the way, my transformers version is 2.3.0 instead of 2.2.2; everything else is the same.

paulowoicho commented 4 years ago

Were you able to figure out a workaround for this? Thanks

dmmiller612 commented 4 years ago

This looks to be a common issue with spaCy at the moment, especially `max_length` not taking effect. I may look at adding docs for handling long, multi-line inputs with spaCy, which might resolve this issue.
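The spaCy error message itself hints at another route: if the parser and NER aren't needed, raising `nlp.max_length` on a lightweight pipeline is safe. A sketch using spaCy v2 syntax (the version the error above comes from); `long_text` is a placeholder for your oversized input:

```python
import spacy

# A blank English pipeline with only a rule-based sentencizer: no parser
# or NER, so memory use stays flat and raising max_length is safe
# (per the spaCy error message itself).
nlp = spacy.blank("en")
nlp.add_pipe(nlp.create_pipe("sentencizer"))  # spaCy v2; in v3: nlp.add_pipe("sentencizer")
nlp.max_length = 2_000_000  # anything above len(long_text)

doc = nlp(long_text)
sentences = [sent.text for sent in doc.sents]
```

Getting these pre-split sentences back into the summarizer still requires either re-joining them into sub-million-character chunks (as in the earlier sketch) or using whatever sentence-handling hook the library exposes; check the repo's README for the current options.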

caramdache commented 4 years ago

+1

rxlian commented 3 years ago

Has this been solved?

lucasgsfelix commented 3 years ago

Has this been solved?

RobinVds commented 2 years ago

+1

srknowdis commented 2 years ago

+1

KTG1 commented 2 years ago

Did anyone solve this problem?