dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License

Text length exceeds maximum of 1000000 #59

Open rxlian opened 4 years ago

rxlian commented 4 years ago

Hi, I got the following error while feeding text into the summarizer:

```
ValueError: [E088] Text of length 1519175 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).
```

I tried adding:

```python
nlp = spacy.load("en_core_web_sm")
nlp.max_length = 1519175
```

but it doesn't work.

So I was wondering, is there any way to address this issue? Thanks.
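One workaround that sidesteps the limit entirely is to split the input into pieces under spaCy's default one-million-character cap and summarize each piece separately. A minimal sketch, assuming the package's documented `from summarizer import Summarizer` entry point; the function name `summarize_long_text` and the `chunk_chars` parameter are illustrative, not part of the library:

```python
from summarizer import Summarizer  # documented entry point of bert-extractive-summarizer

def summarize_long_text(text, chunk_chars=900_000):
    """Split text into chunks below spaCy's 1,000,000-character default
    limit, summarize each chunk, and join the partial summaries.
    (Illustrative helper, not part of the library's API.)"""
    model = Summarizer()
    summaries = []
    for start in range(0, len(text), chunk_chars):
        chunk = text[start:start + chunk_chars]
        summaries.append(model(chunk))
    return " ".join(summaries)
```

Note that slicing by character count can cut a sentence in half at chunk boundaries; splitting on paragraph breaks (e.g. `text.split("\n\n")` and regrouping) before summarizing would be cleaner.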

rxlian commented 4 years ago

By the way, my transformers version is 2.3.0 instead of 2.2.2; everything else is the same.

paulowoicho commented 4 years ago

Were you able to figure out a workaround for this? Thanks

dmmiller612 commented 4 years ago

This looks to be a common issue with spaCy at the moment, especially `max_length` not taking effect. I may look at adding docs for handling long, multi-line inputs with spaCy, which might resolve this issue.
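The spaCy error message itself hints at another route: if the parser and NER aren't needed, raising `nlp.max_length` on a lightweight pipeline is safe. A sketch using spaCy v2 syntax (the version the error above comes from); `long_text` is a placeholder for your oversized input:

```python
import spacy

# A blank English pipeline with only a rule-based sentencizer: no parser
# or NER, so memory use stays flat and raising max_length is safe
# (per the spaCy error message itself).
nlp = spacy.blank("en")
nlp.add_pipe(nlp.create_pipe("sentencizer"))  # spaCy v2; in v3: nlp.add_pipe("sentencizer")
nlp.max_length = 2_000_000  # anything above len(long_text)

doc = nlp(long_text)
sentences = [sent.text for sent in doc.sents]
```

Getting these pre-split sentences back into the summarizer still requires either re-joining them into sub-million-character chunks (as in the earlier sketch) or using whatever sentence-handling hook the library exposes; check the repo's README for the current options.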

caramdache commented 4 years ago

+1

rxlian commented 3 years ago

Has this been solved?

lucasgsfelix commented 3 years ago

Has this been solved?

RobinVds commented 2 years ago

+1

srknowdis commented 2 years ago

+1

KTG1 commented 2 years ago

Did anyone solve this problem?