boudinfl / pke

Python Keyphrase Extraction module
GNU General Public License v3.0

max_length parameter error with the latest version #202

Closed shinshinsakasaka closed 1 year ago

shinshinsakasaka commented 2 years ago

Thank you for developing a great tool.

I'm getting a max_length error with the latest version. I installed pke with: pip install git+https://github.com/boudinfl/pke.git

Python 3.9.12


✔ Loaded compatibility table

================= Installed pipeline packages (spaCy v3.4.0) =================
ℹ spaCy installation: C:\Users\shins\anaconda_new\lib\site-packages\spacy

NAME             SPACY            VERSION
en_core_web_sm   >=3.4.0,<3.5.0   3.4.0     ✔
1.

extractor.load_document(input = text,language = 'en',normalization = None)

ValueError: [E088] Text of length 1210306 exceeds maximum of 1000000. The parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text)


2.

extractor.load_document(input = text,language = 'en',max_length = 1210310, normalization = None)

TypeError: load_document() got an unexpected keyword argument 'max_length'



How can I fix this problem? I appreciate your help.

ygorg commented 1 year ago

Hi, thanks for using pke. You can load your document using spacy, modify the max_length and then pass it to pke like so:

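import pke
import spacy

document = 'My text'
nlp = spacy.load('en_core_web_sm')
# Raise spaCy's character limit (default 1,000,000) so long texts are accepted
nlp.max_length = 1000000000000000000000000
preproc_doc = nlp(document)

extractor = pke.unsupervised.MultipartiteRank()
# load_document() also accepts an already-processed spaCy Doc
extractor.load_document(preproc_doc)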
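
A variant of the snippet above, as a rough sketch: instead of hard-coding a huge number, size the limit from the text itself. The helper names required_max_length and load_long_document are made up for this example and are not part of pke or spaCy. The sketch assumes the en_core_web_sm model is installed, and reuses the language and normalization arguments from the original call. spaCy and pke are imported inside the function only so that the small helper can be tried on its own.

```python
def required_max_length(text, default_limit=1_000_000):
    """Smallest spaCy max_length that accepts `text` (hypothetical helper).

    spaCy raises E088 when len(text) exceeds nlp.max_length, so a limit
    equal to len(text) is enough; the default limit is never lowered.
    """
    return max(default_limit, len(text))


def load_long_document(text, model="en_core_web_sm"):
    """Preprocess `text` with spaCy and hand the Doc to pke (hypothetical helper)."""
    # Imported lazily so required_max_length() works without spaCy/pke installed.
    import pke
    import spacy

    nlp = spacy.load(model)
    nlp.max_length = required_max_length(text)
    doc = nlp(text)  # may need a lot of memory on very long inputs

    extractor = pke.unsupervised.MultipartiteRank()
    extractor.load_document(doc, language="en", normalization=None)
    return extractor
```

For the text in this issue (1210306 characters), required_max_length returns 1210306. Raising the limit only removes the check: per the error message above, the parser and NER models still need roughly 1GB of temporary memory per 100,000 characters.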

If you hit this error, pke may not be the right tool for your use case (cf. pke#131).