I have access to my server: Through SSH | through the webadmin
Are you in a special context or did you perform some particular tweaking on your YunoHost instance?: no
Steps to reproduce
Just upload any PDF file to paperless-ngx.
Expected behavior
The PDF file should be uploaded and consumed without error.
Logs
[2024-08-31 22:07:53,260] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: 2024-05 Ppc N.pdf: The following error occurred while storing document 2024-05 Ppc N.pdf after parsing:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/english/
Searched in:
- PosixPath('/var/www/paperless-ngx/nltk_data')
**********************************************************************
Traceback (most recent call last):
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/asgiref/sync.py", line 327, in main_wrap
raise exc_info[1]
File "/var/www/paperless-ngx/src/documents/consumer.py", line 670, in run
document_consumption_finished.send(
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/django/dispatch/dispatcher.py", line 176, in send
return [
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/django/dispatch/dispatcher.py", line 177, in <listcomp>
(receiver, receiver(signal=self, sender=sender, **named))
File "/var/www/paperless-ngx/src/documents/signals/handlers.py", line 95, in set_correspondent
potential_correspondents = matching.match_correspondents(document, classifier)
File "/var/www/paperless-ngx/src/documents/matching.py", line 37, in match_correspondents
pred_id = classifier.predict_correspondent(document.content) if classifier else None
File "/var/www/paperless-ngx/src/documents/classifier.py", line 413, in predict_correspondent
X = self.data_vectorizer.transform([self.preprocess_content(content)])
File "/var/www/paperless-ngx/src/documents/classifier.py", line 386, in preprocess_content
words: list[str] = word_tokenize(
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 142, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
tokenizer = _get_punkt_tokenizer(language)
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
return PunktTokenizer(language)
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
self.load_lang(lang)
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
File "/var/www/paperless-ngx/venv/lib/python3.9/site-packages/nltk/data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/english/
Searched in:
- PosixPath('/var/www/paperless-ngx/nltk_data')
**********************************************************************
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/www/paperless-ngx/src/documents/tasks.py", line 149, in consume_file
msg = plugin.run()
File "/var/www/paperless-ngx/src/documents/consumer.py", line 733, in run
self._fail(
File "/var/www/paperless-ngx/src/documents/consumer.py", line 304, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: 2024-05 Ppc N.pdf: The following error occurred while storing document 2024-05 Ppc N.pdf after parsing:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/english/
Searched in:
- PosixPath('/var/www/paperless-ngx/nltk_data')
**********************************************************************
Describe the bug
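This is the same bug as described in https://github.com/paperless-ngx/paperless-ngx/issues/7519, except that I am on version 2.11.6~ynh1. Consuming a document fails because the NLTK resource punkt_tab is missing from /var/www/paperless-ngx/nltk_data.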
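Possible workaround, not a confirmed fix: the log itself suggests fetching the resource with the NLTK downloader. A minimal sketch of that, assuming the data directory is the one printed under "Searched in:" and that NLTK is importable from the app's virtualenv. The helper names `has_punkt_tab` and `ensure_punkt_tab` are mine, purely for illustration; only `nltk.download` comes from NLTK.

```python
from pathlib import Path

# Taken from the "Searched in:" line of the log; adjust if your install differs.
NLTK_DATA_DIR = Path("/var/www/paperless-ngx/nltk_data")


def has_punkt_tab(data_dir: Path, language: str = "english") -> bool:
    """Return True if tokenizers/punkt_tab/<language>/ exists under data_dir."""
    return (data_dir / "tokenizers" / "punkt_tab" / language).is_dir()


def ensure_punkt_tab(data_dir: Path = NLTK_DATA_DIR) -> bool:
    """Download punkt_tab into data_dir if it is missing.

    Returns True if the resource directory is present afterwards.
    """
    if has_punkt_tab(data_dir):
        return True
    # Imported lazily so the presence check works even without NLTK installed.
    import nltk

    # download_dir points the downloader at the directory paperless-ngx searches.
    nltk.download("punkt_tab", download_dir=str(data_dir))
    return has_punkt_tab(data_dir)
```

Calling `ensure_punkt_tab()` from the interpreter in /var/www/paperless-ngx/venv, as a user that can write to the nltk_data directory, should populate the missing resource. I have not verified that this survives an app upgrade.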