alexbelgium / hassio-addons

My homeassistant addons
MIT License

paperless-ng: pikepdf import error #125

Closed. X1pheR closed this issue 2 years ago

X1pheR commented 2 years ago

The addon is not working. Installed without any changes, it gives a pikepdf import error when you try to upload files. See the link at the end, which describes the same error.

Also, the links to paperless-ng in the About section actually point to qbittorrent.

I also tried to use another OCR language, but the addon doesn't seem to support it. I tried nld and dut.

https://issueexplorer.com/issue/linuxserver/docker-paperless-ng/8

alexbelgium commented 2 years ago

Hello, thanks! I pushed a new version that adds the code you linked to, which updates pikepdf. Let me know how it goes!

Thanks also for reporting the link, I've changed it.

X1pheR commented 2 years ago

Nice! Didn't expect it to be picked up this quickly 😉 I tested, but unfortunately it doesn't work yet. I first tested after upgrading by uploading a document, and I got the same error. To be sure, I deleted the addon, deleted all data/config files and installed the addon again. Uploading a file gave the same error again. In case it matters, I'm using the 64-bit version of Home Assistant on a RPi4B. I see that it pulled and used the following docker image: linuxserver/paperless-ng:arm64v8-latest

I checked the addon log and I think pikepdf is not actually being installed/updated. Here is a fragment from the addon log:

Setting up python3.8-dev (3.8.10-0ubuntu1~20.04.2) ...
Setting up g++ (4:9.3.0-1ubuntu2) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
update-alternatives: warning: skip creation of /usr/share/man/man1/c++.1.gz because associated file /usr/share/man/man1/g++.1.gz (of link group c++) doesn't exist
Setting up build-essential (12.8ubuntu1.1) ...
Setting up libalgorithm-diff-xs-perl (0.04-6) ...
Setting up libalgorithm-merge-perl (0.08-3) ...
Setting up python3-dev (3.8.2-0ubuntu2) ...
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
/sbin/ldconfig.real: /usr/local/lib/libjbig2enc.so.0 is not a symbolic link

/var/run/s6/etc/cont-init.d/91-pikepdf.sh: line 9: pip: command not found
... success!
[cont-init.d] 91-pikepdf.sh: exited 0.
[cont-init.d] 92-local_mounts.sh: executing... 
[cont-init.d] 92-local_mounts.sh: exited 0.
[cont-init.d] 92-smb_mounts.sh: executing... 
[cont-init.d] 92-smb_mounts.sh: exited 0.
[cont-init.d] 99-custom-scripts: executing... 
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services

And here is the error as displayed within paperless-ng under failed tasks in the admin page:

pikepdf's extension library failed to import : Traceback (most recent call last):

File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 13, in <module>
from . import _qpdf
ImportError: /usr/local/lib/python3.8/dist-packages/pikepdf/_qpdf.cpython-38-aarch64-linux-gnu.so: undefined symbol: _ZN20QPDFPageObjectHelper16placeFormXObjectE16QPDFObjectHandleRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_9RectangleEbbb

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/django_q/cluster.py", line 432, in worker
res = f(*task["args"], **task["kwargs"])
File "/app/paperless/src/documents/tasks.py", line 74, in consume_file
document = Consumer().try_consume_file(
File "/app/paperless/src/documents/consumer.py", line 248, in try_consume_file
document_parser.parse(self.path, mime_type, self.filename)
File "/app/paperless/src/paperless_tesseract/parsers.py", line 230, in parse
import ocrmypdf
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/__init__.py", line 10, in <module>
from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/helpers.py", line 22, in <module>
import pikepdf
File "/usr/local/lib/python3.8/dist-packages/pikepdf/__init__.py", line 16, in <module>
raise ImportError(_msg) from _e
ImportError: pikepdf's extension library failed to import
alexbelgium commented 2 years ago

Thanks for the very detailed troubleshooting info. I'm not at my system right now, so I can't test straight away! I see the key error in the new script is "pip: command not found". I'll add pip and push a new version :) Tomorrow I'll probably check a bit more whether it works :)
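For readers hitting the same "pip: command not found" line, here is a minimal sketch of one way to avoid depending on a bare `pip` executable: look for `pip3` or `pip` on the PATH and otherwise fall back to `python3 -m pip`. This is not the addon's actual script; the function name `pip_command` is hypothetical, and the fallback only helps if the pip module itself is installed for that interpreter.

```python
import shutil
import sys


def pip_command(which=shutil.which):
    """Return an argv prefix that should invoke pip (hypothetical helper).

    `which` is injectable so the lookup can be exercised without touching
    the real PATH.
    """
    for name in ("pip3", "pip"):
        path = which(name)
        if path:
            return [path]
    # No pip executable on PATH: run pip as a module of an interpreter instead.
    python = which("python3") or sys.executable
    return [python, "-m", "pip"]


if __name__ == "__main__":
    # Prints the command prefix that would be used on this machine.
    print(pip_command())
```

An init script could then append `install --upgrade pikepdf` (or whatever it needs) to the returned prefix instead of calling `pip` directly.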

alexbelgium commented 2 years ago

I've also cleaned up the pikepdf install script so that only errors are printed. Hopefully it will be clearer to read. I don't have access to my computer for the moment, but pikepdf seems to update.

X1pheR commented 2 years ago

Nice! I upgraded and tested. It works now! I did see some errors in the addon log, however, and they seem to occur each time you start the addon. The full log is below. I can't determine whether anything should be fixed or not.

Suggestion to extend documentation

One small suggestion is to document the TZ addon option, which is not mentioned. Apparently I can set this to Europe/Amsterdam so that it uses the correct timezone for me ;)

OCR multi language support

I tested some more and apparently only English OCR is included in the package. If I try to change eng to nld in the paperless-ng config file, I get the error below while starting the addon:

SystemCheckError: System check identified some issues:
ERRORS:
?: The selected ocr language nld is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE.

I tried searching for how to add additional OCR languages and found the options below. Can you do anything with this and add (configurable) multi-language support?

  1. linuxserver seems to have a docker mod that can be used. It would mean adding two variables to the environment to configure: https://github.com/linuxserver/docker-mods/tree/papermerge-multilangocr
  2. paperless-ng seems to use OCRmyPDF. The linked page mentions that additional languages can be installed. Maybe this is an option: https://paperless-ng.readthedocs.io/en/latest/configuration.html#ocr-settings
  3. The OCR language packs are also mentioned in the bare metal install notes: https://paperless-ng.readthedocs.io/en/latest/setup.html#bare-metal-route
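The "selected ocr language nld is not installed" check above can be reproduced outside paperless-ng by asking tesseract which languages it can see. The sketch below parses the output of `tesseract --list-langs` and reports which codes of a `+`-separated language string are missing. The helper names are hypothetical, and it assumes the usual output layout of one header line ending in a colon followed by one code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ...:" header.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def missing_languages(wanted, installed):
    """Return the codes in `wanted` (e.g. "eng+nld") that are not installed."""
    return [code for code in wanted.split("+") if code and code not in installed]


def installed_languages():
    """Ask the local tesseract binary which languages it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Combine both streams in case the list is not written to stdout.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    # Requires tesseract to be installed in the environment where this runs.
    print(missing_languages("eng+nld", installed_languages()))
```

Run inside the container, an empty list would suggest the language data is present and the problem lies elsewhere, for example in how the variable is written.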

Where to find the folders

Sorry for all the questions. Last one ;) The documentation states that paperless-ng uses 3 folders. I looked but cannot find them anywhere in the Home Assistant shares or on the file system. Do you know where things are stored now? https://paperless-ng.readthedocs.io/en/latest/setup.html#bare-metal-route

Output of full addon log still containing some errors

You are running the latest version of this add-on.
 System: Home Assistant OS 7.0  (aarch64 / raspberrypi4-64)
 Home Assistant Core: 2021.12.3
 Home Assistant Supervisor: 2021.12.2
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums
 https://github.com/alexbelgium/hassio-addons
-----------------------------------------------------------
[cont-init.d] 00-banner.sh: exited 0.
[cont-init.d] 01-envfile: executing... 
[cont-init.d] 01-envfile: exited 0.
[cont-init.d] 01-migrations: executing... 
[migrations] started
[migrations] no migrations found
[cont-init.d] 01-migrations: exited 0.
[cont-init.d] 10-adduser: executing... 
-------------------------------------
          _         ()
         | |  ___   _    __
         | | / __| | |  /  \
         | | \__ \ | | | () |
         |_| |___/ |_|  \__/
Brought to you by linuxserver.io
-------------------------------------
To support LSIO projects visit:
https://www.linuxserver.io/donate/
-------------------------------------
GID/UID
-------------------------------------
User uid:    0
User gid:    0
-------------------------------------
[cont-init.d] 10-adduser: exited 0.
[cont-init.d] 50-config: executing... 
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, django_q, documents, paperless_mail, sessions
Running migrations:
  No migrations to apply.
[cont-init.d] 50-config: exited 0.
[cont-init.d] 70-aliases: executing... 
[cont-init.d] 70-aliases: exited 0.
[cont-init.d] 90-config_yaml.sh: executing... 
Using config file found in /config/addons_config/paperless_ng/config.yaml
Config file is a valid yaml
[00:04:56] INFO: Starting the app with the variables in in /config/addons_config/paperless_ng/config.yaml
PAPERLESS_OCR_LANGUAGE=eng
PAPERLESS_OCR_MODE=skip
[cont-init.d] 90-config_yaml.sh: exited 0.
[cont-init.d] 90-custom-folders: executing... 
[cont-init.d] 90-custom-folders: exited 0.
[cont-init.d] 91-pikepdf.sh: executing... 
Installing pikepdf...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/aarch64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/aarch64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/aarch64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/aarch64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7, <> line 76.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
... success!
[cont-init.d] 91-pikepdf.sh: exited 0.
[cont-init.d] 92-local_mounts.sh: executing... 
[cont-init.d] 92-local_mounts.sh: exited 0.
[cont-init.d] 92-smb_mounts.sh: executing... 
[cont-init.d] 92-smb_mounts.sh: exited 0.
[cont-init.d] 99-custom-scripts: executing... 
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services
Waiting for redis to become available...
Waiting for redis to become available...
[services.d] done.
Waiting for redis to become available...
1667:C 22 Dec 2021 00:10:08.733 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1667:C 22 Dec 2021 00:10:08.733 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1667, just started
1667:C 22 Dec 2021 00:10:08.733 # Configuration loaded
1667:M 22 Dec 2021 00:10:08.766 * Running mode=standalone, port=6379.
1667:M 22 Dec 2021 00:10:08.766 # Server initialized
1667:M 22 Dec 2021 00:10:08.767 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1667:M 22 Dec 2021 00:10:08.768 * Ready to accept connections
[2021-12-21 23:10:20 +0000] [1669] [INFO] Starting gunicorn 20.1.0
[2021-12-21 23:10:20 +0000] [1669] [INFO] Listening at: http://0.0.0.0:8000 (1669)
[2021-12-21 23:10:20 +0000] [1669] [INFO] Using worker: paperless.workers.ConfigurableWorker
[2021-12-21 23:10:20 +0000] [1669] [INFO] Server is ready. Spawning workers
[2021-12-21 23:10:24,143] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /data/consume
23:10:24 [Q] INFO Q Cluster blossom-mirror-steak-vegan starting.
23:10:24 [Q] INFO Process-1:1 ready for work at 1723
23:10:24 [Q] INFO Process-1:2 ready for work at 1724
23:10:24 [Q] INFO Process-1:3 monitoring at 1725
23:10:24 [Q] INFO Process-1 guarding cluster blossom-mirror-steak-vegan
23:10:24 [Q] INFO Process-1:4 pushing tasks at 1726
23:10:24 [Q] INFO Q Cluster blossom-mirror-steak-vegan running.
23:10:54 [Q] INFO Enqueued 1
23:10:54 [Q] INFO Process-1 created a task from schedule [Check all e-mail accounts]
23:10:54 [Q] INFO Process-1:1 processing [november-nitrogen-harry-bakerloo]
23:10:54 [Q] INFO Process-1:1 stopped doing work
23:10:54 [Q] INFO Processed [november-nitrogen-harry-bakerloo]
23:10:54 [Q] INFO recycled worker Process-1:1
23:10:54 [Q] INFO Process-1:5 ready for work at 1729
23:12:52 [Q] INFO Enqueued 1
23:12:52 [Q] INFO Process-1:2 processing [Clean Tenso stoomuitlaat verstopt.pdf]
[2021-12-21 23:12:52,953] [INFO] [paperless.consumer] Consuming Clean Tenso stoomuitlaat verstopt.pdf
[2021-12-21 23:13:36,420] [WARNING] [ocrmypdf._validation] The output file size is 1.49× larger than the input file.
Possible reasons for this include:
The argument --deskew was issued, causing transcoding.
PDF/A conversion was enabled. (Try `--output-type pdf`.)
[2021-12-21 23:13:51,492] [INFO] [paperless.consumer] Document 2000-12-15 Clean Tenso stoomuitlaat verstopt consumption finished
23:13:51 [Q] INFO Process-1:2 stopped doing work
23:13:51 [Q] INFO Processed [Clean Tenso stoomuitlaat verstopt.pdf]
23:13:51 [Q] INFO recycled worker Process-1:2
23:13:51 [Q] INFO Process-1:6 ready for work at 1814
alexbelgium commented 2 years ago

Thanks for the detailed information!

  1. Logs errors

    • [ ] Implemented
    • [ ] Tested

    Indeed, this is just because it installs apps in headless mode, without supervision. I've moved the apps install to the Dockerfile, which will remove the errors.
  2. Tz

    • [x] Thanks, I'll add that!
  3. Multi ocr

    • [x] Implemented
    • [x] Tested

    I think it is natively taken into account using the env variables that can be defined in the config.yaml file (see addon options), based on the list here: https://paperless-ng.readthedocs.io/en/latest/configuration.html. Based on what you wrote, I think setting the value is not enough to have it installed. I'll check the linuxserver code, or I'll use the code I developed for nextcloud.

Actually, it seems to be built in according to the documentation here: PAPERLESS_OCR_LANGUAGES= https://paperless-ng.readthedocs.io/en/latest/configuration.html

  4. Folders
    • [ ] I'll complete the post tomorrow
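On the "Logs errors" item: the fix described above is to move the package install into the Dockerfile. As a separate illustration, the debconf "unable to initialize frontend" messages are commonly avoided by setting `DEBIAN_FRONTEND=noninteractive` for the apt-get call. The sketch below only shows that technique; it is not the addon's script, the helper names are hypothetical, and `python3-pip` is just an example package.

```python
import os
import subprocess


def apt_install_command(packages):
    """Build an apt-get argv that answers yes to prompts."""
    return ["apt-get", "install", "-y", "--no-install-recommends", *packages]


def noninteractive_env(base=None):
    """Copy an environment and tell debconf not to look for a terminal frontend."""
    env = dict(os.environ if base is None else base)
    env["DEBIAN_FRONTEND"] = "noninteractive"
    return env


def install(packages):
    """Run the install; needs root and a Debian/Ubuntu based image."""
    subprocess.run(apt_install_command(packages), env=noninteractive_env(), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(apt_install_command(["python3-pip"]))
```

Other messages in the log, such as the dpkg-preconfigure stdin line, may still appear with this setting.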
alexbelgium commented 2 years ago

Language installation confirmed working with fra (post above updated)

X1pheR commented 2 years ago

I checked, but the env variable is actually PAPERLESS_OCR_LANGUAGE and not PAPERLESS_OCR_LANGUAGES (plural) 😉 If you remove the S at the end, you'll see the same errors. I also tried with only nld in there: same error. With only eng it doesn't give any errors.

dpkg-preconfigure: unable to re-open stdin: 
... success!
[cont-init.d] 91-pikepdf.sh: exited 0.
[cont-init.d] 92-local_mounts.sh: executing... 
[cont-init.d] 92-local_mounts.sh: exited 0.
[cont-init.d] 92-smb_mounts.sh: executing... 
[cont-init.d] 92-smb_mounts.sh: exited 0.
[cont-init.d] 99-custom-scripts: executing... 
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
Waiting for redis to become available...
Waiting for redis to become available...
Waiting for redis to become available...
1682:C 22 Dec 2021 15:30:11.349 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1682:C 22 Dec 2021 15:30:11.350 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1682, just started
1682:C 22 Dec 2021 15:30:11.350 # Configuration loaded
1682:M 22 Dec 2021 15:30:11.373 * Running mode=standalone, port=6379.
1682:M 22 Dec 2021 15:30:11.373 # Server initialized
1682:M 22 Dec 2021 15:30:11.373 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1682:M 22 Dec 2021 15:30:11.375 * Ready to accept connections
[2021-12-22 14:30:23 +0000] [1681] [INFO] Starting gunicorn 20.1.0
[2021-12-22 14:30:23 +0000] [1681] [INFO] Listening at: http://0.0.0.0:8000 (1681)
[2021-12-22 14:30:23 +0000] [1681] [INFO] Using worker: paperless.workers.ConfigurableWorker
[2021-12-22 14:30:23 +0000] [1681] [INFO] Server is ready. Spawning workers
SystemCheckError: System check identified some issues:
ERRORS:
?: The selected ocr language eng,nld is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE.
SystemCheckError: System check identified some issues:
alexbelgium commented 2 years ago

Hi, my instructions were unclear: there are actually 2 steps now:

PAPERLESS_OCR_LANGUAGE This can be a combination of multiple languages such as deu+eng, in which case tesseract will use whatever language matches best. Keep in mind that tesseract uses much more cpu time with multiple languages enabled.

Setting both of these options should allow installing and using nld! For the moment I've tested with fra and it worked.
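Since the two settings use different separators (the addon option is written with commas, like eng,fra, while PAPERLESS_OCR_LANGUAGE is joined with a plus sign, like deu+eng), a small sketch of the conversion may help. The function names are hypothetical and this is not the addon's 93-multiocr.sh; the `tesseract-ocr-<code>` package naming follows the `tesseract-ocr-eng` line in the addon log, and the underscore-to-hyphen rule is an assumption based on Debian/Ubuntu package naming.

```python
def split_ocrlang(ocrlang):
    """Split a language list written with commas or plus signs into codes."""
    codes = [code.strip() for code in ocrlang.replace("+", ",").split(",")]
    return [code for code in codes if code]


def paperless_ocr_language(ocrlang):
    """Format the codes the way PAPERLESS_OCR_LANGUAGE expects, e.g. "eng+nld"."""
    return "+".join(split_ocrlang(ocrlang))


def tesseract_packages(ocrlang):
    """Map each code to an apt package name such as tesseract-ocr-nld."""
    # Assumption: codes with underscores use hyphens in the package name.
    return [f"tesseract-ocr-{code.replace('_', '-')}" for code in split_ocrlang(ocrlang)]


if __name__ == "__main__":
    print(paperless_ocr_language("eng,nld"))
    print(tesseract_packages("eng,nld"))
```

The first helper gives the value to put in the config file, and the second gives the packages that would need to be installed for that value to pass the system check.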

X1pheR commented 2 years ago

Ah, OK. I did both and started the addon after upgrading, but there seems to be an error with pikepdf now. See below. To be sure, I also deleted and reinstalled the addon, but it's still the same.

[23:06:41] INFO: Starting the app with the variables in in /config/addons_config/paperless_ng/config.yaml
PAPERLESS_OCR_LANGUAGE=eng+nld
PAPERLESS_OCR_MODE=skip
[cont-init.d] 90-config_yaml.sh: exited 0.
[cont-init.d] 90-custom-folders: executing... 
[cont-init.d] 90-custom-folders: exited 0.
[cont-init.d] 91-pikepdf.sh: executing... 
Installing pikepdf...
Usage:   
  pip install [options] <requirement specifier> [package-index-options] ...
  pip install [options] -r <requirements file> [package-index-options] ...
  pip install [options] [-e] <vcs project url> ...
  pip install [options] [-e] <local project path> ...
  pip install [options] <archive url/path> ...
no such option: -y
alexbelgium commented 2 years ago

Argh, a typo... I've pushed a new version that solves it. The key issue is that I only have an rpi3b, so with its really low RAM I can't test the builds before pushing to GitHub :) Otherwise it would have to wait a good week, until I have access to my computer and amd64 build environment again.

X1pheR commented 2 years ago

No problem. I really appreciate your quick responses. I don't mind waiting until you can look at it with your own system. I tested again after upgrading, but it still doesn't work. Since you mentioned it works with eng,fra, I tried that, and that doesn't seem to work for me either.

I set the addon options as you said, with eng,fra as OCRLANG: [image]

And here is the addon log where you see I have eng+fra set in the config. But it seems it cannot find FRA. For ENG I see it being installed but no mention of that for FRA either.

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 00-aaa_dockerfile_backup.sh: executing... 
[cont-init.d] 00-aaa_dockerfile_backup.sh: exited 0.
[cont-init.d] 00-banner.sh: executing... 
-----------------------------------------------------------
 Add-on: Paperless NG
 scan, index and archive all your physical documents
-----------------------------------------------------------
 Add-on version: 1.5.0-6
 You are running the latest version of this add-on.
 System: Home Assistant OS 7.0  (aarch64 / raspberrypi4-64)
 Home Assistant Core: 2021.12.4
 Home Assistant Supervisor: 2021.12.2
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums
 https://github.com/alexbelgium/hassio-addons
-----------------------------------------------------------
[cont-init.d] 00-banner.sh: exited 0.
[cont-init.d] 01-envfile: executing... 
[cont-init.d] 01-envfile: exited 0.
[cont-init.d] 01-migrations: executing... 
[migrations] started
[migrations] no migrations found
[cont-init.d] 01-migrations: exited 0.
[cont-init.d] 10-adduser: executing... 
-------------------------------------
          _         ()
         | |  ___   _    __
         | | / __| | |  /  \
         | | \__ \ | | | () |
         |_| |___/ |_|  \__/
Brought to you by linuxserver.io
-------------------------------------
To support LSIO projects visit:
https://www.linuxserver.io/donate/
-------------------------------------
GID/UID
-------------------------------------
User uid:    0
User gid:    0
-------------------------------------
[cont-init.d] 10-adduser: exited 0.
[cont-init.d] 50-config: executing... 
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, django_q, documents, paperless_mail, sessions
Running migrations:
  No migrations to apply.
[cont-init.d] 50-config: exited 0.
[cont-init.d] 70-aliases: executing... 
[cont-init.d] 70-aliases: exited 0.
[cont-init.d] 90-config_yaml.sh: executing... 
Using config file found in /config/addons_config/paperless_ng/config.yaml
Config file is a valid yaml
[13:06:47] INFO: Starting the app with the variables in in /config/addons_config/paperless_ng/config.yaml
PAPERLESS_OCR_LANGUAGE=eng+fra
PAPERLESS_OCR_MODE=skip
[cont-init.d] 90-config_yaml.sh: exited 0.
[cont-init.d] 90-custom-folders: executing... 
[cont-init.d] 90-custom-folders: exited 0.
[cont-init.d] 91-pikepdf.sh: executing... 
Installing pikepdf...
... success!
[cont-init.d] 91-pikepdf.sh: exited 0.
[cont-init.d] 92-local_mounts.sh: executing... 
[cont-init.d] 92-local_mounts.sh: exited 0.
[cont-init.d] 92-smb_mounts.sh: executing... 
[cont-init.d] 92-smb_mounts.sh: exited 0.
[cont-init.d] 93-multiocr.sh: executing... 
OCRLANG variable is set, processing the language packages
installing tesseract-ocr-eng
Reading package lists...
Building dependency tree...
Reading state information...
tesseract-ocr-eng is already the newest version (1:4.00~git30-7274cfa-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
/var/run/s6/etc/cont-init.d/93-multiocr.sh: line 14: cd: /usr/share/tessdata: No such file or directory
[cont-init.d] 93-multiocr.sh: exited 1.
[cont-init.d] 99-custom-scripts: executing... 
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
Waiting for redis to become available...
Waiting for redis to become available...
Waiting for redis to become available...
1169:C 23 Dec 2021 13:11:24.314 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1169:C 23 Dec 2021 13:11:24.314 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1169, just started
1169:C 23 Dec 2021 13:11:24.314 # Configuration loaded
1169:M 23 Dec 2021 13:11:24.338 * Running mode=standalone, port=6379.
1169:M 23 Dec 2021 13:11:24.338 # Server initialized
1169:M 23 Dec 2021 13:11:24.338 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1169:M 23 Dec 2021 13:11:24.339 * Ready to accept connections
[2021-12-23 12:11:36 +0000] [1171] [INFO] Starting gunicorn 20.1.0
[2021-12-23 12:11:36 +0000] [1171] [INFO] Listening at: http://0.0.0.0:8000 (1171)
[2021-12-23 12:11:36 +0000] [1171] [INFO] Using worker: paperless.workers.ConfigurableWorker
[2021-12-23 12:11:36 +0000] [1171] [INFO] Server is ready. Spawning workers
SystemCheckError: System check identified some issues:
ERRORS:
?: The selected ocr language fra is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE.
SystemCheckError: System check identified some issues:
ERRORS:
?: The selected ocr language fra is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE.
Waiting for redis to become available...
Waiting for redis to become available...
SystemCheckError: System check identified some issues:
ERRORS:
?: The selected ocr language fra is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE.
SystemCheckError: System check identified some issues:
ERRORS:
?: The selected ocr language fra is not installed. Paperless cannot OCR your documents without it. Please fix PAPERLESS_OCR_LANGUAGE.
Waiting for redis to become available...
Waiting for redis to become available...
alexbelgium commented 2 years ago

Another typo corrected. I've tried with eng,fra as the addon option and it seemed to work: the error message is gone ;) I'll check a bit more in 2 days.

[cont-init.d] 93-multiocr.sh: executing... 
OCRLANG variable is set, processing the language packages
[23:32:14] INFO: OCR Language installed : eng
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
[23:32:32] INFO: OCR Language installed : fra
[cont-init.d] 93-multiocr.sh: exited 0.
alexbelgium commented 2 years ago

Hi, have you had time to test it? I've just installed and tested it: both fra and import seem to work.

X1pheR commented 2 years ago

Sorry, due to the holidays I didn't get to it. I've tested and it works! I also tried your papermerge addon and that one works too. I'm still comparing solutions, and I can now test which one works best. Do you by any chance also take addon requests? I came across docspell, which also seems to be a good competitor. Somebody even made the config for use with portainer: https://github.com/olaxe/docspell/tree/main/home-assistant

While testing the addons I am still wondering where files actually go. I see mentions in the logs that folders are created, but besides the paperless-ng config file I don't see anything else created for paperless-ng or papermerge in the usual folders: /config, /addons, /share.

alexbelgium commented 2 years ago

Hi, unless I'm mistaken, folders can be modified from the user interface. The data dir, in which files are uploaded, should be set by: PAPERLESS_DATA_DIR=/config/addons_config/paperless_ng
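To see which folders a running instance is actually configured with, one option is to print the relevant variables from the environment. The sketch below assumes the three folders correspond to PAPERLESS_CONSUMPTION_DIR, PAPERLESS_DATA_DIR and PAPERLESS_MEDIA_ROOT, as I understand the paperless-ng configuration docs; only PAPERLESS_DATA_DIR is confirmed in this thread, and the helper names are hypothetical.

```python
import os

# Assumed mapping of the three paperless-ng folders to their variables.
FOLDER_VARIABLES = {
    "consume": "PAPERLESS_CONSUMPTION_DIR",
    "data": "PAPERLESS_DATA_DIR",
    "media": "PAPERLESS_MEDIA_ROOT",
}


def configured_folders(env=None):
    """Return {label: path or None} for each folder variable."""
    env = os.environ if env is None else env
    return {label: env.get(var) for label, var in FOLDER_VARIABLES.items()}


def report(env=None):
    """Describe each folder: unset, present on disk, or missing."""
    lines = []
    for label, path in configured_folders(env).items():
        if path is None:
            status = "not set (the application default applies)"
        else:
            state = "exists" if os.path.isdir(path) else "missing"
            status = f"{path} ({state})"
        lines.append(f"{label}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running it inside the addon container, rather than on the host, is what matters here, since paths such as /data/consume in the log are container paths.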

I'm not too keen to add new addons if their function overlaps with existing ones. In particular, this one only has an image for arm64 and amd64, and its image has not been updated for 9 months on Docker Hub. And seeing that the owner went to the trouble of creating a docker compose for HA, I would think it would be easier for him to make an addon directly. Although it does indeed look good :) Honestly, I'm just using onedrive, it is just so convenient... Or nextcloud, to have OCR, mobile apps and a full environment.

alexbelgium commented 2 years ago

Initial issue solved

X1pheR commented 2 years ago

Thanks, I'll test out where files go and such while comparing them all. And you are totally right! I managed to install and configure it, so I can compare it with paperless_ng and papermerge. And thanks for mentioning nextcloud! I didn't even know it existed. I'm definitely going to check it out for comparison! Looking through all your addons now and it's like a Swiss army package 😁😉 !!! Very handy addons which cover functionality I still want to explore. If there are any issues, should I create a new issue for each independent addon, or what is your preference? I'll be trying out a lot of them 🥳

alexbelgium commented 2 years ago

Thanks for the feedback! It's better to create individual issues on GitHub, mentioning the addon and the full addon log each time ;) Sometimes restarting the addon can also help, in case the first run creates some needed files.

Have fun!