bernmic / ocrmypdf-watchdog

A watchdog for OCRMyPDF written in go
GNU General Public License v3.0

ocrmypdf-watchdog did not start watching the input folder #5

Closed schwabenheinz closed 2 years ago

schwabenheinz commented 3 years ago

I am starting to use OCRmyPDF on Ubuntu Server 20.04. For this I installed the Docker container with the following parameters:

docker run \
  -v /home/riedocker/hidrive/public/scans/input:/input \
  -v /home/riedocker/hidrive/public/scans/fertig:/output \
  -e OCR_OUTPUT_DIRECTORY_YEAR_MONTH=0 \
  -e OCR_ON_SUCCESS_DELETE=1 \
  -e OCR_DESKEW=1 \
  -e ROTATE-PAGES=1 \
  -e OUTPUT-TYPE=pdfa \
  -e PYTHONUNBUFFERED=1 \
  -u root:root \
  -it --entrypoint python3 \
  jbarlow83/ocrmypdf \
  watcher.py

I think it is important to know that both -v mounts point to directories that are linked/mounted from an external HiDrive provider. I first tried to start the container without -u root:root, and the second time with it. In both cases all files are available. The container runs fine, and the log files contain only a few lines like:

Starting OCRmyPDF watcher with config:
Input Directory: /input
Output Directory: /output
Output Directory Year & Month: False

Inside the container all files are available and everything seems fine.

But nothing happens: no scan and no OCR process is started. I then added WATCHDOG_FREQUENCY=3600 to the config, but also without success. I could not find the reason for it, so any hint would be welcome. Thanks in advance.

jxsl13 commented 3 years ago

Hi @schwabenheinz, you can use the docker-compose.yaml below as a reference for what you are trying to do. The problem is that you mount your files at the wrong location. As you can see in the YAML file below, the image expects them to be mounted at /in and /out. I guess you are trying to provide environment variables for the ocrmypdf base image, and I do not know whether the ocrmypdf binary actually parses environment variables on startup. My guess is that it expects command-line arguments instead, which have to be passed to the watchdog application via the OCRMYPDF_PARAMETER environment variable.

Create a file called docker-compose.yaml.

version: '3'
services:
  ocrmypdf-watchdog:
    container_name: OCRmyPDF
    network_mode: none
    image: darthbermel/ocrmypdf-watchdog:latest
    #build: . # this can be used in the directory with the Dockerfile in order to build the image locally and not to fetch it from dockerhub
    restart: always
    environment:
      OCRMYPDF_IN: /in
      OCRMYPDF_OUT: /out
      WATCHDOG_FREQUENCY: 1
      WATCHDOG_EXTENSIONS: pdf,jpg,jpeg,tif,tiff,png,gif
      OCRMYPDF_BINARY: ocrmypdf
      OCRMYPDF_PARAMETER: -l eng+fra+deu --rotate-pages --deskew --jobs 4 --output-type pdfa
    volumes:
    - /home/riedocker/hidrive/public/scans/input:/in
    - /home/riedocker/hidrive/public/scans/fertig:/out

Then execute the following command in the directory that contains the file:

docker compose up -d

I also provide a watchdog application that builds on this application's idea but takes a different approach: instead of frequently checking whether there are new files in the folder, it is notified by the file system when new files appear (GitHub.com/jxsl13/ocrmypdf-watchdog).

jxsl13 commented 3 years ago

The container image you are trying to run is not this project but a completely different one: https://ocrmypdf.readthedocs.io/en/latest/batch.html#watched-folders-with-docker

schwabenheinz commented 3 years ago

Hello @jxsl13, yes, you are right. I tried a lot of different ways and packages, and in between I mixed them up. I will follow your suggestion and report back afterwards.

schwabenheinz commented 3 years ago

Hello @jxsl13, after I did it right, everything works as expected. Thank you very much for your help, and sorry for my confusion!

One additional question: In your documentation you have in parameter --frequency <in seconds and as environment WATCHDOG_FREQUENCY. Because of the parameter description, I assumed that the value of WATCHDOG_FREQUENCY is also in seconds. My intention was to scan the folder for new files once per hour, that is, every 3600 seconds. Did I understand that correctly? Thank you again in advance! Greetings from Germany, Schwabenheinz

jxsl13 commented 3 years ago

I do not know exactly which documentation you are referring to with "In your documentation you have in parameter --frequency...".

In this Docker image/watchdog application, the value of the WATCHDOG_FREQUENCY environment variable is given in seconds, so WATCHDOG_FREQUENCY: 3600 is what you want for an hourly check.

schwabenheinz commented 3 years ago

The base for my question was the README.md document in this project

jxsl13 commented 3 years ago

I see.

jxsl13 commented 2 years ago

can be closed.