eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Jobs don't seem to run #2458

Closed: cgranade closed this issue 8 months ago

cgranade commented 10 months ago

When I upload files, either via the web interface or via dsc, a new job for each file shows up in $BASE_URL/app/queue, but every job stays permanently stuck in "waiting." Unlike https://github.com/eikek/docspell/issues/1692, I've not yet observed a single job switch from waiting to running, even after reducing joex's wakeup interval to a much smaller value.

I'm running in Docker (with dsc running on a separate machine), using the following compose file:

version: '3.8'
services:
  restserver:
    image: docspell/restserver:latest
    container_name: docspell-restserver
    restart: unless-stopped
    ports:
      - "7880:7880"
    environment:
      - TZ=America/Los_Angeles
    volumes:
    - type: bind
      source: /devops/docspell/restserver.conf
      target: /config/docspell.conf
      read_only: true
    command:
    - /config/docspell.conf
    depends_on:
      - solr
      - db

  joex:
    image: docspell/joex:latest
    container_name: docspell-joex
    restart: unless-stopped
    command:
    - /config/docspell.conf
    environment:
      - TZ=America/Los_Angeles
    ports:
      - "7878:7878"
    depends_on:
      - solr
      - db
    volumes:
      - type: bind
        source: /devops/docspell/joex.conf
        target: /config/docspell.conf
        read_only: true
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp:/tmp

  db:
    image: postgres:16.1
    container_name: postgres_db
    restart: unless-stopped
    volumes:
      - docspell-postgres_data:/var/lib/postgresql/data/
    environment:
      - POSTGRES_USER=[[REDACTED]]
      - POSTGRES_PASSWORD=[[REDACTED]]
      - POSTGRES_DB=[[REDACTED]]

  solr:
    image: solr:9
    container_name: docspell-solr
    restart: unless-stopped
    volumes:
      - docspell-solr_data:/var/solr
    command:
      - bash
      - -c
      - 'precreate-core docspell; exec solr -f -Dsolr.modules=analysis-extras'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8983/solr/docspell/admin/ping"]
      interval: 1m
      timeout: 10s
      retries: 2
      start_period: 30s

  # Used to make docspell available over
  # tailscale, acting as a reverse proxy from
  # the tailnet onto the ad-hoc network created
  # by docker-compose.
  reverse-proxy:
    image: lscr.io/linuxserver/nginx:latest
    container_name: reverse-proxy
    environment:
    - DOCKER_MODS=ghcr.io/tailscale-dev/docker-mod:main
    - TAILSCALE_HOSTNAME=paper
    - TAILSCALE_STATE_DIR=/var/tailscale
    - TAILSCALE_AUTHKEY=[[REDACTED]]
    - TAILSCALE_SERVE_PORT=80
    - TAILSCALE_SERVE_MODE=https
    - TAILSCALE_USE_SSH=1
    volumes:
    - type: bind
      source: /devops/docspell/nginx.conf
      target: /config/nginx/site-confs/default.conf
      read_only: true
    - docspell-tailscale_data:/var/tailscale

volumes:
  docspell-postgres_data:
  docspell-solr_data:
  docspell-tailscale_data:

Excerpts from restserver.conf, excluding secrets and auth config:

docspell.server {

  # This is shown in the top right corner of the web application
  app-name = "Docspell"

  # This is the id of this node. If you run more than one server, you
  # have to make sure to provide unique ids per node.
  app-id = "rest1"

  base-url = [[REDACTED]]
  internal-url = "http://docspell-restserver:7880"

  # Configures logging
  logging {
    format = "Fancy"
    minimum-level = "Warn"

    # Override the log level of specific loggers
    levels = {
      "docspell" = "Info"
      "org.flywaydb" = "Info"
      "binny" = "Info"
      "org.http4s" = "Info"
    }
  }

  # Where the server binds to.
  bind {
    address = "0.0.0.0"
    port = 7880
  }

  # Options for tuning the http server
  server-options {
    enable-http-2 = false

    # Maximum allowed connections
    max-connections = 1024

    # Timeout for waiting for the first output of the response
    response-timeout = 45s
  }

  max-item-page-size = 200
  max-note-length = 180
  show-classification-settings = true

  # Authentication.
  auth {
    server-secret = [[REDACTED]]

    # How long an authentication token is valid. The web application
    # will get a new one periodically.
    session-valid = "5 minutes"

    remember-me {
      enabled = true
      # How long the remember me cookie/token is valid.
      valid = "30 days"
    }

    on-account-source-conflict = "fail"
  }

  # Settings for "download as zip"
  download-all {
    # How many files to allow in a zip.
    max-files = 500

    # The maximum (uncompressed) size of the zip file contents.
    max-size = 1400M
  }

  openid =
    [ { enabled = true,

        [[REDACTED]]
    ]

  oidc-auto-redirect = true

  integration-endpoint {
    [[REDACTED]]
  }

  admin-endpoint {
    secret = [[REDACTED]]
  }

  # Configuration of the full-text search engine. (the same must be used for joex)
  full-text-search {
    enabled = true

    # Which backend to use, either solr or postgresql
    backend = "solr"

    # Configuration for the SOLR backend.
    solr = {
      # The URL to solr
      url = "http://docspell-solr:8983/solr/docspell"
      # Used to tell solr when to commit the data
      commit-within = 1000
      # If true, logs request and response bodies
      log-verbose = false
      # The defType parameter to lucene that defines the parser to
      # use. You might want to try "edismax" or look here:
      # https://solr.apache.org/guide/8_4/query-syntax-and-parsing.html#query-syntax-and-parsing
      def-type = "lucene"
      # The default combiner for tokens. One of {AND, OR}.
      q-op = "OR"
    }
  }

  # Configuration for the backend.
  backend {

    # Enable or disable debugging for e-mail related functionality. This
    # applies to both sending and receiving mails. For security reasons
    # logging is not very extensive on authentication failures. Setting
    # this to true, results in a lot of data printed to stdout.
    mail-debug = false

    # The database connection.
    jdbc {
      [[REDACTED]]
    }

    # Additional settings related to schema migration.
    database-schema = {
      # Whether to run main database migrations.
      run-main-migrations = true

      # Whether to run the fixup migrations.
      run-fixup-migrations = true

      # Use with care. This repairs all migrations in the database by
      # updating their checksums and removing failed migrations. Good
      # for testing, not recommended for normal operation.
      repair-schema = false
    }

    # Configuration for registering new users.
    signup {
      mode = "open"

      # If mode == 'invite', a password must be provided to generate
      # invitation keys. It must not be empty.
      new-invite-password = ""

      # If mode == 'invite', this is the period an invitation token is
      # considered valid.
      invite-time = "3 days"
    }

    files {
      # Defines the chunk size (in bytes) used to store the files.
      # This will affect the memory footprint when uploading and
      # downloading files. At most this amount is loaded into RAM for
      # down- and uploading.
      #
      # It also defines the chunk size used for the blobs inside the
      # database.
      chunk-size = 524288

      # The file content types that are considered valid. Docspell
      # will only pass these files to processing. The processing code
      # itself has also checks for which files are supported and which
      # not. This affects the uploading part and can be used to
      # restrict file types that should be handed over to processing.
      # By default all files are allowed.
      valid-mime-types = [ ]

      # The id of an enabled store from the `stores` array that should
      # be used.
      #
      # IMPORTANT NOTE: All nodes must have the exact same file store
      # configuration!
      default-store = "database"

      # A list of possible file stores. Each entry must have a unique
      # id. The `type` is one of: default-database, filesystem, s3.
      #
      # The enabled property serves currently to define target stores
      # for the "copy files" task. All stores with enabled=false are
      # removed from the list. The `default-store` must be enabled.
      stores = {
        database =
          { enabled = true
            type = "default-database"
          }

        filesystem =
          { enabled = false
            type = "file-system"
            directory = "/some/directory"
          }

        minio =
         { enabled = false
           type = "s3"
           endpoint = "http://localhost:9000"
           access-key = "username"
           secret-key = "password"
           bucket = "docspell"
         }
      }
    }

    addons = {
      enabled = false

      # Whether installing addons requiring network should be allowed
      # or not.
      allow-impure = true

      # Define patterns of urls that are allowed to install addons
      # from.
      #
      # A pattern is compared against an URL by comparing three parts
      # of an URL via globs: scheme, host and path.
      #
      # You can use '*' (0 or more) and '?' (one) as wildcards in each
      # part. For example:
      #
      #   https://*.mydomain.com/projects/*
      #   *s://gitea.mydomain/*
      #
      # A hostname is separated by dots and the path by a slash. A '*'
      # in a pattern means to match one or more characters. The path
      # pattern is always matching the given prefix. So /a/b/* matches
      # /a/b/c and /a/b/c/d and all other sub-paths.
      #
      # Multiple patterns can be defined via a comma separated string
      # or as an array. An empty string matches no URL, while the
      # special pattern '*' all by itself means to match every URL.
      allowed-urls = "*"

      # Same as `allowed-urls` but a match here means do deny addons
      # from this url.
      denied-urls = ""
    }
  }
}

I've not included the joex.conf file here as it's much longer, but the bind and base URL portions of the file appear to match everything else:

docspell.joex {

  # This is the id of this node. If you run more than one server, you
  # have to make sure to provide unique ids per node.
  app-id = "joex1"

  # This is the base URL this application is deployed to. This is used
  # to register this joex instance such that docspell rest servers can
  # reach them
  base-url = "http://docspell-joex:7878"

  # Where the REST server binds to.
  #
  # JOEX provides a very simple REST interface to inspect its state.
  bind {
    address = "0.0.0.0"
    port = 7878
  }

Very much obliged for the help!

cgranade commented 10 months ago

Ah, my apologies: while writing this issue, I think I noticed one more place where the JDBC URL was accidentally left at the default instead of the URL used inside the docker-compose project. I think this was operator error; apologies again!

eikek commented 10 months ago

Hello @cgranade, no worries! This indeed sounds like the cause. If a job is never picked up (never moves to scheduled/running), then joex is probably looking at a different database. Hope it's working now?
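Because a job that never leaves "waiting" usually means the REST server and joex are talking to different databases, one quick sanity check is to compare the JDBC URL in both config files. The following is a rough sketch, not a Docspell tool: the helpers `strip_comments`, `extract_jdbc_url` and `same_database` are hypothetical, they use regular expressions instead of a real HOCON parser, and they assume the URL is written as a quoted `url = "..."` setting after the `jdbc {` line. The database name `docspell` in the example is also an assumption, since the real one is redacted above.

```python
import re
from typing import Optional

# Start of a HOCON `jdbc { ... }` or `jdbc = { ... }` block.
_JDBC_START = re.compile(r"\bjdbc\s*=?\s*\{")
# A quoted `url = "..."` (or `url: "..."`) setting. Only the first quoted
# string is captured, so HOCON concatenations like "a"${x}"b" are truncated.
_URL = re.compile(r'\burl\s*[=:]\s*"([^"]*)"')


def strip_comments(conf_text: str) -> str:
    """Drop whole-line HOCON comments (lines starting with # or //)."""
    return "\n".join(
        line for line in conf_text.splitlines()
        if not line.lstrip().startswith(("#", "//"))
    )


def extract_jdbc_url(conf_text: str) -> Optional[str]:
    """Return the first quoted url after a jdbc block opens, or None."""
    text = strip_comments(conf_text)
    start = _JDBC_START.search(text)
    if start is None:
        return None
    match = _URL.search(text, start.end())
    return match.group(1) if match else None


def same_database(server_conf: str, joex_conf: str) -> bool:
    """True only if both configs name the same, non-missing JDBC url."""
    server_url = extract_jdbc_url(server_conf)
    joex_url = extract_jdbc_url(joex_conf)
    return server_url is not None and server_url == joex_url


if __name__ == "__main__":
    # Hypothetical excerpts: the REST server uses Postgres, while joex was
    # left pointing at the default H2 database.
    server_conf = """
    docspell.server.backend {
      jdbc {
        url = "jdbc:postgresql://db:5432/docspell"
      }
    }
    """
    joex_conf = """
    docspell.joex {
      jdbc {
        # url = "jdbc:postgresql://db:5432/docspell"
        url = "jdbc:h2://"${java.io.tmpdir}"/docspell-demo.db"
      }
    }
    """
    print(same_database(server_conf, joex_conf))  # False: different databases
```

If the check reports a mismatch, both `jdbc` blocks need to point at the same Postgres service on the compose network (here the `db` service). Because the regex simply takes the next `url` after `jdbc {`, a `jdbc` block without a `url` could pick up an unrelated setting such as the Solr URL, so treat the result as a hint rather than proof.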

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!