Additionally installed OCR language is rejected by web UI backend

lehnerpat commented 6 months ago

Description: After installing an additional OCR language (for example, Japanese) as described in the docs, the additional language can be used in OCR by setting it as the default, but it cannot be used from the web UI because the backend rejects it as an invalid value.

Expected: Additionally installed languages should be usable from the web UI, just like the default languages.

Actual: The additional language shows up in the language selection dropdown for running OCR: [image: CleanShot 2023-12-31 at 17 12 25@2x]

But when you click "Start", the backend responds with a 422 error saying the additional language is not an allowed value for the enum.

Additionally, the UI completely ignores this error and doesn't show any error message :(

Full error payload:

{
    "detail": [
        {
            "type": "enum",
            "loc": [
                "body",
                "lang"
            ],
            "msg": "Input should be 'deu','fra','eng','ita','spa','por' or 'ron'",
            "input": "jpn",
            "ctx": {
                "expected": "'deu','fra','eng','ita','spa','por' or 'ron'"
            }
        }
    ]
}
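Since the UI currently swallows this response, here is a minimal sketch of how the `detail` list could be turned into a user-facing message. It assumes only the payload shape shown above; the function name `summarize_validation_errors` is hypothetical and not part of Papermerge.

```python
import json

# Payload copied from the 422 response above.
PAYLOAD = """
{
    "detail": [
        {
            "type": "enum",
            "loc": ["body", "lang"],
            "msg": "Input should be 'deu','fra','eng','ita','spa','por' or 'ron'",
            "input": "jpn",
            "ctx": {"expected": "'deu','fra','eng','ita','spa','por' or 'ron'"}
        }
    ]
}
"""


def summarize_validation_errors(payload: str) -> list:
    """Hypothetical helper: build one readable line per entry in 'detail'."""
    data = json.loads(payload)
    messages = []
    for err in data.get("detail", []):
        # "loc" is a path such as ["body", "lang"]; join it into "body.lang".
        field = ".".join(str(part) for part in err.get("loc", []))
        msg = err.get("msg", "invalid value")
        messages.append(f"{field}: {msg} (got {err.get('input')!r})")
    return messages


for line in summarize_validation_errors(PAYLOAD):
    print(line)
```

Something along these lines would at least let the UI show which field was rejected and why, instead of failing silently.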

Browser console screenshot: [image: CleanShot 2023-12-31 at 17 12 41@2x]

Info:

OS: macOS Sonoma 14.1.2 (23B92), Architecture: Intel (x86_64)
Browser: Safari 17.1.2 (19616.2.9.11.12)
Database: SQLite
Papermerge Version: 3.0

More info about setup:

Using a custom Docker image with the Japanese language package for Tesseract installed, following the instructions: https://docs.papermerge.io/3.0/setup/add-ocr-langs/
- Dockerfile:
```
FROM papermerge/papermerge:3.0

# add Japanese OCR language
RUN apt install tesseract-ocr-jpn
```
- Built with: docker build -t mypaper:3.0 -f Dockerfile .
Using Docker Compose, following the instructions: https://docs.papermerge.io/3.0/setup/docker-compose/
- Changed image to use my custom one (mypaper:3.0)
- Changed username and password
- Set additional env var PAPERMERGE__OCR__DEFAULT_LANGUAGE: jpn

ciur commented 6 months ago

Thank you for the well-structured bug report!

The issue happens because the language codes are currently hardcoded.

The fix would be to, well, just extend the current set of hardcoded values with another batch of languages (incl. Japanese).
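To illustrate why a hardcoded set rejects `jpn` and how extending it resolves that, here is a minimal sketch using Python's standard `enum` module. The names `HardcodedLang`, `ExtendedLang`, and `is_accepted` are hypothetical and not taken from the Papermerge code base; the seven original codes come from the error message above.

```python
from enum import Enum


class HardcodedLang(str, Enum):
    """Hypothetical stand-in for the hardcoded set from the error message."""
    deu = "deu"
    fra = "fra"
    eng = "eng"
    ita = "ita"
    spa = "spa"
    por = "por"
    ron = "ron"


# Extended set: the same codes plus Japanese (assumed shape of the fix).
ExtendedLang = Enum(
    "ExtendedLang",
    {**{member.name: member.value for member in HardcodedLang}, "jpn": "jpn"},
    type=str,
)


def is_accepted(enum_cls, code: str) -> bool:
    """Return True if `code` is a valid value of the given enum class."""
    try:
        enum_cls(code)
        return True
    except ValueError:
        return False


print(is_accepted(HardcodedLang, "jpn"))
print(is_accepted(ExtendedLang, "jpn"))
```

With the hardcoded set, `jpn` fails the lookup, which mirrors the enum validation error in the report. With the extended set it passes. Extending the list is the simplest fix, but any language installed later that is not in the list would be rejected in the same way.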

ciur commented 5 months ago

PR #300 adds extra language codes (incl. Japanese).

The pull request was merged and it will be available as part of the Papermerge 3.0.1 release.

ciur commented 5 months ago

Fixed in 3.0.2

ciur / papermerge

Additionally installed OCR language is rejected by web UI backend #571