Erikvl87 / docker-languagetool

Dockerfile for LanguageTool server - configurable
https://hub.docker.com/r/erikvl87/languagetool
GNU Lesser General Public License v2.1

Crash using the ARM64 build with ngram configuration #27

Open Erikvl87 opened 2 years ago

Erikvl87 commented 2 years ago

Originally posted by @anetschka in https://github.com/Erikvl87/docker-languagetool/issues/15#issuecomment-974275794

Hi there, I have tried building the image with the supplied Dockerfile and the arm64 workaround, using docker-compose. The only difference is that I added ngram data to a folder within my image. The image builds and the container starts up normally, but when I actually post (German) text to LanguageTool, it seems that Hunspell is still not properly initialised:

languagetoolservice_1 | java.io.IOException: Read-only file system
languagetoolservice_1 |     at java.base/java.io.UnixFileSystem.createFileExclusively(Native Method)
languagetoolservice_1 |     at java.base/java.io.File.createTempFile(File.java:2129)
languagetoolservice_1 |     at java.base/java.io.File.createTempFile(File.java:2175)
languagetoolservice_1 |     at org.bridj.Platform.createTempDir(Platform.java:710)
languagetoolservice_1 |     at org.bridj.Platform.<clinit>(Platform.java:227)
languagetoolservice_1 |     at org.bridj.Pointer.<clinit>(Pointer.java:208)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.DumontsHunspellDictionary.<init>(DumontsHunspellDictionary.java:37)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.Hunspell.getDictionary(Hunspell.java:50)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.HunspellRule.init(HunspellRule.java:488)
languagetoolservice_1 |     at org.languagetool.rules.de.GermanSpellerRule.init(GermanSpellerRule.java:1244)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.HunspellRule.ensureInitialized(HunspellRule.java:462)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.HunspellRule.match(HunspellRule.java:155)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:1295)
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1846)
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1765)
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1736)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1226)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.checkInternal(JLanguageTool.java:970)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.check2(JLanguageTool.java:908)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:762)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:711)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.access$000(TextChecker.java:56)
languagetoolservice_1 |     at org.languagetool.server.TextChecker$1.call(TextChecker.java:427)
languagetoolservice_1 |     at org.languagetool.server.TextChecker$1.call(TextChecker.java:420)
languagetoolservice_1 |     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
languagetoolservice_1 |     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
languagetoolservice_1 |     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
languagetoolservice_1 |     at java.base/java.lang.Thread.run(Thread.java:829)
languagetoolservice_1 | 2021-11-19 17:11:26.422 +0000 ERROR org.languagetool.server.LanguageToolHttpHandler An error has occurred: 'java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>, detected: de-DE', sending HTTP code 500. Access from 192.168.208.3, HTTP user agent: Python-urllib/3.8, User agent param: null, Referrer: null, language: de-DE, h: 1, r: 1, time: 5494text length: 59, m: ALL, l: DEFAULT, Stacktrace follows:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>, detected: de-DE
languagetoolservice_1 |     at org.languagetool.server.TextChecker.checkText(TextChecker.java:457)
languagetoolservice_1 |     at org.languagetool.server.ApiV2.handleCheckRequest(ApiV2.java:162)
languagetoolservice_1 |     at org.languagetool.server.ApiV2.handleRequest(ApiV2.java:76)
languagetoolservice_1 |     at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:182)
languagetoolservice_1 |     at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
languagetoolservice_1 |     at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
languagetoolservice_1 |     at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
languagetoolservice_1 |     at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:692)
languagetoolservice_1 |     at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
languagetoolservice_1 |     at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:664)
languagetoolservice_1 |     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
languagetoolservice_1 |     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
languagetoolservice_1 |     at java.base/java.lang.Thread.run(Thread.java:829)
languagetoolservice_1 | Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>
languagetoolservice_1 |     at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
languagetoolservice_1 |     at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.checkText(TextChecker.java:438)
languagetoolservice_1 |     ... 12 more
languagetoolservice_1 | Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>
languagetoolservice_1 |     at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1230)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.checkInternal(JLanguageTool.java:970)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.check2(JLanguageTool.java:908)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:762)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:711)
languagetoolservice_1 |     at org.languagetool.server.TextChecker.access$000(TextChecker.java:56)
languagetoolservice_1 |     at org.languagetool.server.TextChecker$1.call(TextChecker.java:427)
languagetoolservice_1 |     at org.languagetool.server.TextChecker$1.call(TextChecker.java:420)
languagetoolservice_1 |     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
languagetoolservice_1 |     ... 3 more
languagetoolservice_1 | Caused by: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1883)
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1765)
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1736)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1226)
languagetoolservice_1 |     ... 11 more
languagetoolservice_1 | Caused by: java.lang.RuntimeException: Could not create hunspell instance. Please note that LanguageTool supports only 64-bit platforms (Linux, Windows, Mac) and that it requires a 64-bit JVM (Java).
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.DumontsHunspellDictionary.<init>(DumontsHunspellDictionary.java:45)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.Hunspell.getDictionary(Hunspell.java:50)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.HunspellRule.init(HunspellRule.java:488)
languagetoolservice_1 |     at org.languagetool.rules.de.GermanSpellerRule.init(GermanSpellerRule.java:1244)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.HunspellRule.ensureInitialized(HunspellRule.java:462)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.HunspellRule.match(HunspellRule.java:155)
languagetoolservice_1 |     at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:1295)
languagetoolservice_1 |     at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1846)
languagetoolservice_1 |     ... 14 more
languagetoolservice_1 | Caused by: java.lang.UnsatisfiedLinkError: 'int org.bridj.Platform.sizeOf_ptrdiff_t()'
languagetoolservice_1 |     at org.bridj.Platform.sizeOf_ptrdiff_t(Native Method)
languagetoolservice_1 |     at org.bridj.Platform.<clinit>(Platform.java:232)
languagetoolservice_1 |     at org.bridj.Pointer.<clinit>(Pointer.java:208)
languagetoolservice_1 |     at org.languagetool.rules.spelling.hunspell.DumontsHunspellDictionary.<init>(DumontsHunspellDictionary.java:37)
languagetoolservice_1 |     ... 21 more
languagetoolservice_1 |
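
For readers skimming the log, the nested exceptions can be listed mechanically. The following is a minimal sketch, not part of the original report: `exception_chain` is a hypothetical helper name, and it only assumes the usual Java convention that each nested cause is introduced by `Caused by:`. It works on the log whether or not line breaks are preserved.

```python
import re
from typing import List


def exception_chain(log_text: str) -> List[str]:
    """Return the exception class named after each 'Caused by:', outermost first."""
    # Class names consist of word characters, dots and '$' (inner classes);
    # the match stops at the ':' that precedes the exception message.
    return re.findall(r"Caused by: ([\w.$]+)", log_text)


if __name__ == "__main__":
    # Shortened excerpt in the same shape as the log above.
    excerpt = (
        "svc | Caused by: java.lang.RuntimeException: Could not create hunspell instance. "
        "svc | at org.example.Foo.bar(Foo.java:1) "
        "svc | Caused by: java.lang.UnsatisfiedLinkError: 'int org.bridj.Platform.sizeOf_ptrdiff_t()'"
    )
    chain = exception_chain(excerpt)
    print("innermost cause:", chain[-1])
```

Applied to the full log, the last entry is `java.lang.UnsatisfiedLinkError`. The separate `java.io.IOException: Read-only file system` at the very top is logged before the chain and is not introduced by `Caused by:`, so this helper does not list it.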

anetschka commented 2 years ago

Hi @Erikvl87, I don't think that adding the ngrams and changing the output port (which I also did) causes this error. I tested this by removing the --languagemodel option from LanguageTool's startup command, and the error is still the same. I also noticed that the current setup does not include Python, which means I cannot, for instance, run unit tests inside the container. I tried adding it, but to no avail.

Erikvl87 commented 2 years ago

Hi @anetschka, since I couldn't reproduce your issue, could you let me know what type of device you are running the image on? What OS is installed? What do your Dockerfile and docker-compose.yml look like, and with what arguments do you start them?

I've just configured my Raspberry Pi 4 (8 GB), running the 64-bit version of Ubuntu 21.04, with a docker-compose.yml file with the following contents:

version: "3"

services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool
    ports:
        - 8010:8010  # Using default port from the image
    environment:
        - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
        - Java_Xms=512m  # OPTIONAL: Setting a minimal Java heap size of 512 MiB
        - Java_Xmx=1g  # OPTIONAL: Setting a maximum Java heap size of 1 GiB
    volumes:
        - ./ngrams:/ngrams

My local ngrams folder contains the contents of the German ngrams found here: https://languagetool.org/download/ngram-data/ngrams-de-20150819.zip

I ran it with the docker-compose up command and executed the following request:

curl --location --request GET 'http://192.168.1.186:8010/v2/check?language=de-DE&text=In den christlichen Traditionen gibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.'

Note: The text is taken from step 5 at https://dev.languagetool.org/finding-errors-using-n-gram-data.html

The response is:

{
    "software": {
        "name": "LanguageTool",
        "version": "5.5",
        "buildDate": "2021-10-16 14:46:22 +0000",
        "apiVersion": 1,
        "premium": false,
        "premiumHint": "You might be missing errors only the Premium version can find. Contact us at support<at>languagetoolplus.com.",
        "status": ""
    },
    "warnings": {
        "incompleteResults": false
    },
    "language": {
        "name": "German (Germany)",
        "code": "de-DE",
        "detectedLanguage": {
            "name": "German (Germany)",
            "code": "de-DE",
            "confidence": 0.9999957
        }
    },
    "matches": [
        {
            "message": "‚Mediation‘ (Verfahren zur Konfliktlösung) erscheint hier weniger wahrscheinlich als ‚Meditation‘ (spirituelle Übung).",
            "shortMessage": "Mögliche Wortverwechselung",
            "replacements": [
                {
                    "value": "Meditation",
                    "shortDescription": "spirituelle Übung"
                }
            ],
            "offset": 73,
            "length": 9,
            "context": {
                "text": "...ibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.",
                "offset": 43,
                "length": 9
            },
            "sentence": "In den christlichen Traditionen gibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.",
            "type": {
                "typeName": "Other"
            },
            "rule": {
                "id": "CONFUSION_RULE_MEDIATION_MEDITATION",
                "description": "Mögliche Verwechselungen zwischen 'Mediation' und 'Meditation' erkennen",
                "issueType": "non-conformance",
                "category": {
                    "id": "TYPOS",
                    "name": "Mögliche Tippfehler"
                }
            },
            "ignoreForIncompleteSentence": false,
            "contextForSureMatch": 3
        }
    ]
}
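
The same check can also be scripted. Below is a minimal sketch using only the Python standard library; the helper names (`build_check_url`, `check`, `summarize_matches`) and the `http://localhost:8010` base URL are assumptions made for illustration, while the `/v2/check` path, the `language` and `text` parameters and the response fields are taken from the request and response shown above. Unlike the raw curl command, it URL-encodes the text.

```python
import json
import urllib.parse
import urllib.request


def build_check_url(base_url, language, text):
    """Build a /v2/check URL with the query parameters URL-encoded."""
    query = urllib.parse.urlencode({"language": language, "text": text})
    return f"{base_url}/v2/check?{query}"


def check(base_url, language, text):
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_check_url(base_url, language, text)) as resp:
        return json.load(resp)


def summarize_matches(text, response):
    """Reduce each match to its rule id, the flagged substring and the suggestions."""
    return [
        {
            "rule": m["rule"]["id"],
            "flagged": text[m["offset"]:m["offset"] + m["length"]],
            "suggestions": [r["value"] for r in m["replacements"]],
        }
        for m in response["matches"]
    ]


if __name__ == "__main__":
    sentence = ("In den christlichen Traditionen gibt es unterschiedliche "
                "Anleitungen zur Mediation und Kontemplation.")
    # Assumed base URL; replace it with the host and port of your container.
    result = check("http://localhost:8010", "de-DE", sentence)
    print(summarize_matches(sentence, result))
```

If the ngram data is picked up, the summary for the sentence above should contain the rule `CONFUSION_RULE_MEDIATION_MEDITATION`, as in the response shown.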

> I also noticed that the current setup does not include Python which renders me, for instance, unable to run unittests inside the container. I tried adding it, but to no avail.

You can use this image as a base image in a new Dockerfile and then install Python:

FROM erikvl87/languagetool

USER root
RUN apk update && apk add python3 py3-pip
USER languagetool

You can build and run the new image by executing the following two commands in the directory of the Dockerfile:

docker build -t languagetool-custom .
docker run --rm -it -p 8010:8010 languagetool-custom

anetschka commented 2 years ago

Hi @Erikvl87, sorry for my late reply. I have tried using your Dockerfile with and without the "workaround". I am running Linux containers, and locally I use Docker on a Windows desktop computer, so the "workaround" might not actually be the right option for me. I have now returned to my old LT+Docker setup, which works fine; however, I am ready to reproduce the steps I took to help you debug. It's possible that the mistake is on my side. The relevant part of my docker-compose file looks like this:

languagetoolservice:
    image: languagetoolservice
    read_only: true
    build:
      context: languagetool
    restart: unless-stopped
    init: true
    tmpfs:
      - /var/nobody_tmp:mode=770,size=10M,uid=65534,gid=65534,exec
    ports:
      - 127.0.0.1:8010:8000
    cap_drop:
      - all
    networks:
      - languagetool-net
    depends_on:
      - _ngrams

_ngrams is a volume from which the ngrams are copied into the container at build time. As said above, the container builds and starts up normally; the error is encountered only at runtime.
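
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the stack trace at the top of this issue begins with `java.io.IOException: Read-only file system` raised from `File.createTempFile`, and the compose fragment above sets `read_only: true` with `/var/nobody_tmp` as the only tmpfs mount. The sketch below tries to create a temporary file in a given directory, which is roughly what the JVM attempts. The helper name `can_create_temp_file` is hypothetical, and the assumption that the JVM writes to `/tmp` (the usual `java.io.tmpdir` default on Linux) has not been verified for this setup. It would have to run inside the container, which needs Python there (see the base-image Dockerfile earlier in the thread).

```python
import os
import tempfile


def can_create_temp_file(directory):
    """Try to create and delete a temporary file in `directory`.

    Returns False if the directory is missing, read-only or not permitted.
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError:
        return False
    os.close(fd)
    os.remove(path)
    return True


if __name__ == "__main__":
    # /tmp is the assumed JVM default; /var/nobody_tmp is the tmpfs from the compose file.
    for candidate in ("/tmp", "/var/nobody_tmp"):
        print(candidate, "writable" if can_create_temp_file(candidate) else "NOT writable")
```

If `/tmp` turns out not to be writable while `/var/nobody_tmp` is, that would be consistent with the first exception in the log, though it would not by itself prove that this is the cause of the Hunspell failure.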

Maxl94 commented 2 years ago

I am trying to run the LanguageTool server on my Raspberry Pi 3b+ with the following docker-compose.yaml:

version: "3"

services:
  languagetool:
    image: erikvl87/languagetool
    container_name: LanguageTool
    restart: always
    ports:
      - 8010:8010  # Using default port from the image
    environment:
      - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
      - Java_Xms=512m  # OPTIONAL: Setting a minimal Java heap size of 512 MiB
      - Java_Xmx=1g    # OPTIONAL: Setting a maximum Java heap size of 1 GiB
    volumes:
      - ./ngrams:/ngrams

I get the following error. I am not sure whether it is related to the original ARM issue:

Pulling languagetool (erikvl87/languagetool:)...
latest: Pulling from erikvl87/languagetool
ERROR: no matching manifest for linux/arm/v7 in the manifest list entries

I also tried building the image from source, but that failed with the following error:

[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile (default) on project languagetool-core: Unable to resolve artifact: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:linux-arm_32:3.17.3
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.17.3 -Dclassifier=linux-arm_32 -Dpackaging=exe -Dfile=/path/to/file
[ERROR] 
[ERROR]   Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]       mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.17.3 -Dclassifier=linux-arm_32 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR]   Path to dependency: 
[ERROR]     1) org.languagetool:languagetool-core:jar:5.7
[ERROR]     2) com.google.protobuf:protoc:exe:linux-arm_32:3.17.3
[ERROR] 
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR] 
[ERROR] for artifact: 
[ERROR]   org.languagetool:languagetool-core:jar:5.7
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] 
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :languagetool-core
The command 'mvn --projects languagetool-standalone --also-make package -DskipTests --quiet' returned a non-zero code: 1
ERROR: Service 'languagetool' failed to build : Build failed

Any ideas how to get it running? Thanks in advance!

Erikvl87 commented 2 years ago

@Maxl94 I think your issue is unrelated to this ticket. If I'm not mistaken, linux/arm/v7 is a 32-bit architecture. Currently, I've only released Docker images for linux/amd64 and linux/arm64 (see Docker Hub tags).

I am not sure, but the Raspberry Pi 3b+ should have a 64-bit SoC. Have you tried running a 64-bit OS? See Raspberry Pi OS (64-bit).

If you need this to work on a different architecture (32-bit), please open a new ticket so I can look into it.
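
As a rough way to check in advance whether a host can pull the published image, the sketch below maps the machine name reported by the OS to a Docker platform string and compares it with the two platforms named in this comment. The mapping table and the function names are assumptions made for illustration and cover only common values. `platform.machine()` reports what the running kernel says, which may differ from what Docker selects on mixed 32/64-bit setups.

```python
import platform

# Platforms named in the comment above as published on Docker Hub.
PUBLISHED_PLATFORMS = {"linux/amd64", "linux/arm64"}

# Assumed mapping from common `uname -m` style names to Docker platform strings.
MACHINE_TO_DOCKER = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
    "armv7l": "linux/arm/v7",
}


def docker_platform(machine):
    """Return the Docker platform string for a machine name, or None if unknown."""
    return MACHINE_TO_DOCKER.get(machine.lower())


def image_available(machine):
    """True if the machine maps to one of the published platforms."""
    return docker_platform(machine) in PUBLISHED_PLATFORMS


if __name__ == "__main__":
    machine = platform.machine()
    print(machine, docker_platform(machine), image_available(machine))
```

A 32-bit OS on a Raspberry Pi typically reports `armv7l`, which maps to `linux/arm/v7` and matches the "no matching manifest" error quoted earlier in the thread.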