**Open** · Aculeasis opened this issue 3 years ago
I'm working my way through this and haven't gotten all the way there yet, but I did resolve the "No such file or directory" issue: the fasttext binary has to be built on Alpine Linux to work. I'll post my completed setup once I get it working. Now I'm getting a `java.lang.OutOfMemoryError` while loading the ngram data for language identification.
If you create a `Dockerfile` in an empty folder with these contents:

```Dockerfile
FROM alpine as ftbuild

RUN apk update && apk add \
    build-base \
    wget \
    git \
    unzip \
    && rm -rf /var/cache/apk/*

RUN git clone https://github.com/facebookresearch/fastText.git /tmp/fastText && \
    rm -rf /tmp/fastText/.git* && \
    mv /tmp/fastText/* / && \
    cd / && \
    make

RUN wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
RUN wget https://languagetool.org/download/ngram-lang-detect/model_ml50_new.zip

FROM erikvl87/languagetool

COPY --chown=languagetool --from=ftbuild /fasttext .
COPY --chown=languagetool --from=ftbuild /model_ml50_new.zip .
COPY --chown=languagetool --from=ftbuild /lid.176.bin .

ENV Java_Xms=512m
ENV Java_Xmx=1500m

ENV langtool_fasttextBinary=/LanguageTool/fasttext
ENV langtool_ngramLangIdentData=/LanguageTool/model_ml50_new.zip
ENV langtool_fasttextModel=/LanguageTool/lid.176.bin
```
You can then build it with:

```sh
docker build -t docker-languagetool-fasttext .
```
And then you would run it like so (this is based on the command you provided above):

```sh
docker run -d --name="Languagetool" \
  -p 8081:8010/tcp \
  -e langtool_languageModel=/ngrams \
  -v "/mnt/hdd1/languagetool/ngrams":"/ngrams" \
  --restart=unless-stopped \
  docker-languagetool-fasttext
```
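Once the container is up, a quick request against the mapped port is a simple way to confirm the server works and to exercise the language-identification path. `/v2/check` is LanguageTool's standard HTTP API endpoint; the port `8081` is just the host port from the `docker run` command above:

```shell
# Send a short text with a deliberate error to the running container.
# "language=auto" triggers the automatic language detection (fasttext/ngram).
curl -s --data "language=auto&text=This are a test." \
  http://localhost:8081/v2/check
```

A healthy server answers with a JSON body listing the detected language and any rule matches; if the server runs out of heap, this request tends to fail with an HTTP 500 instead.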
Yes, it starts, and I have the same problem with `java.lang.OutOfMemoryError`:

```
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.fst.FST.<init>(FST.java:387)
    at org.apache.lucene.util.fst.FST.<init>(FST.java:313)
    at org.apache.lucene.codecs.blocktree.FieldReader.<init>(FieldReader.java:91)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:231)
    at org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat.fieldsProducer(Lucene50PostingsFormat.java:446)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:261)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:341)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:104)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:65)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:731)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:241)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel$LuceneSearcher.<init>(LuceneSingleIndexLanguageModel.java:229)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.getCachedLuceneSearcher(LuceneSingleIndexLanguageModel.java:182)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.addIndex(LuceneSingleIndexLanguageModel.java:118)
    at org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.<init>(LuceneSingleIndexLanguageModel.java:95)
    at org.languagetool.languagemodel.LuceneLanguageModel.<init>(LuceneLanguageModel.java:65)
    at org.languagetool.Language.initLanguageModel(Language.java:180)
    at org.languagetool.language.English.getLanguageModel(English.java:144)
    at org.languagetool.JLanguageTool.activateLanguageModelRules(JLanguageTool.java:566)
    at org.languagetool.server.Pipeline.activateLanguageModelRules(Pipeline.java:121)
    at org.languagetool.server.PipelinePool.createPipeline(PipelinePool.java:204)
    at org.languagetool.server.PipelinePool.getPipeline(PipelinePool.java:180)
    at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:757)
    at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:711)
    at org.languagetool.server.TextChecker.access$000(TextChecker.java:56)
    at org.languagetool.server.TextChecker$1.call(TextChecker.java:427)
    at org.languagetool.server.TextChecker$1.call(TextChecker.java:420)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
```
Sorry that I've kept you waiting. I unfortunately haven't had the time yet to look into this. I'll do my best to take a look soon. Meanwhile, would the provided solution of @dprothero work in combination with increased memory options?
You can do this by increasing the `Java_Xms` and `Java_Xmx` variables. In the Dockerfile example given above, that means increasing the values in these lines (e.g. to `1g` and `2g` respectively):

```Dockerfile
ENV Java_Xms=512m
ENV Java_Xmx=1500m
```
Alternatively, take a look at the Java heap size settings explained over here: https://github.com/Erikvl87/docker-languagetool#java-heap-size
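If rebuilding the image is inconvenient, the same two variables can also be raised at run time, since `ENV` values in a Dockerfile only set defaults that `docker run -e` overrides. A sketch, reusing the port, mount path, and image name from the earlier example:

```shell
# Override the heap defaults baked into the image without rebuilding it.
docker run -d --name="Languagetool" \
  -p 8081:8010/tcp \
  -e Java_Xms=1g \
  -e Java_Xmx=2g \
  -e langtool_languageModel=/ngrams \
  -v "/mnt/hdd1/languagetool/ngrams":"/ngrams" \
  --restart=unless-stopped \
  docker-languagetool-fasttext
```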
@Aculeasis, the provided solution of @dprothero seems to work here as well.

I think the example above is useful to include in the `README.md`, so I will keep this ticket open until I've updated the readme file.
Sorry for the delay. I set `1g` and `2g`. It works, but it sometimes fails. So I set `2g` and `4g`, and it works well. But isn't 4 GB too much?
@Aculeasis That should be a question for the official LanguageTool developers. From what I could find, they don't have an official set of requirements regarding memory configuration:

> There's no general rule, it depends on the number of languages being used, the concurrent requests, the text length etc. 2600MB should be enough for most use cases, if you don't have that much, try with less and see how that works.

Source: https://github.com/languagetool-org/languagetool/issues/902#issuecomment-366427622
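Since there is no official requirement, one way to judge whether 4 GB is really needed is to watch the container's actual memory use while it handles requests. `docker stats` is a standard Docker CLI command; the container name below assumes the `--name="Languagetool"` from the examples above:

```shell
# One-shot snapshot of CPU and memory usage for the running container.
# Drop --no-stream to watch the values update live under load.
docker stats --no-stream Languagetool
```

If the reported usage stays well below the limit even while checking long texts, `Java_Xmx` can likely be lowered again.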
Is there a reason this can't be included in the docker image?
`erikvl87/languagetool:5.2` works fine:

But newer versions already crash :(

I built fasttext from here and downloaded, probably, lid.176.bin from here. My docker run command:

docker version:

So, what am I doing wrong?