Want to play with the Moses Statistical Machine Translation system, but...
You don't have time to get a PhD in Setting Up Moses?
You have TMX files (or structured bilingual text files easily convertible to TMX) and want to use them with Moses without doing all the munging yourself?
Well now you don't have to, because I stuffed Moses in a Docker container for you.
A full Moses + MGIZA installation in a Docker image: amake/moses-smt:base on Docker Hub
A make-based set of commands for easily:
Converting TMX files into Moses-ready corpus files: make corpus
Training and tuning Moses: make train
Building Docker images of trained Moses instances: make build
Deploying trained Moses instances to Docker Hub/Amazon Elastic Beanstalk: make deploy-hub
Some peripheral tools: mosesxmlrpcrepl.py or make repl
make
Docker
Python 3 with pip and virtualenv
OS X? (not tested elsewhere)
Some TMX files (Okapi Rainbow is a good tool for converting structured bilingual files to TMX)
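Before training, it can help to check which languages your TMX files actually contain, since the make commands expect bare language codes (en, not en-US). The following is a minimal sketch using only the Python standard library; it is not part of this repository, and the function names are hypothetical. It assumes each translation unit variant carries an xml:lang attribute (or the older lang attribute).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def base_language(code):
    """Reduce a code such as 'en-US' or 'fr_CA' to its bare language part."""
    return code.replace("_", "-").split("-")[0].lower()


def tmx_language_counts(tmx_source):
    """Count <tuv> elements per bare language code.

    tmx_source may be a file path or an open file object
    (hypothetical helper, not part of this repository).
    """
    counts = Counter()
    for tuv in ET.parse(tmx_source).getroot().iter("tuv"):
        # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
        code = tuv.get(XML_LANG) or tuv.get("lang")
        if code:
            counts[base_language(code)] += 1
    return counts
```

Calling tmx_language_counts on one of your files returns a Counter keyed by bare language code, which shows which values are usable for SOURCE_LANG and TARGET_LANG.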
First, if you are building the base image yourself, you might need to rebalance the number of cores and the amount of memory available to Docker: for example, 8 cores with only 2 GB of memory results in compilation failures, while 4 cores with 4 GB seems to work better.
Put most of your TMXs in tmx-train, and the rest in tmx-tune.
Run make SOURCE_LANG=<src> TARGET_LANG=<trg> [LABEL=<lbl>].
src and trg (required) are the language codes (not language + country) for your source and target languages, e.g. en and fr.
lbl is an optional label for the resulting image; myinstance by default.
Wait forever.
When done, you will have a Docker image tagged moses-smt:<lbl>-<src>-<trg>.
Run make server SOURCE_LANG=<src> TARGET_LANG=<trg> [PORT=<port>] to start mosesserver, which you can query over XML-RPC. Optionally specify a port; the default is 8080.
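As a rough illustration of querying the running server over XML-RPC, here is a minimal Python sketch. It assumes the server is reachable at http://localhost:8080/RPC2 (the default port above) and that mosesserver exposes a translate method that takes a struct with a text field and returns a struct with a text field, as in the sample client shipped with Moses. The helper names are hypothetical, and the proxy argument exists only so the function can be exercised without a live server.

```python
import xmlrpc.client

# Assumed default endpoint: local server on the default port 8080.
DEFAULT_URL = "http://localhost:8080/RPC2"


def build_params(text):
    """Build the request struct; 'text' holds the sentence to translate."""
    return {"text": text}


def translate(text, url=DEFAULT_URL, proxy=None):
    """Send one sentence to mosesserver and return the translated string.

    Hypothetical helper: pass a ready-made proxy to reuse a connection,
    otherwise a new ServerProxy is created for the given URL.
    """
    if proxy is None:
        proxy = xmlrpc.client.ServerProxy(url)
    result = proxy.translate(build_params(text))
    return result["text"]
```

With the server from make server running, translate("this is a test") should return the translated sentence as a string. If you started the server with a different PORT, pass the matching URL via the url argument.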
Train a new image with swapped languages or with a new set of TMXs.
Use a trained instance for translation in OmegaT with the omegat-moses-mt plugin:
Run make server to run the server locally; the moses.server.url value is then http://localhost:8080/RPC2
Run make deploy-hub and then upload the .zip that's produced as a new EB environment