Closed medfreeman closed 3 months ago
Please read the README of the project: we no longer support OpenNMT-py and are switching to https://github.com/eole-nlp/eole. I suggest you switch to eole if you intend to get support in the future. The server in eole is not ready yet, but future development will happen there. Cheers.
Some multilingual/seq2seq models such as M2M100 (cf. the Generation section in the linked page) require the `bos_token` to be set to the target language id in the sequence's `tgt` property. In the case of the translation server, to be able to specify the requested translation language, we need to directly manipulate the sequence's `tgt` property prior to translation.

But in its current state the server has a disconnect between the sequence's `ref`/`ref_tok` (which can be manipulated through tokenizers/processors, by the way) and the `tgt` string prior to it being sent to `ctranslate2`.

cf. https://github.com/OpenNMT/OpenNMT-py/blob/cb1cb22b3de872434076067d316bff446af683ff/onmt/translate/translation_server.py#L588
Basically, the `tgt` parameter of the `self.translator.translate` method is never provided.

cf. https://github.com/OpenNMT/OpenNMT-py/blob/cb1cb22b3de872434076067d316bff446af683ff/onmt/translate/translation_server.py#L599
I successfully implemented a one-line patch that properly passes the parameter through and allows me to do multilingual translation. It should not have side effects on other types of models (for which the sequence's `ref` is empty after tokenizing the sequence), since it sets the parameter to an empty string in those cases.

Here's the PR: #2585
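
The idea behind the patch can be sketched roughly as follows. This is not the code from the PR: `build_tgt` is a hypothetical helper, and it assumes the tokenized reference arrives as a list of tokens (or nothing at all for models that do not use it).

```python
from typing import List, Optional


def build_tgt(ref_tokens: Optional[List[str]]) -> str:
    """Return the string to hand to the translator as `tgt` for one segment.

    Hypothetical helper, for illustration only: it joins the tokenized
    reference into a single string and falls back to "" when the
    tokenizer produced no reference tokens.
    """
    if not ref_tokens:
        # Non-multilingual models: nothing to force, keep the prefix empty.
        return ""
    return " ".join(ref_tokens)


# Multilingual model: the tokenizer placed the target language token in `ref`.
print(build_tgt(["__fr__"]))
# Other models: no reference tokens, so the result is an empty string.
print(repr(build_tgt(None)))
```

The empty-string fallback is what keeps the behaviour unchanged for models that never populate `ref`.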
Example of multilingual translation with an M2M100 model:
conf.json
available_models/m2m-multi4-ft-ck945k/tokenizer/m2m100_tokenizer.py
Sample request to server:
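
A minimal sketch of how such a request body could be built, assuming the server accepts a JSON list of objects with `src` and `id` keys on its translate route, and assuming the target language token travels in a `ref` field that the custom tokenizer then reads. The `ref` usage, the model id `100`, and the `build_request` helper are illustrative assumptions, not values taken from the setup above.

```python
import json


def build_request(text: str, model_id: int, tgt_lang_token: str) -> str:
    """Serialize one translation request as a JSON list.

    Hypothetical helper: `ref` carrying the language token is an
    assumption about how the tokenizer receives the target language.
    """
    payload = [{"src": text, "id": model_id, "ref": tgt_lang_token}]
    return json.dumps(payload, ensure_ascii=False)


# Body that would be POSTed to the server with Content-Type: application/json.
body = build_request("Hello world", 100, "__fr__")
print(body)
```

M2M100 language tokens have the form `__xx__` (for example `__fr__` for French), which is why the sketch uses that shape for the target language.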