molokanov50 opened this issue 2 years ago
Hello team,

Based on a trained multilingual Fairseq model (e.g. M2M-100), I run my translations as a service in a Docker container according to the following scheme: as input for my POST/GET requests, I provide a source text and a language pair, and I get the translated text as output.

Everything seems to be alright, but every single request consumes an additional 30 seconds to load the model (read data from the model's file). The model for all requests is the same `.pt` file. I experimented with various `fairseq-interactive` parameter combinations and finally accepted the fact that, without `--source-lang` and `--target-lang`, I cannot run the command `fairseq-interactive [params]` to read data from the `.pt` file: a message appears saying that my tokenizer doesn't know which language to set up. If I also remove `--tokenizer`, `--encoder-langtok` and `--decoder-langtok` from `params`, then my OS terminal lets me run `fairseq-interactive`, e.g.:

```
fairseq-interactive --path 1.2B_last_checkpoint.pt . --task translation_multi_simple_epoch --lang-pairs language_pairs_small_models.txt --bpe sentencepiece --sentencepiece-model spm.128k.model
```

but the text I provide via stdin is translated into a random language. Besides, as far as I can tell, some quality is lost to automatic source-language identification.

So my question concerns the technical possibility of first reading the data from my trained multilingual model (so that the model is loaded only once), and thereafter providing `--source-lang` and `--target-lang` as additional `fairseq-interactive` parameters. I don't currently care how they would be transferred to `fairseq-interactive` (via stdin, POST/GET request, etc.). Any ideas?
Can you share your project?
Unfortunately, sharing my project is impossible. Let me restate it in a simpler way.
At every `fairseq-interactive` query where I specify `--input`, `--source-lang` and `--target-lang`, I have to wait some time until the data from a `.pt` file is read into memory. It is known that `fairseq-interactive` has a mode with an empty `--input`, which gives the user the opportunity to supply input through stdin and perform translations on the fly, without re-reading the full model into memory. But my fairseq-based model is multilingual, supporting several language pairs, so I would strongly like to specify not only the input, but also `--source-lang` and `--target-lang`, in order to produce translations on the fly, without spending time reading the data at every query.

Is this feasible in fairseq?
Your problem is that `fairseq-interactive` cannot switch `source_lang` and `target_lang` during runtime, right?

Well, yes, it cannot switch them if you are calling `fairseq-interactive`.
But you can always copy part of `fairseq_cli/interactive.py` and write yourself a new method. (Well, I made one for myself; sorry, it is not for multilingual.) Also, look out for `translation_multi_simple_epoch`.
To start, you can search both `.py` files for:

- `get_interactive_tokens_and_lengths(lines, encode_fn)`
- `inference_step`
- `tokenizer = task.build_tokenizer(cfg.tokenizer)`
- `bpe = task.build_bpe(cfg.bpe)`
- `encode_fn`
- `decode_fn`
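For reference, a minimal sketch of the setup part, assuming a recent fairseq with `checkpoint_utils.load_model_ensemble_and_task`. The checkpoint and data paths are taken from the command above; note that the config stored in the checkpoint may not carry the `tokenizer`/`bpe`/`generation` settings that `interactive.py` normally gets from the CLI, so verify those against your fairseq version:

```python
import torch
from fairseq import checkpoint_utils

# Load the checkpoint exactly once, at service startup, instead of per request.
# Returns the model ensemble, the config stored in the checkpoint, and the
# task object (here translation_multi_simple_epoch).
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["1.2B_last_checkpoint.pt"],
    arg_overrides={"data": "."},
)
for model in models:
    model.eval()
    if torch.cuda.is_available():
        model.cuda()

# Same helpers interactive.py builds from --tokenizer and --bpe. If the saved
# config lacks them (they were CLI args), you may have to fill in e.g. the
# sentencepiece model path yourself.
tokenizer = task.build_tokenizer(cfg.tokenizer)
bpe = task.build_bpe(cfg.bpe)

def encode_fn(text):
    # Raw text -> tokenized, BPE-encoded string (mirrors interactive.py).
    if tokenizer is not None:
        text = tokenizer.encode(text)
    if bpe is not None:
        text = bpe.encode(text)
    return text

def decode_fn(text):
    # Inverse of encode_fn, applied to the generated output.
    if bpe is not None:
        text = bpe.decode(text)
    if tokenizer is not None:
        text = tokenizer.decode(text)
    return text
```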
The basic flow is:

1. `generator = task.build_generator(models, cfg.generation)`
2. `get_interactive_tokens_and_lengths(lines, encode_fn)`
3. `inference_step`

`inference_step` returns a list containing one list per sample. Each inner list holds `num_beams` Python dicts (fairseq's generate output; each dict has things like `"tokens"`: tensor, `"score"`: float, `"attention"`: tensor, ...), i.e. each `hypo` is a list of `num_beams` dicts. It is not that hard to identify which parts of the code are useful for you, and deleting the redundant ones will give you most of what you need. Hope this helps.
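Putting that flow together, a per-request translate function might look like the sketch below. It continues from the setup sketch above (reusing `models`, `cfg`, `task`, `encode_fn`, `decode_fn`) and rests on an untested assumption: that mutating `task.args.source_lang` / `task.args.target_lang` before `build_dataset_for_inference` is enough to re-route the langtok logic of `translation_multi_simple_epoch` per request. Check your fairseq version before relying on this:

```python
from fairseq import utils

# Built once at startup; reused for every request.
generator = task.build_generator(models, cfg.generation)

def translate(line, source_lang, target_lang):
    # ASSUMPTION: translation_multi_simple_epoch reads these attributes when
    # building the inference dataset and when prepending the target langtok
    # inside inference_step; this stands in for --source-lang/--target-lang.
    task.args.source_lang = source_lang
    task.args.target_lang = target_lang

    # Same job as get_interactive_tokens_and_lengths, for a single line.
    tokens = task.source_dictionary.encode_line(
        encode_fn(line), add_if_not_exist=False
    ).long()
    lengths = torch.LongTensor([tokens.numel()])
    dataset = task.build_dataset_for_inference([tokens], lengths)
    sample = dataset.collater([dataset[0]])
    if torch.cuda.is_available():
        sample = utils.move_to_cuda(sample)

    # hypos: one list per input sentence; each inner list holds num_beams
    # dicts with "tokens", "score", "attention", ...
    hypos = task.inference_step(generator, models, sample)
    best_tokens = hypos[0][0]["tokens"].int().cpu()
    return decode_fn(task.target_dictionary.string(best_tokens))

# e.g. translate("Hello world", "en", "de") inside the request handler,
# with the model staying resident between requests.
```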