molokanov50 opened this issue 2 years ago
Hello team,

Based on a trained multilingual Fairseq model (e.g. M2M-100), I run my translations as a service in a Docker container according to the following scheme: as input for my POST/GET requests, I provide a source text and a language pair, and I get the translated text as output.

Everything seems to be alright, but every single request consumes an additional 30 seconds to load the model (read data from the model's file). The model for all requests is the same `.pt` file. I experimented with various `fairseq-interactive` parameter combinations and finally accepted the fact that, without `--source-lang` and `--target-lang`, I cannot run the command `fairseq-interactive [params]` to read data from the `.pt` file: a message appears saying that my tokenizer doesn't know which language to set up. If I also remove `--tokenizer`, `--encoder-langtok` and `--decoder-langtok` from `params`, then my OS terminal lets me run `fairseq-interactive`, e.g.:

```
fairseq-interactive --path 1.2B_last_checkpoint.pt . --task translation_multi_simple_epoch --lang-pairs language_pairs_small_models.txt --bpe sentencepiece --sentencepiece-model spm.128k.model
```

but the text I provide via stdin is translated into a random language. Besides, as far as I can tell, some quality is lost to automatic source-language identification.

So my question concerns the technical possibility of first reading the data from my trained multilingual model (so that the model is loaded only once), and thereafter providing `--source-lang` and `--target-lang` as additional `fairseq-interactive` parameters. I don't currently care how they would be transferred to `fairseq-interactive` (via stdin, POST/GET request, etc.). Any ideas?
Can you share your project?
Unfortunately, sharing my project is impossible. Let me restate it in a simpler way.
At every `fairseq-interactive` query where I specify `--input`, `--source-lang` and `--target-lang`, I have to wait some time until the data from a `.pt` file is read into memory. It is known that `fairseq-interactive` has a mode with an empty `--input`, which gives the user the opportunity to supply input through stdin and perform translations on the fly, without re-reading the full model into memory. But my fairseq-based model is multilingual, supporting several language pairs, so I would strongly like to specify not only the input, but also `--source-lang` and `--target-lang`, in order to produce translations on the fly, without spending time reading the data at every query.

Is this feasible in fairseq?
Your problem is that `fairseq-interactive` cannot switch `source_lang` and `target_lang` during runtime, right?

Well, yes, it cannot switch them if you are calling `fairseq-interactive`.
But you can always copy part of `fairseq_cli/interactive.py` and write yourself a new method. (Well, I made one for myself; sorry, it is not for multilingual.) Also, look out for `translation_multi_simple_epoch`.
To start, you can search both `.py` files for:

- `get_interactive_tokens_and_lengths(lines, encode_fn)`
- `inference_step`
- `tokenizer = task.build_tokenizer(cfg.tokenizer)`
- `bpe = task.build_bpe(cfg.bpe)`
- `encode_fn`
- `decode_fn`
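For reference, a minimal sketch of the setup part, assuming a recent fairseq with `checkpoint_utils.load_model_ensemble_and_task`. The checkpoint and data paths are taken from the command above; note that the config stored in the checkpoint may not carry the `tokenizer`/`bpe`/`generation` settings that `interactive.py` normally gets from the CLI, so verify those against your fairseq version:

```python
import torch
from fairseq import checkpoint_utils

# Load the checkpoint exactly once, at service startup, instead of per request.
# Returns the model ensemble, the config stored in the checkpoint, and the
# task object (here translation_multi_simple_epoch).
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["1.2B_last_checkpoint.pt"],
    arg_overrides={"data": "."},
)
for model in models:
    model.eval()
    if torch.cuda.is_available():
        model.cuda()

# Same helpers interactive.py builds from --tokenizer and --bpe. If the saved
# config lacks them (they were CLI args), you may have to fill in e.g. the
# sentencepiece model path yourself.
tokenizer = task.build_tokenizer(cfg.tokenizer)
bpe = task.build_bpe(cfg.bpe)

def encode_fn(text):
    # Raw text -> tokenized, BPE-encoded string (mirrors interactive.py).
    if tokenizer is not None:
        text = tokenizer.encode(text)
    if bpe is not None:
        text = bpe.encode(text)
    return text

def decode_fn(text):
    # Inverse of encode_fn, applied to the generated output.
    if bpe is not None:
        text = bpe.decode(text)
    if tokenizer is not None:
        text = tokenizer.decode(text)
    return text
```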
The basic flow is:

1. `generator = task.build_generator(models, cfg.generation)`
2. `get_interactive_tokens_and_lengths(lines, encode_fn)`
3. `inference_step`

`inference_step` returns a list containing one list per sample. Each inner list holds `num_beams` Python dicts (fairseq's generate output; each dict has things like `"tokens"`: tensor, `"score"`: float, `"attention"`: tensor, ...), i.e. each `hypo` is a list of `num_beams` dicts. It is not that hard to identify which parts of the code are useful for you, and deleting the redundant ones will give you most of what you need. Hope this helps.
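Putting that flow together, a per-request translate function might look like the sketch below. It continues from the setup sketch above (reusing `models`, `cfg`, `task`, `encode_fn`, `decode_fn`) and rests on an untested assumption: that mutating `task.args.source_lang` / `task.args.target_lang` before `build_dataset_for_inference` is enough to re-route the langtok logic of `translation_multi_simple_epoch` per request. Check your fairseq version before relying on this:

```python
from fairseq import utils

# Built once at startup; reused for every request.
generator = task.build_generator(models, cfg.generation)

def translate(line, source_lang, target_lang):
    # ASSUMPTION: translation_multi_simple_epoch reads these attributes when
    # building the inference dataset and when prepending the target langtok
    # inside inference_step; this stands in for --source-lang/--target-lang.
    task.args.source_lang = source_lang
    task.args.target_lang = target_lang

    # Same job as get_interactive_tokens_and_lengths, for a single line.
    tokens = task.source_dictionary.encode_line(
        encode_fn(line), add_if_not_exist=False
    ).long()
    lengths = torch.LongTensor([tokens.numel()])
    dataset = task.build_dataset_for_inference([tokens], lengths)
    sample = dataset.collater([dataset[0]])
    if torch.cuda.is_available():
        sample = utils.move_to_cuda(sample)

    # hypos: one list per input sentence; each inner list holds num_beams
    # dicts with "tokens", "score", "attention", ...
    hypos = task.inference_step(generator, models, sample)
    best_tokens = hypos[0][0]["tokens"].int().cpu()
    return decode_fn(task.target_dictionary.string(best_tokens))

# e.g. translate("Hello world", "en", "de") inside the request handler,
# with the model staying resident between requests.
```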