OpenNMT / CTranslate

Lightweight C++ translator for OpenNMT Torch models (deprecated)
https://opennmt.net/
MIT License

Unable to pipe to translate process in node #20

Closed loretoparisi closed 7 years ago

loretoparisi commented 7 years ago

I'm using Node.js with the translate executable, which normally runs in a pipe from the console like this:

echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --beam_size 5 -
Der <unk> Fuchs springt über den faulen Hund

In Node I'm doing the following:

    var args = [
        '--model',
        self._options.loadModel,
        '--beam_size',
        self._options.translate.beamSize,
        '-'
    ];
    var child = exec('translate', self._options.bin, args, self._options.child);
    child.stdin.setEncoding('utf-8');
    child.stdin.write( data + '\r\n' );

where my exec method creates a node child process and listens for data, errors, etc. (example):

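    var exec = function(name, command, params, options) {
        var self = this;
        // Merge caller-supplied options over the defaults.
        var _options = { detached: false };
        for (var attrname in options) { _options[attrname] = options[attrname]; }

        var created_time = ( new Date() ).toISOString();
        var task = require('child_process').spawn(command, params, _options);
        // Data written by the child to its stdout.
        task.stdout.on('data', function(_data) {
            //...
        });
        // Process-level failure, e.g. the command could not be spawned.
        task.on('error', function(error) {
            //...
        });
        // Note: 'uncaughtException' is a `process` event; a ChildProcess does not emit it.
        task.on('uncaughtException', function (error) {
            //...
        });
        // The child's stdout stream has ended.
        task.stdout.on('end', function(data) {
            //...
        });
        // Data written by the child to its stderr.
        task.stderr.on('data', function(data) {
            //...
        });
        // The process has ended and its stdio streams are closed.
        task.on('close', (code, signal) => {
            //...
        });
        // The process has ended (its stdio streams may still be open).
        task.on('exit', function(code) {
            //...
        });
        return task;
    }//exec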

This normally works for most commands. In this case I'm using it for the tokenize/detokenize executable as well, with the same approach:

    var args = [
        '--mode',
        self._options.cmd.mode,
        '-'
    ];
    var child = self.exec('tokenize', self._options.bin.tokenize, args, self._options.child);
    child.stdin.setEncoding('utf-8');
    child.stdin.write( data + '\r\n' );

In the case of translate, however, the pipe (|) does not work programmatically for some reason. Is the source program reading from stdin and writing to stdout normally?

guillaumekln commented 7 years ago

Is the source program reading from stdin and writing to stdout normally?

Yes. However, it will try to read --batch_size lines unless it reaches end-of-file (EOF) on stdin first.

So for your application, you certainly want to set --batch_size 1.
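A minimal sketch of how that suggestion might look from Node, using only the flags already shown in this thread (--model, --beam_size, --batch_size and the trailing -). The helper names buildTranslateArgs and startTranslate are hypothetical, not part of CTranslate:

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: builds the argument list for cli/translate.
// --batch_size is fixed to 1 so that each input line is translated as soon
// as it is read, instead of waiting for a full batch or end-of-file.
function buildTranslateArgs(modelPath, beamSize) {
  return [
    '--model', modelPath,
    '--beam_size', String(beamSize),
    '--batch_size', '1',
    '-'
  ];
}

// Hypothetical helper: starts the translate executable at binPath.
function startTranslate(binPath, modelPath, beamSize) {
  return spawn(binPath, buildTranslateArgs(modelPath, beamSize));
}
```

The numeric options are converted to strings here only to keep the argument list uniform.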

loretoparisi commented 7 years ago

@guillaumekln thank you so much!!!! It works perfectly now!

[loretoparisi@:mbploreto opennmt]$ node translate.js 
[ '--model',
  '/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
  '--beam_size',
  5,
  '--batch_size',
  '1',
  '-' ]
----data <unk> der <unk> Fuchs über die faulen <unk>

<unk> der <unk> Fuchs über die faulen <unk>
SOURCE (en) "The quick brown fox jumps over the lazy dog" 
DEST (de) "<unk> der <unk> Fuchs über die faulen <unk>\n"
exec:translate end.
exec:translate exit.
task:translate pid:15115 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$ 

loretoparisi commented 7 years ago

@guillaumekln sorry, I just noticed this: when using --batch_size=1 I get a slightly different translation:

source (en): "The quick brown fox jumps over the lazy dog"
dest (de) (from bash, params: --beam_size 5): Der <unk> Fuchs springt über den faulen Hund
dest (from node script, params: --beam_size 5 --batch_size 1): <unk> der <unk> Fuchs über die faulen <unk>

guillaumekln commented 7 years ago

I think there is something else. Can you reproduce it when directly invoking cli/translate on the command line?

loretoparisi commented 7 years ago

nope, with command line trying different parameters:

[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --beam_size 5 -
Der <unk> Fuchs springt über den faulen Hund
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7
Der <unk> Fuchs springt über den faulen Hund
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --batch_size 1 --beam_size 5
Der <unk> Fuchs springt über den faulen Hund

I always get the same output: Der <unk> Fuchs springt über den faulen Hund.

Programmatically in node I'm passing:

[ '--model',
  '/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
  '--beam_size',
  5,
  '--batch_size',
  1,
  '-' ]

and the input text "The quick brown fox jumps over the lazy dog" + "\r\n".

guillaumekln commented 7 years ago

The command line is the reference, so if you are getting a different output, something is going on in your application.

guillaumekln commented 7 years ago

+ "\r\n"

This seems to be the issue by the way.

loretoparisi commented 7 years ago

@guillaumekln Yes confirmed!!!

[loretoparisi@:mbploreto opennmt]$ node translate.js 
[ '--model',
  '/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
  '--beam_size',
  5,
  '--batch_size',
  1,
  '-' ]
Der <unk> Fuchs springt über den faulen Hund
SOURCE (en) "The quick brown fox jumps over the lazy dog" 
DEST (de) "Der <unk> Fuchs springt über den faulen Hund\n"
exec:translate end.
exec:translate exit.
task:translate pid:54209 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$

My write function now looks like

    /**
     * Send data to child process
     */
    this.send = function(data) {
        this.child.stdin.setEncoding('utf-8');
        this.child.stdin.write( data + '\n' );
    }//send
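If the input text may already carry its own line ending, one way to guard against sending a stray carriage return is to normalize each line before writing it. This is only a sketch: toInputLine is a hypothetical helper, and it assumes the translator expects exactly one "\n"-terminated line per sentence, as the results above suggest:

```javascript
// Hypothetical helper: strips any trailing CR/LF characters and appends
// a single "\n", so the child process never receives "\r" as input.
function toInputLine(text) {
  return text.replace(/[\r\n]+$/, '') + '\n';
}

console.log(JSON.stringify(toInputLine('The quick brown fox\r\n')));
// prints "The quick brown fox\n"
```

The send function could then write toInputLine(data) instead of data + '\n'.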

I also realized that the same thing was happening with text summarization, which now works too:

[loretoparisi@:mbploreto opennmt]$ node textsum.js 
[ '--model',
  '/root/textsum_epoch7_14.69_release.t7',
  '--beam_size',
  10,
  '--batch_size',
  1,
  '-' ]
night never just my bed smell
SOURCE (en) "Last night you were in my room And now my bed sheets smell like you Every day discovering something brand new" 
DEST (-) "night never just my bed smell\n"
exec:translate end.
exec:translate exit.
task:translate pid:54229 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$ 

Thank you.

loretoparisi commented 7 years ago

@guillaumekln Sorry for all these questions! I prefer to write here, since this is related to the command line and is more of a performance question than an issue. I have noticed that when iterating over several lines to translate, performance decreases as the number of lines grows.

Of course I'm still using --batch_size=1, so my question is: is the model loaded at every call in this iteration?

I suspect so because it ends up with a memory leak warning: (node:61283) Warning: Possible EventEmitter memory leak detected. 11 unpipe listeners added. Use emitter.setMaxListeners() to increase limit. I think this is due to an OOM issue.
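For context, Node prints this warning when more than 10 listeners (the default limit) are registered for the same event on one emitter, which usually points at listeners or pipes being attached on every call rather than once. The sketch below uses a plain EventEmitter as a stand-in for a long-lived stream; it is not the code from this application:

```javascript
const { EventEmitter } = require('events');

// Leaky pattern: a new listener is registered for every line sent.
// Adding the 11th listener makes Node print the "Possible EventEmitter
// memory leak detected" warning to stderr.
const leaky = new EventEmitter();
for (let i = 0; i < 11; i++) {
  leaky.on('data', () => {});
}
console.log(leaky.listenerCount('data')); // prints 11

// Reuse pattern: one listener is registered once and handles every line.
const reused = new EventEmitter();
const results = [];
reused.on('data', (line) => results.push(line));
for (let i = 0; i < 11; i++) {
  reused.emit('data', 'line ' + i);
}
console.log(reused.listenerCount('data')); // prints 1
```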

Considering that the number of lines to translate changes every time and I need to keep the translation line by line (executing within a node process), how should I handle that?

For example, I'm doing a similar translation task with Facebook Fairseq. In that case, the command-line tool loads the model once, then I just send data to the child process's stdin and the model executes the beam search, so there is no OOM.

Thank you.

guillaumekln commented 7 years ago

Is the model loaded at every call in this iteration?

No. It will only be loaded when cli/translate is started and unloaded when the process dies.

You should be able to use the same approach you described for fairseq: keep stdin open and write line by line.
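A rough sketch of that approach in Node: the process is started once, stdin stays open, and each output line is matched to the oldest pending request. LineMatcher and startTranslator are hypothetical names, and the sketch assumes the translator prints exactly one output line per input line, in order (which --batch_size 1 appears to give in this thread):

```javascript
const { spawn } = require('child_process');

// Buffers stdout chunks and hands each complete line to the oldest
// pending callback (first in, first out).
class LineMatcher {
  constructor() {
    this.buffer = '';
    this.pending = [];
  }

  // Register a callback for the next output line.
  expect(callback) {
    this.pending.push(callback);
  }

  // Feed a chunk of output; it may hold zero, one, or several lines.
  feed(chunk) {
    this.buffer += chunk;
    let index;
    while ((index = this.buffer.indexOf('\n')) !== -1) {
      const line = this.buffer.slice(0, index);
      this.buffer = this.buffer.slice(index + 1);
      const callback = this.pending.shift();
      if (callback) callback(line); // lines nobody is waiting for are dropped
    }
  }
}

// Hypothetical wrapper: spawns the process once and reuses it for every line.
function startTranslator(bin, args) {
  const child = spawn(bin, args, { stdio: ['pipe', 'pipe', 'inherit'] });
  const matcher = new LineMatcher();
  child.stdout.setEncoding('utf8');
  // A single 'data' listener is attached for the lifetime of the process.
  child.stdout.on('data', (chunk) => matcher.feed(chunk));
  return {
    translate(text) {
      return new Promise((resolve) => {
        matcher.expect(resolve);
        child.stdin.write(text + '\n', 'utf8');
      });
    },
    close() {
      child.stdin.end(); // end-of-file lets the child exit on its own
    }
  };
}
```

Because the listener is attached once rather than per line, the number of listeners stays constant however many lines are translated.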

loretoparisi commented 7 years ago

@guillaumekln thanks I will try that way!

loretoparisi commented 7 years ago

Thank you, it works as expected!!!

[loretoparisi@:mbploreto opennmt]$ node translate.js 
Module:OpenNMT.en-de of OpenNMT loaded.
[ '--model',
  '/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
  '--beam_size',
  5,
  '--batch_size',
  1,
  '-' ]
<unk>
OpenNMT.load
OpenNMT.translate: translating [0] Ayy, I remember syrup sandwiches and crime allowances
OpenNMT.translate: translating [1] Finesse a nigga with some counterfeits
OpenNMT.translate: translating [2] Parmesan where my accountant lives
<unk> Ich erinnere mich an <unk> und <unk>
OpenNMT.translate: translated [0]
<unk> Ich erinnere mich an <unk> und <unk>

<unk> mit einigen Fälschungen
OpenNMT.translate: translated [1]
<unk> mit einigen Fälschungen

<unk> , wo mein Buchhalter lebt .
OpenNMT.translate: translated [2]
<unk> , wo mein Buchhalter lebt .

OpenNMT.translate: translated:3 
 [ { line: 0,
    source: 'Ayy, I remember syrup sandwiches and crime allowances',
    target: '<unk> Ich erinnere mich an <unk> und <unk>\n' },
  { line: 1,
    source: 'Finesse a nigga with some counterfeits',
    target: '<unk> mit einigen Fälschungen\n' },
  { line: 2,
    source: 'Parmesan where my accountant lives',
    target: '<unk> , wo mein Buchhalter lebt .\n' } ]
OpenNMT.unload
exec:translate end.
exec:translate exit.
task:translate pid:71271 terminated due to receipt of signal:SIGINT