Closed loretoparisi closed 7 years ago
Is the source program reading from stdin and writing to stdout normally?
Yes. However, it will try to read --batch_size lines unless the special control character EOF is received.
So for your application, you certainly want to set --batch_size 1.
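As a rough sketch of what this means for a Node wrapper: the arguments can be built once with --batch_size 1, so the process should not wait for further lines before translating. The helper name buildTranslateArgs and the model path are hypothetical; only the flags themselves come from this thread.

```javascript
// Hypothetical helper: builds the argument list for cli/translate using the
// flags shown in this thread. Values are converted to strings so the argv
// is unambiguous.
function buildTranslateArgs(modelPath, beamSize) {
  return [
    '--model', modelPath,
    '--beam_size', String(beamSize),
    '--batch_size', '1', // translate each line without waiting for a full batch
    '-',                 // trailing '-' as passed in the commands in this thread
  ];
}

// Usage (the path is a placeholder, not a real model file):
// const { spawn } = require('child_process');
// const child = spawn('./cli/translate', buildTranslateArgs('/path/to/model.t7', 5));
console.log(buildTranslateArgs('/path/to/model.t7', 5).join(' '));
```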
@guillaumekln thank you so much!!!! It perfectly worked now!
[loretoparisi@:mbploreto opennmt]$ node translate.js
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
'1',
'-' ]
----data <unk> der <unk> Fuchs über die faulen <unk>
<unk> der <unk> Fuchs über die faulen <unk>
SOURCE (en) "The quick brown fox jumps over the lazy dog"
DEST (de) "<unk> der <unk> Fuchs über die faulen <unk>\n"
exec:translate end.
exec:translate exit.
task:translate pid:15115 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$
@guillaumekln sorry, I just noticed that when using --batch_size=1 I get a slightly different translation:
source (en): "The quick brown fox jumps over the lazy dog"
dest (de) (from bash, params: --beam_size 5): Der <unk> Fuchs springt über den faulen Hund
dest (de) (from node script, params: --beam_size 5 --batch_size 1): <unk> der <unk> Fuchs über die faulen <unk>
I think there is something else. Can you reproduce it when directly invoking cli/translate on the command line?
Nope. On the command line, trying different parameters:
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --beam_size 5 -
Der <unk> Fuchs springt über den faulen Hund
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7
Der <unk> Fuchs springt über den faulen Hund
[loretoparisi@:mbploreto build]$ echo "The quick brown fox jumps over the lazy dog" | ./cli/translate --model /root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7 --batch_size 1 --beam_size 5
Der <unk> Fuchs springt über den faulen Hund
I always get the same output: Der <unk> Fuchs springt über den faulen Hund.
Programmatically in node I'm passing:
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
1,
'-' ]
and the input text "The quick brown fox jumps over the lazy dog" + "\r\n".
The command line is the reference so if you are getting another output there is something going on in your application.
+ "\r\n"
This seems to be the issue by the way.
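To illustrate the point about line endings, here is a minimal sketch of a normalizer that guarantees each input ends with a single "\n" and contains no carriage returns. The function name toInputLine is hypothetical, and replacing embedded line breaks with spaces is an assumption made so that one call always produces exactly one input line.

```javascript
// Hypothetical helper: turn arbitrary text into exactly one '\n'-terminated line.
function toInputLine(text) {
  return text
    .replace(/\r?\n/g, ' ') // embedded line breaks would be read as extra input lines
    .replace(/\r/g, ' ')    // remove any remaining carriage returns
    .trim() + '\n';
}

console.log(JSON.stringify(toInputLine('The quick brown fox jumps over the lazy dog\r\n')));
```

With this in place, the wrapper can accept text from Windows-style sources without passing "\r" on to the translator.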
@guillaumekln Yes confirmed!!!
[loretoparisi@:mbploreto opennmt]$ node translate.js
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
1,
'-' ]
Der <unk> Fuchs springt über den faulen Hund
SOURCE (en) "The quick brown fox jumps over the lazy dog"
DEST (de) "Der <unk> Fuchs springt über den faulen Hund\n"
exec:translate end.
exec:translate exit.
task:translate pid:54209 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$
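My write function now looks like:
/**
 * Send one line of input to the child process
 */
this.send = function(data) {
  // child.stdin is a Writable stream: setDefaultEncoding sets the encoding used by write()
  this.child.stdin.setDefaultEncoding('utf-8');
  // terminate with '\n' only; sending '\r\n' changed the translation
  this.child.stdin.write(data + '\n');
}//send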
I also realized that the same thing happened when doing text summarization, and now that works too:
task:translate pid:54209 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$ node textsum.js
[ '--model',
'/root/textsum_epoch7_14.69_release.t7',
'--beam_size',
10,
'--batch_size',
1,
'-' ]
night never just my bed smell
SOURCE (en) "Last night you were in my room And now my bed sheets smell like you Every day discovering something brand new"
DEST (-) "night never just my bed smell\n"
exec:translate end.
exec:translate exit.
task:translate pid:54229 terminated due to receipt of signal:SIGINT
[loretoparisi@:mbploreto opennmt]$
Thank you.
@guillaumekln Sorry for all these questions! I prefer to write here, since it's related to the command line and is more of a performance question than an issue. I have noticed that when iterating over several lines to translate, performance decreases as the number of lines grows.
Of course I'm still using --batch_size=1, so my question is: Is the model loaded at every call in this iteration?
I suppose so, since it ends up with a memory leak warning: (node:61283) Warning: Possible EventEmitter memory leak detected. 11 unpipe listeners added. Use emitter.setMaxListeners() to increase limit, which I think is due to an OOM issue.
Considering that the number of lines to translate changes every time and I need to keep the translation by line (executing within a node process), how should I handle that?
An example: I'm doing a similar translation task using Facebook Fairseq. In this case, the command line tool loads the model once, then I just send data to the child process stdin and the model executes the beam search, so there is no OOM in this case.
Thank you.
Is the model loaded at every call in this iteration?
No. It will only be loaded when cli/translate is started and unloaded when the process dies.
You should be able to achieve the same approach as you described for fairseq. Keep stdin open and write line by line.
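The approach described here could be sketched roughly as follows: start the process once, attach the stdout listener once, and match each output line to the oldest pending request. The names LineMatcher and createTranslator are hypothetical, and the sketch assumes the process prints exactly one stdout line per input line. Error handling is omitted for brevity.

```javascript
const { spawn } = require('child_process');

// Pairs each complete stdout line with the oldest pending request (FIFO).
// Assumption: one output line per input line.
class LineMatcher {
  constructor() {
    this.buffer = '';
    this.pending = []; // callbacks waiting for a translated line
  }
  expect(callback) {
    this.pending.push(callback);
  }
  // Feed a raw stdout chunk; a chunk may hold a partial line or several lines.
  feed(chunk) {
    this.buffer += chunk;
    let index;
    while ((index = this.buffer.indexOf('\n')) !== -1) {
      const line = this.buffer.slice(0, index);
      this.buffer = this.buffer.slice(index + 1);
      const callback = this.pending.shift();
      if (callback) callback(line);
    }
  }
}

// Hypothetical wrapper: the process is spawned once and reused for every line.
function createTranslator(command, args) {
  const child = spawn(command, args);
  const matcher = new LineMatcher();
  child.stdout.setEncoding('utf8');
  child.stdout.on('data', (chunk) => matcher.feed(chunk)); // registered once
  return {
    translate(line) {
      return new Promise((resolve) => {
        matcher.expect(resolve);
        child.stdin.write(line + '\n');
      });
    },
    close() {
      child.stdin.end(); // sends EOF, which should let the process finish
    },
  };
}
```

Usage might look like `const t = createTranslator('./cli/translate', args)` followed by `await t.translate(line)` for each line, then `t.close()` at the end. Registering listeners once, rather than on every call, also avoids accumulating listeners on the child's streams, which is likely what the EventEmitter warning quoted earlier reports.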
@guillaumekln thanks I will try that way!
Thank you, it works as expected!!!
[loretoparisi@:mbploreto opennmt]$ node translate.js
Module:OpenNMT.en-de of OpenNMT loaded.
[ '--model',
'/root/onmt_baseline_wmt15-all.en-de_epoch13_7.19_release.t7',
'--beam_size',
5,
'--batch_size',
1,
'-' ]
<unk>
OpenNMT.load
OpenNMT.translate: translating [0] Ayy, I remember syrup sandwiches and crime allowances
OpenNMT.translate: translating [1] Finesse a nigga with some counterfeits
OpenNMT.translate: translating [2] Parmesan where my accountant lives
<unk> Ich erinnere mich an <unk> und <unk>
OpenNMT.translate: translated [0]
<unk> Ich erinnere mich an <unk> und <unk>
<unk> mit einigen Fälschungen
OpenNMT.translate: translated [1]
<unk> mit einigen Fälschungen
<unk> , wo mein Buchhalter lebt .
OpenNMT.translate: translated [2]
<unk> , wo mein Buchhalter lebt .
OpenNMT.translate: translated:3
[ { line: 0,
source: 'Ayy, I remember syrup sandwiches and crime allowances',
target: '<unk> Ich erinnere mich an <unk> und <unk>\n' },
{ line: 1,
source: 'Finesse a nigga with some counterfeits',
target: '<unk> mit einigen Fälschungen\n' },
{ line: 2,
source: 'Parmesan where my accountant lives',
target: '<unk> , wo mein Buchhalter lebt .\n' } ]
OpenNMT.unload
exec:translate end.
exec:translate exit.
task:translate pid:71271 terminated due to receipt of signal:SIGINT
I'm using node.js with the translate executable, which would normally run in a pipe from the console, while in node I'm calling it through my exec method, which creates a node child process and listens for data, errors, etc.
This normally works for most commands (in this case I'm using it for the tokenize/detokenize executables as well, with the same approach), while in the case of translate, for some reason, the | does not work programmatically. Is the source program reading from stdin and writing to stdout normally?