EdinburghNLP / nematus

Open-Source Neural Machine Translation in Tensorflow
BSD 3-Clause "New" or "Revised" License

nematus/score.py broken because of error in nematus/theano_util.py #48

Closed: chozelinek closed this issue 7 years ago

chozelinek commented 7 years ago

Hi there,

I've been testing score.py on master (commit b5469b4320b82fdd838db187117feaa9a2464868) with the following script:

#!/bin/sh

# theano device, in case you do not want to compute on gpu, change it to cpu
# device=gpu
device=cpu

# path to nematus ( https://www.github.com/rsennrich/nematus )
nematus=~/Research/Resources/nematus

## Path to the directory to save corpus data
DATA=..

# path to source files
ST=$DATA/alignments/sentence/mbitexts/word/en_ceb

# SL
SL=en

# TL
TL=es

# path to the target files
TT=$DATA/alignments/sentence/mbitexts/word/es

# path to the output directory
OUTDIR=$DATA/alignments/sentence/nmt_cbe_output

# mkdir OUTDIR
mkdir -p $OUTDIR

## model
MODEL=~/CORPORA/nmt-cristina/model_L1L2w_v80k.npz

for i in $ST/*.txt
do
    echo ${i##*/}
    THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=$device,on_unused_input=warn python $nematus/nematus/score.py \
         -b 80 \
         -v \
         -m $MODEL \
         -s $i \
         -t $TT/${i##*/} \
         -o $OUTDIR/${i##*/}
done

I got an error with the following traceback:

Traceback (most recent call last):
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 132, in <module>
    args.output, b=args.b, normalization_alpha=args.n, verbose=args.v, alignweights=args.walign)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 106, in main
    rescore_model(source_file, nbest_file, saveto, models, options, b, normalization_alpha, verbose, alignweights)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 35, in rescore_model
    params = load_params(model, param_list)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/theano_util.py", line 72, in load_params
    new_params[with_prefix+kk] = pp[kk].astype(floatX, copy=False)
TypeError: float() argument must be a string or a number

Best!

rsennrich commented 7 years ago

Hello José,

I cannot reproduce this error. Can you please do the following:

this indicates a problem with your theano flags.

Let me know what you find.

best wishes, Rico
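
Theano reads floatX from the THEANO_FLAGS environment variable, which overrides a .theanorc file; from Python the effective value is theano.config.floatX, and Theano's default is float64. As a rough check that does not need Theano installed, the sketch below splits a THEANO_FLAGS-style string into key/value pairs. parse_theano_flags is a hypothetical, illustrative helper, not Theano's own parser, and it falls back to the flags from the script above when the variable is unset.

```python
import os

# Flags copied from the score.py loop in the script above (device=cpu).
SCRIPT_FLAGS = "mode=FAST_RUN,floatX=float32,device=cpu,on_unused_input=warn"


def parse_theano_flags(flags):
    """Split a "key=value,key=value" string into a dict (illustrative only)."""
    settings = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments, e.g. a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("malformed flag, expected key=value: %r" % item)
        settings[key.strip()] = value.strip()
    return settings


settings = parse_theano_flags(os.environ.get("THEANO_FLAGS", SCRIPT_FLAGS))
# If floatX is absent here, Theano falls back to .theanorc or its float64 default.
print("floatX =", settings.get("floatX", "<not set>"))
```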

chozelinek commented 7 years ago

Hi Rico, I'm using Theano 0.9.0 with Python 2.7.13. Printing floatX always gives "floatX float32". These are the Python packages in my virtual environment:

appnope==0.1.0
backports.shutil-get-terminal-size==1.0.0
bottle==0.12.13
bottle-log==1.0.0
configparser==3.5.0
Cython==0.25.2
decorator==4.0.11
enum34==1.1.6
flake8==3.3.0
ipdb==0.10.3
ipython==5.4.1
ipython-genutils==0.2.0
mccabe==0.6.1
nematus==0.2.dev0
numexpr==2.6.2
numpy==1.13.0
Paste==2.0.3
pathlib2==2.3.0
pexpect==4.2.1
pickleshare==0.7.4
prompt-toolkit==1.0.14
ptyprocess==0.5.2
pycodestyle==2.3.1
pyflakes==1.5.0
Pygments==2.2.0
scandir==1.5
scipy==0.19.0
simplegeneric==0.8.1
six==1.10.0
tables==3.4.2
Theano==0.9.0
traitlets==4.3.2
wcwidth==0.1.7

Best!

jmm

bhaddow commented 7 years ago

Hi Jose

Can you add a "print kk, pp[kk].shape" statement before the line with the crash and rerun?

We previously had a problem where, in some situations, a saved model would include a zero-length parameter array, which caused problems.

cheers - Barry
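
The same check can also be run outside score.py by listing every array stored in the model file. A minimal sketch, assuming NumPy and an .npz model; report_params and the tiny demo file are hypothetical. It flags empty arrays (the case Barry mentions) and object arrays, which a cast like astype(floatX) cannot handle. allow_pickle=True is required to open object arrays in recent NumPy releases, so only use it on model files you trust.

```python
import os
import tempfile

import numpy as np


def report_params(path):
    """Print name, shape and dtype of each array in an .npz file.

    Returns the names of suspicious entries: empty arrays, or object
    arrays that a cast such as astype('float32') cannot handle.
    """
    suspicious = []
    with np.load(path, allow_pickle=True) as archive:
        for name in archive.files:
            arr = archive[name]
            print(name, arr.shape, arr.dtype)
            if arr.size == 0 or arr.dtype == object:
                suspicious.append(name)
    return suspicious


# Demo on a tiny, hypothetical model file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model = os.path.join(tmp, "toy_model.npz")
    np.savez(
        model,
        Wemb=np.zeros((4, 3), dtype=np.float32),
        ff_logit_b=np.zeros(4, dtype=np.float32),
        empty_param=np.zeros(0, dtype=np.float32),
        stale_entry=np.array({"note": "not numeric"}, dtype=object),
    )
    flagged = report_params(model)

print("suspicious:", flagged)
```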

chozelinek commented 7 years ago

Hi Barry, sure. Below you can find what I got in STDOUT.

ff_logit_W (512, 80000)
Wemb (80000, 512)
encoder_U (1024, 2048)
encoder_W (512, 2048)
encoder_r_Wx (512, 1024)
decoder_U (1024, 2048)
decoder_W (512, 2048)
ff_state_b (1024,)
decoder_b_nl (2048,)
ff_logit_lstm_W (1024, 512)
ff_logit_prev_W (512, 512)
ff_state_W (2048, 1024)
decoder_b (2048,)
decoder_Wc (2048, 2048)
ff_logit_prev_b (512,)
ff_logit_lstm_b (512,)
decoder_b_att (2048,)
decoder_bx_nl (1024,)
Wemb_dec (80000, 512)
decoder_Wcx (2048, 1024)
ff_logit_b (80000,)
decoder_Ux_nl (1024, 1024)
decoder_Ux (1024, 1024)
ff_logit_ctx_b (512,)
encoder_b (2048,)
decoder_bx (1024,)
encoder_r_U (1024, 2048)
encoder_bx (1024,)
ff_logit_ctx_W (2048, 512)
zipped_params ()
Traceback (most recent call last):
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 132, in <module>
    args.output, b=args.b, normalization_alpha=args.n, verbose=args.v, alignweights=args.walign)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 106, in main
    rescore_model(source_file, nbest_file, saveto, models, options, b, normalization_alpha, verbose, alignweights)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 35, in rescore_model
    params = load_params(model, param_list)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/theano_util.py", line 74, in load_params
    new_params[with_prefix+kk] = pp[kk].astype(floatX, copy=False)
TypeError: float() argument must be a string or a number

Best for now! - jmm
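
The last entry printed, zipped_params (), has shape (): it is a 0-d array rather than a weight matrix. The sketch below reproduces the same kind of failure under a hypothetical assumption (the thread does not say what the entry contains): that it wraps a non-numeric Python object, so astype has to call float() on that object and raises TypeError.

```python
import numpy as np

# A normal parameter casts cleanly, as load_params expects.
weights = np.zeros((2, 3), dtype=np.float64)
print(weights.astype("float32", copy=False).dtype)  # float32

# Hypothetical stand-in for the stale entry: a 0-d object array wrapping a dict.
stale = np.array({"Wemb": weights}, dtype=object)
print(stale.shape)  # ()

try:
    stale.astype("float32", copy=False)
except TypeError as err:
    # Casting an object array calls float() on its element, which fails here.
    print("TypeError:", err)
```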

bhaddow commented 7 years ago

Hi Jose

OK, so it is zipped_params that is the problem. As a workaround, either:

I thought the "zipped_params" problem was fixed in Nematus, but perhaps this model was created by an earlier version of Nematus. Is that correct?

cheers - Barry
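
One generic workaround, assuming the stale entry can simply be dropped (this is not necessarily one of the options Barry listed), is to write a copy of the model without it. In the sketch below, strip_object_entries is a hypothetical helper that drops every object-dtype array, such as zipped_params, and keeps the numeric parameters unchanged. Any companion options file the model is loaded with would need to be copied alongside the new .npz.

```python
import os
import tempfile

import numpy as np


def strip_object_entries(src, dst):
    """Copy the numeric arrays of .npz file src to dst; return the dropped names."""
    kept, dropped = {}, []
    with np.load(src, allow_pickle=True) as archive:
        for name in archive.files:
            arr = archive[name]
            if arr.dtype == object:
                dropped.append(name)  # e.g. a stale 'zipped_params' entry
            else:
                kept[name] = arr
    np.savez(dst, **kept)  # note: np.savez appends ".npz" if dst lacks it
    return dropped


# Demo on a tiny, hypothetical model rather than a real Nematus checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old_model.npz")
    new = os.path.join(tmp, "clean_model.npz")
    np.savez(old, Wemb=np.ones((4, 3), dtype=np.float32),
             zipped_params=np.array({"stale": True}, dtype=object))
    removed = strip_object_entries(old, new)
    with np.load(new) as clean:  # no pickled objects left, so the default load works
        remaining = sorted(clean.files)

print("removed:", removed, "kept:", remaining)
```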

chozelinek commented 7 years ago

Hi Barry, thank you! I will try the workarounds. You are right: the model was trained with a version from November 2016, probably before the problem was fixed. I'll let you know if the problem persists. jmm

chozelinek commented 7 years ago

Hi again, I tested the workarounds and they work! Thank you for the pointer. I only noticed that the previous version printed in the terminal how many samples had been processed (e.g. "8 samples computed"), whereas now it only prints None. I don't know whether this is important or reproducible. cheers - jmm

Avmb commented 7 years ago

Fixed

bricksdont commented 7 years ago

@chozelinek That's due to our recent switch to the standard Python logging module. Adding

import logging

should be enough to bring the message back in score.py. I will fix this soon.

Best, Mathias
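
With the standard logging module, INFO-level messages are dropped unless the logger's level (or the root logger's, which defaults to WARNING) is lowered, which would explain a progress message like "8 samples computed" disappearing. The sketch below uses a hypothetical logger name and a verbose switch standing in for score.py's -v option; the message appears only when the level is INFO. It also shows that logging calls return None, so printing the result of one would print a bare None. Whether that is what happened in score.py is a guess, not something the thread confirms.

```python
import io
import logging


def make_logger(verbose, stream):
    """Return a logger writing to stream; INFO records pass only if verbose."""
    logger = logging.getLogger("score_sketch")  # hypothetical logger name
    logger.handlers.clear()  # reuse the same named logger without duplicating output
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep records away from the root logger's handlers
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    return logger


quiet, loud = io.StringIO(), io.StringIO()
make_logger(False, quiet).info("%d samples computed", 8)
returned = make_logger(True, loud).info("%d samples computed", 8)

print(repr(quiet.getvalue()))  # ''
print(repr(loud.getvalue()))   # 'INFO: 8 samples computed\n'
print(returned)                # None: logging calls return nothing useful to print
```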

bricksdont commented 7 years ago

@chozelinek In the current master, None should no longer appear, and using the -v option of score.py brings back the "... samples computed" message.

Regards!

chozelinek commented 7 years ago

Thank you all! I am closing this issue as both problems have been solved. Cheers! José