Parsing without character embeddings: dimensions mismatch

byewokko commented 5 years ago

I have successfully trained a parser with the options --char-emb-size 0 --pos-emb-size 5. However, when I try to run it with the --predict option I get the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Dimensions of lookup parameter /_1 lookup up from file ({0,20}) do not match parameters to be populated ({5,20})
Word-level LSTM input size: 105
Loading model from /usit/abel/u1/rohr5997/nobackup/out/parser/cs_acc/barchybrid.model
/cluster/software/VERSIONS/dynet/2.0/bin/dynet_python: line 16: 13438 Aborted
                 (core dumped) singularity run -B /cluster:/cluster -B /projects:/projects -B /work:/work /cluster/software/VERSIONS/dynet/2.0/dynet.img "$@"

Could it be that the parser fails to load the saved POS embeddings in predict mode?

mdelhoneux commented 5 years ago

Can you send your full commands? I have not managed to reproduce the error yet. It looks to me like your model was not trained with POS embeddings. The model file contains a lookup parameter with dimensions {0,20}, which looks like the POS lookup (20 being the tag set size), while the parser is trying to populate a lookup with embeddings of size 5.
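The mismatch reported in the error message can be illustrated with a small, library-free sketch. This is not DyNet's code: `check_lookup_dims` is a hypothetical helper, and the two shapes are simply the ones from the error above, {0,20} as stored in the model file and {5,20} as expected by the network the parser built before loading.

```python
def check_lookup_dims(saved, expected, name="/_1"):
    """Hypothetical helper that mimics the check behind the error above:
    raise if the shape stored on disk differs from the shape of the
    parameter that is about to be populated."""
    if tuple(saved) != tuple(expected):
        raise RuntimeError(
            "Dimensions of lookup parameter %s from file (%s) do not match "
            "parameters to be populated (%s)" % (name, saved, expected)
        )


saved_shape = (0, 20)     # shape found in the model file, per the error message
expected_shape = (5, 20)  # shape of the lookup the parser built before loading

try:
    check_lookup_dims(saved_shape, expected_shape)
    mismatch = None
except RuntimeError as err:
    mismatch = str(err)
    print(mismatch)
```

The first number is the embedding size, so the two sides disagree about whether POS embeddings are present at all, not about the tag set.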

byewokko commented 5 years ago

This is the full command:

dynet_python src/parser.py --dynet-seed 123456789 --dynet-mem 25000 \
       --predict \
       --outdir $outdir \
       --outprefix test-deaccented \
       --model /usit/abel/u1/rohr5997/nobackup/out/parser/$lang/barchybrid.model \
       --char-emb-size 0 --pos-emb-size 5 \
       --testfile /usit/abel/u1/rohr5997/nobackup/data/acc-test/test-deaccented.conllu

I also tried changing --pos-emb-size to other values, including 0, and omitting the argument altogether, but the error message stays the same, including the numbers.

mdelhoneux commented 5 years ago

Hmm, it surprises me that you get this error when omitting the arguments, because the parser should then use the same parameters as in training. My current guess is that the parser is reading the wrong parameter file. I see that you specify the model file but not the parameter file. By default, the parser looks for a params.pickle file in modeldir, and if you do not specify --modeldir, it sets modeldir to the value of --outdir. (This is not optimal design, I admit!) Can you try:

dynet_python src/parser.py --dynet-seed 123456789 --dynet-mem 25000 \
       --predict \
       --outdir $outdir \
       --outprefix test-deaccented \
       --model /usit/abel/u1/rohr5997/nobackup/out/parser/$lang/barchybrid.model \
       --params /usit/abel/u1/rohr5997/nobackup/out/parser/$lang/params.pickle \
       --testfile /usit/abel/u1/rohr5997/nobackup/data/acc-test/test-deaccented.conllu

or

dynet_python src/parser.py --dynet-seed 123456789 --dynet-mem 25000 \
       --predict \
       --outdir $outdir \
       --outprefix test-deaccented \
       --modeldir /usit/abel/u1/rohr5997/nobackup/out/parser/$lang \
       --testfile /usit/abel/u1/rohr5997/nobackup/data/acc-test/test-deaccented.conllu

(This assumes that you have the params.pickle file in /usit/abel/u1/rohr5997/nobackup/out/parser/$lang, of course.)
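The lookup order described above can be sketched as follows. This is a minimal illustration, not the parser's actual code: `resolve_params_path` is a hypothetical helper, and it assumes that the parameter file name is joined onto the model directory with `os.path.join`, so that an absolute --params path takes precedence over the directory.

```python
import os


def resolve_params_path(outdir, modeldir=None, params="params.pickle"):
    """Hypothetical helper: return the path where the parameter file
    would be looked up under the defaults described in this thread."""
    # If --modeldir is not given, fall back to --outdir.
    if modeldir is None:
        modeldir = outdir
    # Assumption: the file name is joined onto the directory. os.path.join
    # discards the directory when the second argument is an absolute path.
    return os.path.join(modeldir, params)


# Only --outdir given: params.pickle is looked up in the output directory.
print(resolve_params_path("/tmp/out"))
# --modeldir given: params.pickle is looked up next to the model.
print(resolve_params_path("/tmp/out", modeldir="/models/cs"))
# Absolute --params given: the directory is ignored.
print(resolve_params_path("/tmp/out", params="/models/cs/params.pickle"))
```

Under this reading, a command with only --model and --outdir would look for params.pickle in the output directory rather than next to the model, which would fit the guess that the wrong parameter file was being read.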

byewokko commented 5 years ago

I tried both and both work well. Thanks for the help!

mdelhoneux commented 5 years ago

Ok great!

UppsalaNLP / uuparser #4