Closed GoogleCodeExporter closed 9 years ago
Original comment by bojune...@gmail.com
on 4 Dec 2008 at 8:00
Hi alumae,
Does the development set corpus dev.txt exist in the current directory? The stack trace and code suggest that dev.txt does not exist. The development set corpus is used to tune the interpolation parameters.
Paul
Original comment by bojune...@gmail.com
on 8 Dec 2008 at 4:00
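To illustrate what "tuning the interpolation parameters" on a development set means, here is a minimal sketch of fitting linear interpolation weights by expectation-maximization. It is not taken from the mitlm source: interpolate-ngram may use a different optimizer, and the function names and the `component_probs` input layout are assumptions made for this example.

```python
import math

def tune_interpolation_weights(component_probs, iterations=50):
    """Fit linear interpolation weights with EM (illustrative only).

    component_probs[t][k] is the probability that component model k
    assigns to development-set token t (hypothetical input layout).
    All probabilities are assumed to be strictly positive.
    """
    num_models = len(component_probs[0])
    weights = [1.0 / num_models] * num_models  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * num_models
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            for k in range(num_models):
                # Posterior share of model k for this token.
                expected[k] += weights[k] * probs[k] / mix
        weights = [e / len(component_probs) for e in expected]
    return weights

def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the development tokens."""
    log_sum = sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in component_probs
    )
    return math.exp(-log_sum / len(component_probs))
```

Each EM step cannot increase the development-set perplexity, so the tuned weights are at least as good on dev.txt as the uniform starting point.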
Yes, dev.txt exists:
$ ~/lbin/mitlm-svn/interpolate-ngram -l tmp.mitlm tmp2.mitlm --write-lm tmp3.arpa.gz --optimize-perplexity dev.txt
Loading component LM tmp.mitlm...
Loading component LM tmp2.mitlm...
Interpolating component LMs...
Interpolation Method = LI
Loading development set dev.txt...
Segmentation fault
$ wc dev.txt
4228 72175 434012 dev.txt
$ head -2 dev.txt
bonjour {breath}
investiture aujourd'hui à Bamako Mali ...
$ gdb -c core.32543 ~/lbin/mitlm-svn/interpolate-ngram
[...]
(gdb) bt
#0 0x00000000004481f1 in PerplexityOptimizer::LoadCorpus (this=0x7fffefd8a8d0, corpusFile=Variable "corpusFile" is not available.
) at src/util/FastIO.h:54
#1 0x000000000047a4c6 in main (argc=8, argv=0x7fffefd8b1a8) at src/interpolate-ngram.cpp:270
Does it work for you?
Original comment by alu...@gmail.com
on 8 Dec 2008 at 11:04
BTW, if dev.txt didn't exist, I would get a different error:
~/lbin/mitlm-svn/interpolate-ngram -l tmp.mitlm tmp2.mitlm --write-lm tmp3.arpa.gz --optimize-perplexity foooo.txt
Loading component LM tmp.mitlm...
Loading component LM tmp2.mitlm...
Interpolating component LMs...
Interpolation Method = LI
Loading development set foooo.txt...
terminate called after throwing an instance of 'std::runtime_error'
what(): Cannot open file
Aborted (core dumped)
Original comment by alu...@gmail.com
on 8 Dec 2008 at 11:08
I am having a bit of difficulty reproducing this. It works with my data files. If possible, can you please send me your data files so I can try to reproduce this?
Also, can you try getting the stack trace with a debug build? Thanks.
make clean
make DEBUG=1
Original comment by bojune...@gmail.com
on 8 Dec 2008 at 4:20
With DEBUG=1, I get the following error:
$ ~/lbin/mitlm-svn/interpolate-ngram -l tmp1.mitlm tmp2.mitlm --write-lm tmp3.arpa.gz --optimize-perplexity dev.txt
Loading component LM tmp1.mitlm...
Loading component LM tmp2.mitlm...
Interpolating component LMs...
interpolate-ngram: src/vector/VectorOps.h:348: void MaskAssign(const Vector<I>&, const Vector<R>&, Vector<F>&) [with M = VectorClosure<OpEqual, DenseVector<double>, Scalar<int> >, I = VectorClosure<OpMult, IndirectVectorClosure<DenseVector<double>, DenseVector<int> >, IndirectVectorClosure<DenseVector<double>, DenseVector<int> > >, O = DenseVector<double>]: Assertion `mask.impl().length() == input.impl().length()' failed.
Aborted (core dumped)
Backtrace from gdb:
(gdb) bt
#0 0x00000035c102ee25 in raise () from /lib64/libc.so.6
#1 0x00000035c1030770 in abort () from /lib64/libc.so.6
#2 0x00000035c1028616 in __assert_fail () from /lib64/libc.so.6
#3 0x000000000042d769 in MaskAssign<VectorClosure<OpEqual, DenseVector<double>, Scalar<int> >, VectorClosure<OpMult, IndirectVectorClosure<DenseVector<double>, DenseVector<int> >, IndirectVectorClosure<DenseVector<double>, DenseVector<int> > >, DenseVector<double> > (mask=@0x7fff95524c80, input=@0x7fff95524c20, output=@0x5b29b0) at src/vector/VectorOps.h:348
#4 0x00000000004260af in NgramLMBase::SetModel (this=0x5b3ac0, m=@0x7fff95525038, vocabMap=@0x7fff95524d40, ngramMap=@0x7fff95524d80) at src/NgramLM.cpp:129
#5 0x000000000043256d in InterpolatedNgramLM::LoadLMs (this=0x7fff95525030, lms=@0x7fff95525390) at src/InterpolatedNgramLM.cpp:63
#6 0x000000000046cace in main (argc=8, argv=0x7fff95525928) at src/interpolate-ngram.cpp:194
I attached my 2 text files and the dev.txt file. LMs were produced by:
estimate-ngram -read-text tmp1.txt --write-binary-lm tmp1.mitlm
estimate-ngram -read-text tmp2.txt --write-binary-lm tmp2.mitlm
Original comment by alu...@gmail.com
on 8 Dec 2008 at 4:30
Attachments:
Original comment by bojune...@gmail.com
on 8 Dec 2008 at 5:47
This issue only affects binary LM files. As the binary version number has been changed, all binary files need to be rebuilt.
- Modified binary representation of Vocab to explicitly store length.
- Reading NgramVector from binary file did not update words() and hists() views.
- Incremented binary file version number.
Original comment by bojune...@gmail.com
on 8 Dec 2008 at 9:45
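As a rough illustration of the fix described above (storing the length explicitly and bumping a file version so stale files are rejected), here is a small sketch of a length-prefixed, versioned binary vocabulary. The actual mitlm binary layout is not shown in this thread; `MAGIC`, `FORMAT_VERSION`, `write_vocab`, and `read_vocab` are hypothetical names invented for the example.

```python
import io
import struct

MAGIC = b"DEMO"      # hypothetical file signature, not mitlm's
FORMAT_VERSION = 2   # hypothetical; bump whenever the layout changes

def write_vocab(stream, words):
    """Write a header with version and word count, then length-prefixed words."""
    stream.write(MAGIC)
    stream.write(struct.pack("<II", FORMAT_VERSION, len(words)))
    for word in words:
        data = word.encode("utf-8")
        stream.write(struct.pack("<I", len(data)))  # explicit byte length
        stream.write(data)

def read_vocab(stream):
    """Read a vocabulary, refusing files written with another version."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a vocabulary file")
    version, count = struct.unpack("<II", stream.read(8))
    if version != FORMAT_VERSION:
        raise ValueError(
            f"unsupported binary version {version}; rebuild the file"
        )
    words = []
    for _ in range(count):
        (size,) = struct.unpack("<I", stream.read(4))
        words.append(stream.read(size).decode("utf-8"))
    return words
```

Checking the version up front turns a stale file into a clear error message instead of a length mismatch or crash later in loading, which is why files written before the version bump have to be regenerated.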
Original issue reported on code.google.com by alu...@gmail.com
on 3 Dec 2008 at 2:20