danpovey / pocolm

Small language toolkit for creation, interpolation and pruning of ARPA language models

Error when converting to arpa #86

Closed francisr closed 7 years ago

francisr commented 7 years ago

I have trained a 4-gram LM, and when I try to convert it to ARPA I get this error: pre-arpa-to-arpa: read confusing sequence of lines: ' 3 4833 353 2127 -0.00512728' followed by: ' 3 4833 353 2144 -0.00512728'... bad counts?

What could be the cause of this?

francisr commented 7 years ago

As I understand it, there is a line with the backoff weight but no line with the probability itself.
This 3-gram appears only once in my text, and I have a min_counts of 2, so neither of these two lines should be there.
I've been playing with the code that generates the counts, so it's probably my fault, but I'd like to know where I should look to get the correct behaviour.

danpovey commented 7 years ago

The issue is that you have a trigram '4833 353 2127' that has a backoff prob, but the trigram itself does not exist, i.e. there was no count for '353 4833 -> 2127'. Such models can't be written in ARPA format. Anyway, bottom line: there is some assumption about your counts that is not being satisfied. It's hard to know which without knowing how you changed the count generation.
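The constraint behind this error also shows up in the finished ARPA file: every history of an (n+1)-gram must appear as an explicit n-gram entry. The sketch below (my own simplified checker, not part of pocolm; the parsing is an assumption about well-formed ARPA text, not pocolm's pre-arpa format) reads an ARPA model and reports histories that are missing their own entry:

```python
def read_arpa_ngrams(arpa_text):
    """Return {order: set of n-gram word tuples} parsed from ARPA text."""
    ngrams = {}
    order = None
    for line in arpa_text.splitlines():
        line = line.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header like "\3-grams:" - start collecting that order.
            order = int(line[1 : line.index("-")])
            ngrams[order] = set()
        elif not line or line.startswith("\\") or line.startswith("ngram"):
            # Skip blanks, \data\ / \end\ markers, and the count header lines.
            continue
        elif order is not None:
            fields = line.split()
            # Each entry is: logprob, n words, then an optional backoff weight.
            ngrams[order].add(tuple(fields[1 : 1 + order]))
    return ngrams

def missing_histories(ngrams):
    """List (n+1)-gram histories that have no matching n-gram entry."""
    problems = []
    for order, grams in ngrams.items():
        if order - 1 not in ngrams:
            continue
        for gram in grams:
            hist = gram[:-1]
            if hist not in ngrams[order - 1]:
                problems.append(hist)
    return problems
```

A model that trips this check (e.g. a 3-gram whose 2-word history was pruned away) is exactly the kind that pre-arpa-to-arpa cannot write out.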

On Tue, Nov 22, 2016 at 12:13 PM, Rémi Francis notifications@github.com wrote:


francisr commented 7 years ago

Thanks, this was indeed the cause of the problem, I managed to fix it.