chrisjbryant / lmgec-lite

A language model-based approach to Grammatical Error Correction for English that uses minimal annotated data.

KenLM setup seems to be broken #1

Closed: nmatthews-asapp closed this issue 6 years ago

nmatthews-asapp commented 6 years ago

When running on the 1b.txt file, I get the following error:

=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:29104080 2:684897168 3:3846225240 4:9279470400 5:14419969256
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
----------------------------------------------------------------------------------------------------Last input should have been poison.
[1]    15749 abort (core dumped)  ~/kenlm/build/bin/lmplz -o 5 -S 95% -T tmp/ < 1b.txt > 1b.arpa

Specifically, this "Last input should have been poison" message seems to be the problem. I'm not sure yet whether it's caused by 1b.txt or something else, and I haven't found any troubleshooting info on KenLM's site related to this issue.

A related issue: https://github.com/kpu/kenlm/issues/177, although there it happens on step 4, with a slightly different command.

kpu commented 6 years ago

I think you ran out of disk space. The problem is that exception unwinding causes a destructor check to fire, which hides the real exception. I've changed the code to not abort, so you can see the real error message.

nmatthews-asapp commented 6 years ago

Thanks, I'll reinstall from master and try again.

This is surprising though, as my machine had about 114 GiB of memory free at the time of running. In step 2 the estimated memory footprint was < 100 GB, and this repo suggests an expected memory footprint of 20-40 GB.

kpu commented 6 years ago

disk != memory

nmatthews-asapp commented 6 years ago

Whoops, I misread your message. OK, that could be it. I might not have installed it on the right disk (right = the bigger one).

nmatthews-asapp commented 6 years ago

@kpu confirmed: it was a lack of disk space. Thanks for fixing the exception reporting.

got it working!
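
For anyone who lands here with the same abort: since the cause turned out to be disk space rather than memory, a quick check of the free space on the disk that holds the temporary directory (the one passed via `-T`, `tmp/` in the command above) may save a long run. Below is a minimal sketch using only Python's standard library. The helper name `has_enough_disk` and the 100 GiB threshold are assumptions made for illustration; neither comes from KenLM or this repo, and the space actually needed depends on the corpus and n-gram order.

```python
import shutil


def has_enough_disk(path, required_bytes):
    """Return (ok, free_bytes) for the filesystem that contains `path`."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes, free_bytes


if __name__ == "__main__":
    # Hypothetical threshold: pick a value that suits your corpus.
    # This is NOT a figure from KenLM or from this repo.
    REQUIRED_GIB = 100
    # "." stands in for the directory you pass to lmplz via -T.
    ok, free_bytes = has_enough_disk(".", REQUIRED_GIB * 1024**3)
    print(f"free: {free_bytes / 1024**3:.1f} GiB, enough: {ok}")
```

Point the path at the temporary directory on the disk you intend to use; as noted above, free memory says nothing about free disk.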