danpovey / pocolm

Small language toolkit for creation, interpolation and pruning of ARPA language models

add --limit-unk-history for get-text-counts #64

Closed wantee closed 8 years ago

wantee commented 8 years ago

Addresses issue #62. Adds a --limit-unk-history option to get-text-counts and the related scripts; an example of usage is in egs/swbd/run.sh.

Test outputs of get-text-counts are:

$ (echo 11 12 13 14; echo 11 3 3 13; echo 11 12 3 13) | ./get-text-counts --limit-unk-history 4
      1      11
     11       1      12
     12      11       1      13
     13      12      11      14
     14      13      12       2
      1      11
     11       1       3
      3       3
      3      13
     13       3       2
      1      11
     11       1      12
     12      11       1       3
      3      13
     13       3       2
get-text-counts: processed 3 lines, with (on average) 6 words per line.
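
For readers who want to check the output above by hand, here is a minimal Python sketch of the behaviour it appears to show. It is not the actual get-text-counts code. It assumes that the trailing 4 on the command line is the n-gram order, that the ids 1, 2 and 3 stand for begin-of-sentence, end-of-sentence and the unknown word, and that each output line lists the history most-recent-first followed by the predicted word. The function name `text_counts` and its parameters are hypothetical.

```python
BOS, EOS, UNK = 1, 2, 3  # assumed symbol ids, inferred from the test output


def text_counts(words, order, limit_unk_history=True):
    """Yield one n-gram per predicted word: reversed history, then the word."""
    seq = [BOS] + list(words) + [EOS]
    for i in range(1, len(seq)):
        history = []
        # Walk back from the most recent word, taking at most order-1 words.
        for w in reversed(seq[max(0, i - (order - 1)):i]):
            history.append(w)
            if limit_unk_history and w == UNK:
                break  # keep the unknown word itself, drop everything older
        yield history + [seq[i]]


if __name__ == "__main__":
    for sentence in ([11, 12, 13, 14], [11, 3, 3, 13], [11, 12, 3, 13]):
        for ngram in text_counts(sentence, order=4):
            print(" ".join(str(w) for w in ngram))
```

Under these assumptions the sketch yields the same id sequences as the test output (column alignment aside): once an unknown word (3) appears in the history, nothing older than it is kept.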

danpovey commented 8 years ago

Thanks for doing this so fast!