Wrong parsing of .counts files

danpovey / pocolm

Small language toolkit for creation, interpolation and pruning of ARPA language models

Closed: micr0cuts closed this issue 5 years ago

micr0cuts commented 5 years ago

for line in f:
    line = line.split()
    word_to_count[line[1]] += int(line[0])

With the current code, the only "words" that are matched across the dev and train sets are the unigram counts, in string format!
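
To make the point concrete, here is a minimal sketch of the parsing that the snippet above implies. It assumes each line of a .counts file has the form `<count> <word>`, as the snippet's indexing suggests. The helper name `read_counts` and the sample data are hypothetical and are not taken from the pocolm source.

```python
from collections import defaultdict
import io


def read_counts(f):
    """Accumulate word -> count from lines assumed to look like '<count> <word>'."""
    word_to_count = defaultdict(int)
    for line in f:
        fields = line.split()
        if len(fields) != 2:
            # Skipping blank or malformed lines is a choice made for this sketch.
            continue
        # Field 0 is the count, field 1 is the word.
        word_to_count[fields[1]] += int(fields[0])
    return word_to_count


# Hypothetical sample data standing in for train and dev .counts files.
train = read_counts(io.StringIO("10 the\n3 cat\n"))
dev = read_counts(io.StringIO("4 the\n1 dog\n"))

# Keyed by word, the overlap between the two sets is the shared vocabulary.
shared = sorted(set(train) & set(dev))
print(shared)
```

If the dictionary were keyed on the first field instead, the keys would be count strings such as "10" or "4", and any overlap between the dev and train sets would only come from counts that happen to coincide.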

danpovey commented 5 years ago

Do you know how to make a pull request? Might be a python3 issue.

danpovey commented 5 years ago

Oh I see, it's not a python3 issue. This should only affect metaparameter initialization, but it's still a bug. I'll fix it.

danpovey commented 5 years ago

I resolved this via push (should have done it via PR, but anyway)... so it's resolved.