danpovey / pocolm

Small language toolkit for creation, interpolation and pruning of ARPA language models

A script to compute sentence probs #92

Open xiaohui-zhang opened 6 years ago

xiaohui-zhang commented 6 years ago

We need a script like rnnlm/compute_sentence_scores.sh in kaldi-rnnlm to compute sentence-level scores for a text file. A starting point would be pocolm/scripts/get_data_prob.py, which computes the probability of a whole text file. Dongji has offered to do this. Thanks.

DongjiGao commented 6 years ago

My ideas are: 1) split the text file into sentences and call get_data_prob.py on each sentence, or 2) add a new argument (--sentence-prob) to get_data_prob.py and compute the sentence probabilities inside it. Which one do you prefer, @danpovey? And do we need to support utterance ids?

danpovey commented 6 years ago

Add a new option, or create a new python script. Calling get_data_prob.py for each sentence would be very slow as it would have to load the model each time.
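A minimal sketch of the load-once pattern described above. It assumes a toy add-one-smoothed unigram model as a stand-in for the real pocolm model; `load_toy_model` and `sentence_logprobs` are hypothetical names and are not part of pocolm or get_data_prob.py. The point is only that the model is built once and then reused for every line.

```python
import math
from collections import Counter


def load_toy_model(corpus_lines):
    """Build a toy unigram model ONCE and return a word -> log10 prob function.

    This is a stand-in for the expensive model-loading step; it is not
    how pocolm represents or loads its models.
    """
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.split() + ["</s>"])  # count end-of-sentence too
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves mass for unseen words

    def logprob(word):
        # Add-one smoothing; Counter returns 0 for unseen words.
        return math.log10((counts[word] + 1) / (total + vocab_size))

    return logprob


def sentence_logprobs(logprob, lines):
    """Yield (log10 prob, token count) per line, reusing the loaded model."""
    for line in lines:
        words = line.split() + ["</s>"]
        yield sum(logprob(w) for w in words), len(words)


if __name__ == "__main__":
    model = load_toy_model(["a b", "a c"])  # loaded once
    for score, num_words in sentence_logprobs(model, ["a b", "a"]):
        print(f"{score:.4f} {num_words}")
```

Summing the per-sentence log-probabilities gives the same file-level total that a whole-file computation would report for this toy model, so the per-line output can be checked against the aggregate.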

Supporting utterance-ids would be nice, but it's not necessary as we could use paste to add them back in afterward.
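As a rough illustration of adding the ids back afterward, the sketch below does in Python what `paste` would do on the command line. It assumes Kaldi-style input lines of the form `utt-id word word ...`, which the thread does not specify; `split_utt_ids` and `attach_utt_ids` are hypothetical helper names.

```python
def split_utt_ids(lines):
    """Split 'utt-id word word ...' lines into parallel lists (ids, texts)."""
    ids, texts = [], []
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        ids.append(parts[0])
        # An utterance may have an id but no words.
        texts.append(parts[1].strip() if len(parts) > 1 else "")
    return ids, texts


def attach_utt_ids(ids, scores):
    """Re-join ids with per-sentence scores, one 'utt-id score' line each."""
    if len(ids) != len(scores):
        raise ValueError("number of ids and scores must match")
    return [f"{utt_id} {score:.4f}" for utt_id, score in zip(ids, scores)]


if __name__ == "__main__":
    ids, texts = split_utt_ids(["utt1 a b\n", "utt2 c\n"])
    # texts would be passed to the sentence scorer; scores here are made up.
    print("\n".join(attach_utt_ids(ids, [-1.5, -0.25])))
```

Keeping the scorer unaware of utterance ids means it only has to emit one score per input line, in order, for the ids to line up again.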


danpovey commented 6 years ago

It's OK. It will be a good exercise for you since you are doing a lot of language modeling work and SRILM is a very standard tool.

On Thu, Jul 5, 2018 at 8:59 PM, DongjiGao wrote:

Will do. It might take me some time since I have not used SRILM before.


srgangireddy commented 4 years ago

Hi, I wonder if this is implemented in get_data_prob.py? Thank you.

danpovey commented 4 years ago

I don't think it is; as I said above in the thread, that will give you the overall prob but not per line.


srgangireddy commented 4 years ago

OK, thanks for letting me know.