danpovey / pocolm

Small language toolkit for creation, interpolation and pruning of ARPA language models

swbd and/or swbd_fisher example script #1

Open danpovey opened 8 years ago

danpovey commented 8 years ago

We need some example scripts on real data to help compare perplexities against baselines like SRILM and kaldi_lm. [Part of this is to work on those baselines.] A Switchboard-only setup, with the first 10k utts (or is it the last?) used as dev data, would be nice, so we can compare with the LM-estimating scripts in Kaldi's Switchboard setup. Also (since the main point of this toolkit is better interpolation), we need an example setup with multiple datasets to be combined, e.g. the Switchboard+Fisher setup that we currently use (optionally) in the Switchboard example scripts in Kaldi.

@vijayaditya, you could help with this if you have time; you said you were interested in LM stuff.
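For anyone picking this up, here is a rough sketch of the two pieces the comparison hinges on: holding out 10k utterances as dev data, and turning a total log-probability into a perplexity. It is not taken from pocolm, SRILM or Kaldi; the names `split_dev`, `perplexity` and `DEV_SIZE` are hypothetical, and it assumes one utterance per list entry and base-10 log-probabilities.

```python
from typing import List, Tuple

DEV_SIZE = 10_000  # held-out utterance count; the issue leaves first-vs-last open


def split_dev(utts: List[str], dev_size: int = DEV_SIZE,
              from_start: bool = True) -> Tuple[List[str], List[str]]:
    """Return (dev, train), taking dev from the start or the end of utts."""
    dev_size = max(0, min(dev_size, len(utts)))  # clamp to the available data
    if from_start:
        return utts[:dev_size], utts[dev_size:]
    cut = len(utts) - dev_size
    return utts[cut:], utts[:cut]


def perplexity(total_log10_prob: float, num_tokens: int) -> float:
    """Perplexity from a summed base-10 log-probability over num_tokens tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return 10.0 ** (-total_log10_prob / num_tokens)
```

Whichever toolkit produces the log-probabilities, the perplexities are only comparable if every system is scored on the same dev text with the same token-counting convention (for example, whether end-of-sentence tokens and OOV words are counted).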

vijayaditya commented 8 years ago

Ok on it.

Vijay

(Sent by email in reply to Daniel Povey's comment of May 9, 2016, 07:07.)