Open danpovey opened 8 years ago
Sure, I'll have a try. Ke
Hi Dan, while working on this task I found two small errors in train_lm.py, in the step that generates the vocab from a wordlist (ted_train_lm.sh uses a wordlist to generate the vocab): 1) an index error, and 2) the wrong input being passed to wordlist_to_vocab.py.
I fixed both errors as shown below and tested the change.
diff --git a/scripts/train_lm.py b/scripts/train_lm.py
index bbcfe05..e0fcea5 100755
--- a/scripts/train_lm.py
+++ b/scripts/train_lm.py
@@ -234,7 +234,7 @@ else:
         LogMessage("Skip generating vocab")
     else:
         LogMessage("Generating vocab with wordlist[{0}]...".format(args.wordlist))
-        command = "wordlist_to_vocab.py {1} > {2}".format(word_counts_dir, vocab)
+        command = "wordlist_to_vocab.py {0} > {1}".format(args.wordlist, vocab)
         log_file = os.path.join(log_dir, 'wordlist_to_vocab.log')
         RunCommand(command, log_file, args.verbose == 'true')
         TouchFile(done_file)
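For reference, the first of the two errors is a plain str.format() indexing problem: the old command string uses fields {1} and {2}, but only two positional arguments are supplied, so there is no argument for field {2}. A minimal, self-contained illustration (the argument values here are made-up stand-ins, not taken from train_lm.py):

# Minimal illustration of the index error; the string values are made-up stand-ins.
old_template = "wordlist_to_vocab.py {1} > {2}"  # fields {1} and {2} need three arguments
new_template = "wordlist_to_vocab.py {0} > {1}"  # fields {0} and {1} match two arguments

try:
    old_template.format("word_counts_dir", "vocab")
except IndexError as err:
    # Only two positional arguments are given, so field {2} cannot be resolved.
    print("old template fails:", err)

# The corrected command also passes the right first argument to wordlist_to_vocab.py:
# the wordlist itself rather than the word-counts directory.
print(new_template.format("wordlist", "vocab"))
# prints: wordlist_to_vocab.py wordlist > vocab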
Do I need to do a PR for this?
Ke
Yes, do a PR.
In Kaldi, in egs/tedlium/s5_r2/local/train_ted_lm.sh [or something like that], we have a script that trains the pocolm LM. Because it was set up before we had train_lm.py, it doesn't use train_lm.py. I'd like it to be modified to use train_lm.py, with the option that bypasses metaparameter optimization enabled by default, so that the default build is fast [but easy to comment out].
Ke, perhaps you could do this. This is really an issue in Kaldi, but I'm mentioning it here.
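For concreteness, here is a rough sketch of the kind of train_lm.py invocation the updated script would end up making, shown as a Python subprocess call for illustration. Everything in it is an assumption to be checked: the paths, the n-gram order, and the option values are placeholders, and the exact option names (in particular the metaparameter-bypass flag and the format of its value) should be verified against train_lm.py --help.

# Hypothetical sketch only: paths, the n-gram order, and all option values are
# placeholders, and the exact option names should be verified against
# "train_lm.py --help" before use.
import subprocess

text_dir = "data/text"   # directory of training text (placeholder)
order = "4"              # n-gram order (placeholder)
lm_dir = "data/lm"       # output directory for the trained LM (placeholder)

cmd = [
    "train_lm.py",
    "--wordlist=data/wordlist",  # build the vocab from a fixed wordlist, as ted_train_lm.sh does
    # Supplying metaparameter values directly bypasses the slow optimization step;
    # comment this option out to re-enable metaparameter optimization.  The number
    # of values depends on the data sources and the n-gram order (placeholders shown).
    "--bypass-metaparameter-optimization=0.5,0.5,0.5,0.5",
    text_dir,
    order,
    lm_dir,
]
subprocess.check_call(cmd)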