Open haihua opened 6 years ago
Hello,
Sorry for the late reply.
Unlike most tools in kaldi, which assume that the output of a DNN is a set of posteriors, the output of the DNNs used by idlak is always a feature matrix. You will probably need to customise the binaries if you plan to use a training tool other than the one provided.
What you are showing here are the output features for training the duration model, which learns a mapping from "labels", i.e. features representing input phone identities, to the corresponding durations. Those durations are then used to generate frame-level input for the acoustic model. On each line (each row of the matrix) you get 2 features: the first one is the duration of the current HMM state, while the second one is the duration of the phone that state belongs to. Assuming each phone is spread over 5 states, you will get a sequence of 5 rows with the same phone duration, and the sum of the 5 state durations should match the phone duration, i.e. 1 + 11 + 12 + 11 + 8 = 43 if we consider the first phone here.
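To make the check above concrete, here is a minimal Python sketch. It assumes 5 states per phone and uses the first 20 values copied from the copy-feats output quoted in this thread; `check_durations` is a hypothetical helper written for illustration, not part of idlak or kaldi.

```python
def check_durations(values, states_per_phone=5):
    """Group a flat list of (state_duration, phone_duration) pairs by phone
    and check that the state durations add up to the phone duration.

    Assumption: every phone spans exactly `states_per_phone` states.
    Incomplete trailing groups are ignored.
    """
    # Even positions hold state durations, odd positions hold phone durations.
    pairs = list(zip(values[0::2], values[1::2]))
    results = []
    for i in range(0, len(pairs) - states_per_phone + 1, states_per_phone):
        group = pairs[i:i + states_per_phone]
        state_durs = [state for state, _ in group]
        phone_durs = {phone for _, phone in group}
        # All rows of a phone must carry the same phone duration,
        # and the state durations must sum to it.
        consistent = len(phone_durs) == 1 and sum(state_durs) == group[0][1]
        results.append((group[0][1], state_durs, consistent))
    return results


# First two phones of slt_arctic_a0001, as printed by copy-feats.
values = [1, 43, 11, 43, 12, 43, 11, 43, 8, 43,
          13, 24, 1, 24, 1, 24, 8, 24, 1, 24]

for phone_dur, state_durs, ok in check_durations(values):
    print(phone_dur, state_durs, ok)
```

For these two phones the state durations add up to 43 and 24 respectively, so both groups pass the check.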
I am not sure what you mean by the dm-dnn training having completely failed :-)
Regards, Blaise
Hi, I am using this toolkit to reproduce the results. However, I found that the dm-dnn training failed completely. The dnn output label is obviously inconsistent with the kaldi convention. Could you please give a brief explanation? Here is the output of:
copy-feats scp:./durdata/train/feats.scp ark,t:-
slt_arctic_a0001 [ 1 43 11 43 12 43 11 43 8 43 13 24 1 24 1 24 8 24 1 24 5 29 1 29 3 29 6 29 14 29 6 33 1 33 1 33 1 33 24 33 1 9 3 9 1 9 1 9 3 9 1 13
Thanks a lot for your help!