bpotard / idlak

This repository is now obsolete. Please go to https://github.com/idlak/idlak instead.

problems for training dm-dnn #15

Open haihua opened 6 years ago

haihua commented 6 years ago

Hi, I am using this toolkit to reproduce the results. However, the dm-dnn training fails completely for me. The DNN output labels look inconsistent with the Kaldi convention. Could you please give a brief explanation? Here is what I get from: copy-feats scp:./durdata/train/feats.scp ark,t:-
slt_arctic_a0001 [ 1 43 11 43 12 43 11 43 8 43 13 24 1 24 1 24 8 24 1 24 5 29 1 29 3 29 6 29 14 29 6 33 1 33 1 33 1 33 24 33 1 9 3 9 1 9 1 9 3 9 1 13

Thanks a lot for your help !

bpotard commented 6 years ago

Hello,

Sorry for the late reply.

Unlike most tools in Kaldi, which assume that the output of a DNN is a set of posteriors, the DNNs used by idlak always output feature matrices. You will probably need to customise the binaries if you plan to use a training tool other than the one provided.

What you are showing here are the output features for training the duration model, which learns a mapping from "labels", i.e. features representing the input phone identities, to the relevant durations; these durations are then used to generate the frame-level input for the acoustic model. Each row of the matrix holds 2 features: the first is the duration of the current HMM state, and the second is the duration of the whole phone. Assuming each phone is spread over 5 states, you get a sequence of 5 rows with the same phone duration, and the 5 HMM state durations should sum to the phone duration, i.e. 1 + 11 + 12 + 11 + 8 = 43 for the first phone here.
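
The sum check described above can be sketched in a few lines of Python. This is only an illustration and not part of idlak: the function name `check_durations` is hypothetical, the values are copied from the truncated dump in the question, and the layout of 5 states per phone is the assumption stated above.

```python
# Values copied from the (truncated) copy-feats dump in the question.
# They are read as consecutive (state_duration, phone_duration) pairs.
FLAT = [
    1, 43, 11, 43, 12, 43, 11, 43, 8, 43,
    13, 24, 1, 24, 1, 24, 8, 24, 1, 24,
    5, 29, 1, 29, 3, 29, 6, 29, 14, 29,
    6, 33, 1, 33, 1, 33, 1, 33, 24, 33,
    1, 9, 3, 9, 1, 9, 1, 9, 3, 9,
    1, 13,  # start of the next phone; the dump is cut off here
]

STATES_PER_PHONE = 5  # assumption: each phone is spread over 5 HMM states


def check_durations(values, states_per_phone=STATES_PER_PHONE):
    """Return (phone_duration, sum_of_state_durations, consistent) per phone.

    Incomplete trailing groups (fewer than `states_per_phone` rows) are skipped.
    """
    # Rebuild the 2-column rows: (state duration, phone duration).
    rows = list(zip(values[0::2], values[1::2]))
    results = []
    for start in range(0, len(rows) - states_per_phone + 1, states_per_phone):
        group = rows[start:start + states_per_phone]
        phone_durations = {phone for _, phone in group}
        state_sum = sum(state for state, _ in group)
        # Consistent if all rows agree on the phone duration
        # and the state durations add up to it.
        consistent = len(phone_durations) == 1 and state_sum == group[0][1]
        results.append((group[0][1], state_sum, consistent))
    return results


if __name__ == "__main__":
    for phone_dur, state_sum, ok in check_durations(FLAT):
        print(phone_dur, state_sum, "OK" if ok else "MISMATCH")
```

For the five complete phones in the dump, the state durations do add up to the phone durations (43, 24, 29, 33 and 9), which suggests the features are well formed for this layout.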

I am not sure what you mean by "the dm-dnn training is completely failed" :-)

Regards, Blaise