@HawkAaron
Hi, I have a question about the feature-transform part of your code.
According to the Graves 2013 paper, the features are described as:

> The audio data was encoded using a Fourier-transform-based filter-bank with 40 coefficients (plus energy) distributed on a mel-scale, together with their first and second temporal derivatives. Each input vector was therefore size 123. The data were normalised so that every element of the input vectors had zero mean and unit variance over the training set.
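For concreteness, here is my understanding of that pipeline as a rough NumPy sketch (the `np.gradient` delta is a simple stand-in, not the exact regression-based delta formula, and the random input is just a placeholder for real filter-bank frames):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100                                  # number of frames (placeholder)
static = rng.standard_normal((T, 41))    # 40 mel filter-bank coefficients + energy

# first and second temporal derivatives (np.gradient as a simple stand-in)
delta = np.gradient(static, axis=0)
delta2 = np.gradient(delta, axis=0)

feats = np.concatenate([static, delta, delta2], axis=1)  # (T, 123)

# zero mean / unit variance per dimension, estimated over the training set
mean, std = feats.mean(axis=0), feats.std(axis=0)
feats_norm = (feats - mean) / std
```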
In your DataLoader.py, the feature-transform part is:
Correct me if I'm wrong, but I think the feature transform is already completed before the nnet-forward command. So why did you use a nnet to produce the feature embedding?
When I looked into feature_transform.sh, I got more confused: the nnet-forward step seems to apply yet another feature normalization on top. Could you explain this part a bit? Thanks!
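For context on my own reading: if I understand Kaldi's nnet1 recipes correctly, the feature_transform "nnet" usually contains no learned weights at all, just a `<Splice>` component plus a shift/scale normalisation (`<AddShift>`/`<Rescale>`), packaged in nnet format so nnet-forward can apply it. A rough NumPy sketch of what I think that amounts to (the context width and function names here are my assumptions, not your code):

```python
import numpy as np

def splice(feats, left=5, right=5):
    """Concatenate each frame with +/- context frames (edge frames repeated),
    like Kaldi's <Splice> component."""
    T, D = feats.shape
    offsets = np.arange(-left, right + 1)[None, :]          # (1, left+right+1)
    idx = np.clip(offsets + np.arange(T)[:, None], 0, T - 1)
    return feats[idx].reshape(T, (left + right + 1) * D)

def apply_transform(feats, mean, std, left=5, right=5):
    """Splice, then shift by -mean and rescale by 1/std,
    like <AddShift> + <Rescale> applied via nnet-forward."""
    spliced = splice(feats, left, right)
    return (spliced - mean) / std

rng = np.random.default_rng(0)
T, D = 50, 123
feats = rng.standard_normal((T, D))      # placeholder for the 123-dim features

spliced = splice(feats)
mean, std = spliced.mean(axis=0), spliced.std(axis=0) + 1e-8
out = apply_transform(feats, mean, std)  # shape (T, 11 * D)
```

So it looks to me like a second, spliced-space normalisation rather than a learned embedding, which is exactly what I'd like you to confirm or correct.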