Open LyWangPX opened 1 year ago
Hi, I think you are correct on this (i.e., the correct method is to always generate the phn_dict from the one generated when training on SpeechOcean762). Otherwise the model's predictions will not be meaningful.
> I notice the score is mainly determined not by the .wav but by the phn.
However, even if the bug is fixed, the input phn would still have a relatively large impact on the prediction. This is because 1) different phones have different error priors; and 2) whether a phone is pronounced correctly depends on the canonical phone, e.g., a phone pronounced as /e/ is correct if the canonical phone is /e/, but wrong if the canonical phone is /a:/. We did an ablation study in the paper.
-Yuan
Hi @YuanGongND did you update the tutorial?
@amandeepbaberwal
No, I don't plan to do so, as 1) it is not promised in the paper, and we have already released whatever we have; and 2) it is more related to Kaldi than to GOPT.
Please understand that we are not a company, so we cannot provide full support for the project.
-Yuan
Hi @LyWangPX, could you please explain how you solved this problem? I am running into the same problem: my score does not change even when I completely change the content of the .wav file.
In my own inference experiment, I notice the score is mainly determined not by the .wav but by the phn. There was an extreme pattern for multiple sound files of the same word:
Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5
Even after messing up the .wav files, the results remain the same. Then I found a potential reason:
In `gen_seq_data_phn.py`, `tr_label_phn` or `te_label_phn` is generated by the `phn_dict` that is specific to the dataset we want to use. However, the pretrained model is based on SpeechOcean762. When running inference on any other dataset, the model receives labels specific to the inference dataset, not to the SpeechOcean762 dataset, causing inconsistent inference results. The correct method is to always reuse the `phn_dict` generated when training on SpeechOcean762. I will update the inference tutorial if you think it is necessary.
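
To make the mismatch concrete, here is a minimal sketch of the idea, not the actual code in `gen_seq_data_phn.py`. The helper names (`build_phn_dict`, `save_phn_dict`, `load_phn_dict`, `encode_phones`), the JSON file format, and the phone symbols are all assumptions made for illustration. The point is only that the phone-to-ID mapping should be built once from the SpeechOcean762 training data, saved, and then reused unchanged at inference time.

```python
import json


def build_phn_dict(phones):
    """Assign each distinct phone an integer ID in order of first appearance."""
    phn_dict = {}
    for phn in phones:
        if phn not in phn_dict:
            phn_dict[phn] = len(phn_dict)
    return phn_dict


def save_phn_dict(phn_dict, path):
    """Store the training-time mapping so inference can reuse it (hypothetical format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(phn_dict, f, indent=2)


def load_phn_dict(path):
    """Load a mapping previously written by save_phn_dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode_phones(phones, phn_dict):
    """Map phones to IDs with a fixed dictionary; fail loudly on unknown phones."""
    unknown = sorted({p for p in phones if p not in phn_dict})
    if unknown:
        raise KeyError(f"phones missing from the training dictionary: {unknown}")
    return [phn_dict[p] for p in phones]


# Illustrative phone sequences (made up, not taken from SpeechOcean762).
train_phones = ["AH", "B", "K", "AE", "T"]
infer_phones = ["K", "AE", "T"]

train_dict = build_phn_dict(train_phones)

# Wrong: rebuilding the dictionary from the inference data changes the IDs.
wrong_ids = encode_phones(infer_phones, build_phn_dict(infer_phones))  # [0, 1, 2]

# Right: reuse the training-time dictionary, so IDs match what the model saw.
right_ids = encode_phones(infer_phones, train_dict)  # [2, 3, 4]
```

In this toy case the same three phones get IDs `[0, 1, 2]` from the rebuilt dictionary but `[2, 3, 4]` from the training dictionary, so a model trained with the latter would read the former as entirely different phones. Raising on unknown phones is a design choice for the sketch: it surfaces phone-set mismatches instead of silently assigning new IDs.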