Hi @felixkreuk, first of all, thank you for open-sourcing such a good repo on unsupervised phoneme segmentation. Recently, I ran several experiments on the SpeechOcean 762 dataset, which is a standard speech-scoring dataset.
First, I directly applied the provided pretrained boundary-detection model to this corpus, and only obtained about 50% F1 and R-value.
I suspect this may be related to a domain-mismatch problem, so I tried re-training the boundary-detection model on the SpeechOcean corpus from scratch, but it still only reaches about 50% F1 and R-value, far behind the forced-alignment reference boundaries.
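For reference, this is roughly how I score the predicted boundaries against the forced-aligned ones. It is only a minimal sketch of the standard boundary F1 / R-value computation; the 20 ms tolerance, the greedy matching, and the function name `boundary_metrics` are my own assumptions and may differ from the repo's evaluation code:

```python
import numpy as np

def boundary_metrics(ref, pred, tolerance=0.02):
    """Boundary precision/recall/F1 and R-value.

    ref, pred: sorted arrays of boundary times in seconds.
    tolerance: matching window in seconds (20 ms is a common choice).
    """
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)

    # Greedy one-to-one matching: each reference boundary can be hit
    # by at most one predicted boundary within the tolerance window.
    used = np.zeros(len(pred), dtype=bool)
    hits = 0
    for r in ref:
        dists = np.abs(pred - r)
        dists[used] = np.inf
        if len(pred) and dists.min() <= tolerance:
            used[dists.argmin()] = True
            hits += 1

    precision = hits / max(len(pred), 1)
    recall = hits / max(len(ref), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)

    # R-value (Rasanen et al., 2009): combines the hit rate (recall)
    # with an over-segmentation term, so predicting many spurious
    # boundaries cannot inflate the score.
    os = len(pred) / max(len(ref), 1) - 1  # over-segmentation
    r1 = np.sqrt((1 - recall) ** 2 + os ** 2)
    r2 = (-os + recall - 1) / np.sqrt(2)
    r_value = 1 - (abs(r1) + abs(r2)) / 2
    return precision, recall, f1, r_value
```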
The following result screenshot shows predictions from the randomly initialized model (before any training):
The following result screenshot shows predictions from the trained model:
Any ideas on how to fix this issue? Thanks in advance!