Open Remaxic opened 8 months ago
I have a similar question. The pre-trained HuBERT, k-means, and unit vocoder you provide produce good sound. However, when I train k-means clustering on the LJSpeech data (https://keithito.com/LJ-Speech-Dataset/), which has around 13k audio samples, and synthesize .wav files with your pre-trained unit vocoder, the output does not sound good.
My questions are: What data was the pre-trained k-means model trained on? What hyperparameters were used (epochs, batch size, etc.)? Is there anything else important for training k-means that is not mentioned in the paper?
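To make the hyperparameter question concrete, here is a minimal sketch of where such settings enter a mini-batch k-means fit with scikit-learn. The helper name `fit_units`, the synthetic features, and every numeric value are illustrative assumptions, not the settings used for the released k-means model.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def fit_units(features, n_clusters=100, batch_size=10_000, max_iter=100, seed=0):
    """Fit mini-batch k-means on frame-level features (hypothetical helper).

    All default values here are placeholders chosen for illustration.
    """
    km = MiniBatchKMeans(
        n_clusters=n_clusters,   # size of the discrete unit vocabulary
        init="k-means++",
        batch_size=batch_size,   # frames drawn per mini-batch update
        max_iter=max_iter,       # maximum passes over the data
        n_init=3,                # number of random initialisations tried
        random_state=seed,
    )
    return km.fit(features)


# Synthetic stand-in for real (num_frames, 768) features.
rng = np.random.default_rng(0)
features = rng.standard_normal((2_000, 768)).astype(np.float32)

km = fit_units(features, n_clusters=10, batch_size=512, max_iter=20)
units = km.predict(features)  # one discrete unit id per frame
print(units[:10], km.cluster_centers_.shape)
```

Note that cluster indices from a newly fitted model are arbitrary, so they need not line up with the unit IDs a pre-trained vocoder was trained on.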
Thanks in advance
❓ Questions and Help
My question
Hello, for my downstream task I need to run k-means clustering on the output of the ContentVec model, which has the same structure as HuBERT but a different training approach. I extracted ContentVec features from my dataset and trained a clustering model using the code you provided. However, the resulting clustering is far less effective than the clustering model you provide for HuBERT.
Do you apply any special processing to the features (such as dimensionality reduction) before training the clustering model? Or is my dataset perhaps too small (7,430,431 × 768)? Any suggestions for improving my clustering would be appreciated!
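For a sense of scale, the matrix size quoted above can be converted into hours of audio and memory. This is a rough sketch under two assumptions that are not stated in the post: a 20 ms frame stride (50 frames per second, as for HuBERT on 16 kHz audio) and float32 storage.

```python
# Back-of-the-envelope size check for the feature matrix.
NUM_FRAMES = 7_430_431
FEATURE_DIM = 768
FRAME_STRIDE_SECONDS = 0.02  # assumed: 50 frames per second
BYTES_PER_VALUE = 4          # assumed: float32 storage


def audio_hours(num_frames, stride_seconds=FRAME_STRIDE_SECONDS):
    """Approximate audio duration covered by the frames, in hours."""
    return num_frames * stride_seconds / 3600.0


def matrix_gib(num_frames, dim, bytes_per_value=BYTES_PER_VALUE):
    """Approximate in-memory size of the dense feature matrix, in GiB."""
    return num_frames * dim * bytes_per_value / 2**30


hours = audio_hours(NUM_FRAMES)
gib = matrix_gib(NUM_FRAMES, FEATURE_DIM)
print(f"about {hours:.1f} hours of audio, {gib:.1f} GiB as float32")
```

Under these assumptions the features correspond to roughly 41 hours of audio and about 21 GiB in memory.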
The code I have tried for clustering: