auspicious3000 / contentvec

speech self-supervised representations
MIT License

How to get a new spk2info.dict? #14

Open zyfily opened 11 months ago

zyfily commented 11 months ago

I want to train a new model on a different dataset, but I can't find a way to generate a new spk2info.dict.

auspicious3000 commented 11 months ago

It's very simple. The lo and hi values are fixed for male and female voices, respectively. The speaker embeddings can be extracted using the well-known Resemblyzer.
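For readers who want something concrete, here is a minimal sketch of that recipe. It uses Resemblyzer's `VoiceEncoder`, `preprocess_wav` and `embed_speaker`. Everything else is an assumption made for illustration: the function names, the `speakers` input layout, the `(embedding, lo, hi)` entry layout, and the reading that each gender gets one fixed `(lo, hi)` pair. The actual layout of spk2info.dict (including the extra trailing value) is not confirmed here, so compare against the file shipped with the repo before training.

```python
from pathlib import Path


def make_resemblyzer_embedder():
    """Return a function mapping a list of audio paths to one speaker embedding."""
    # Imported lazily so build_spk2info can be used or tested without Resemblyzer.
    from resemblyzer import VoiceEncoder, preprocess_wav

    encoder = VoiceEncoder()

    def embed(wav_paths):
        wavs = [preprocess_wav(Path(p)) for p in wav_paths]
        # embed_speaker averages the utterance embeddings and L2-normalises the result.
        return encoder.embed_speaker(wavs)

    return embed


def build_spk2info(speakers, f0_range_by_gender, embed_fn):
    """Build a speaker-info mapping (hypothetical layout, see the note above).

    speakers:            {spk_id: {"gender": "M" or "F", "wavs": [audio paths]}}
    f0_range_by_gender:  {"M": (lo, hi), "F": (lo, hi)}, fixed per gender
    embed_fn:            callable taking a list of paths, returning an embedding
    """
    spk2info = {}
    for spk_id, meta in speakers.items():
        lo, hi = f0_range_by_gender[meta["gender"]]
        spk2info[spk_id] = (embed_fn(meta["wavs"]), lo, hi)
    return spk2info


# Example call (the numeric ranges are placeholders, not ContentVec's values):
# spk2info = build_spk2info(
#     {"p225": {"gender": "F", "wavs": ["p225_001.wav", "p225_002.wav"]}},
#     {"M": (50.0, 250.0), "F": (100.0, 600.0)},
#     make_resemblyzer_embedder(),
# )
```

How the mapping is written to disk should match whatever the training code uses to load spk2info.dict, which this sketch deliberately leaves out.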

zyfily commented 11 months ago

So the dict only needs the speaker embeddings, lo, and hi? Is the last value after these unused?

freds0 commented 10 months ago

I wrote this script, which uses parselmouth or pyreaper. It needs some adjustments: it creates an embedding for every file, whereas I believe the original uses a single averaged embedding per speaker.

create_contentvec_dict.zip
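The per-speaker averaging mentioned above could look roughly like the sketch below. It is not taken from the attached script. The function name, the `utt2spk` mapping, and the choice to re-normalise the mean to unit length are all assumptions. It is written in plain Python so it runs anywhere, and it accepts any sequence of floats per utterance.

```python
import math
from collections import defaultdict


def average_per_speaker(utt_embeddings, utt2spk):
    """Collapse per-utterance embeddings into one embedding per speaker.

    utt_embeddings: {utt_id: sequence of floats}
    utt2spk:        {utt_id: spk_id}
    """
    grouped = defaultdict(list)
    for utt_id, emb in utt_embeddings.items():
        grouped[utt2spk[utt_id]].append(emb)

    spk_embeddings = {}
    for spk_id, embs in grouped.items():
        # Element-wise mean over all utterances of this speaker.
        mean = [sum(dim) / len(embs) for dim in zip(*embs)]
        # Re-normalise to unit length (assumed); leave an all-zero mean untouched.
        norm = math.sqrt(sum(x * x for x in mean))
        spk_embeddings[spk_id] = [x / norm for x in mean] if norm > 0 else mean
    return spk_embeddings
```

If the embeddings come from Resemblyzer, passing all of a speaker's waveforms to `VoiceEncoder.embed_speaker` gives a similar averaged, normalised vector without this extra step.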

SandroChen commented 10 months ago

@auspicious3000 Thank you for sharing Resemblyzer. Here is another question, if you don't mind: is there a convenient way to get frame-aligned pseudo-label files for a new dataset? I know that the Montreal Forced Aligner can do this, but its output format is quite different from the .km file...
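To make the format gap concrete, here is a rough sketch of turning interval-style alignments into a frame-level label line. It rests on several assumptions that are not confirmed in this thread: that a .km file holds one line per utterance of space-separated integer ids, that frames are 20 ms apart, and that the aligner's symbols have already been mapped to integer ids. The function names and the `fill` id for uncovered frames are hypothetical.

```python
def intervals_to_frame_labels(intervals, num_frames, frame_shift=0.02, fill=0):
    """Expand (start_sec, end_sec, label_id) intervals into one id per frame.

    Frames not covered by any interval keep the `fill` id.
    """
    labels = [fill] * num_frames
    for start, end, label_id in intervals:
        # Convert times to frame indices and clamp them to the utterance length.
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        for i in range(first, last):
            labels[i] = label_id
    return labels


def to_km_line(labels):
    """Join frame labels into a single space-separated line (assumed .km layout)."""
    return " ".join(str(label) for label in labels)
```

For example, `to_km_line(intervals_to_frame_labels([(0.0, 0.04, 5), (0.04, 0.10, 7)], num_frames=6))` gives `5 5 7 7 7 0`. Whether forced-alignment labels are a suitable substitute for the clustering-based pseudo labels is a separate question that this sketch does not answer.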