auspicious3000 / contentvec

speech self-supervised representations
MIT License

What are the pseudo labels? #23

Closed Lukysoon closed 2 months ago

Lukysoon commented 3 months ago

Hi, I thought that ContentVec (like HuBERT) uses the k-means algorithm to create its labels. So why do we need {train,valid}.km, and what exactly are they?

Thank you :-)

auspicious3000 commented 3 months ago

{train,valid}.km are the label files produced by k-means clustering: they hold the pseudo labels for the train and valid splits.
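To make this concrete, here is a minimal sketch of how such a label file could be produced. It assumes the HuBERT-style layout used in fairseq, where each line of a .km file holds the space-separated cluster IDs of one utterance, one ID per frame. The centroids, the toy features, and the helper names `nearest_centroid` and `write_km` are hypothetical and only for illustration. This is not the repository's actual script.

```python
import io


def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((f - c) ** 2 for f, c in zip(frame, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def write_km(utterances, centroids, out):
    """Write one line per utterance: the cluster ID of every frame, space-separated."""
    for frames in utterances:
        labels = [nearest_centroid(frame, centroids) for frame in frames]
        out.write(" ".join(map(str, labels)) + "\n")


# Hypothetical centroids, standing in for an already fitted k-means model.
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]

# Toy per-frame features for two utterances (real features would come from
# an acoustic front end or a model layer, which is outside this sketch).
utterances = [
    [[0.1, -0.1], [0.9, 1.2], [4.8, 5.1]],
    [[5.2, 4.9], [0.0, 0.2]],
]

buf = io.StringIO()  # stands in for a file such as train.km
write_km(utterances, centroids, buf)
km_text = buf.getvalue()
print(km_text, end="")
```

In other words, k-means is run once ahead of training, and its frame-level cluster assignments are saved to disk so that training can read them as targets instead of re-clustering on the fly.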