Alexander-H-Liu / dinosr

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Evaluation script #3

Open cromz22 opened 2 months ago

cromz22 commented 2 months ago

Thank you for open-sourcing this amazing work!

Do you have any plans to release the evaluation scripts?

I would like to reproduce the results in the paper's tables, but it seems that not enough details are provided. For example,

cantabile-kwok commented 2 weeks ago

@cromz22 Hi Shuichiro, I am also using this repo and facing the same problem. Have you managed to work out a way to obtain discrete units from the pretrained DinoSR model since posting this issue? I would be very grateful for any help : )

cromz22 commented 2 weeks ago

Hi, I have made no progress on this since opening the issue. As stated above, I believe the argmax values are the discrete units, but I can't be sure.

cantabile-kwok commented 2 weeks ago

I read through the code carefully, and I believe you are right. The discrete units should be the argmax of the negative distances between the layer outputs and the codebook entries, i.e. the index of the nearest codebook entry for each frame.
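
To make that concrete, here is a minimal, framework-free sketch of the assignment step. It is not code from this repo: the function names `squared_distance` and `assign_units` are hypothetical, the toy data is made up, and it assumes a squared Euclidean distance and a single codebook. Which layer's outputs and which codebook to use for a real DinoSR checkpoint still has to be checked against the model code.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign_units(frames, codebook):
    """Return, for each frame, the index of the nearest codebook entry.

    Taking the argmax of the negative distance is equivalent to taking
    the argmin of the distance itself.
    """
    units = []
    for frame in frames:
        neg_dists = [-squared_distance(frame, code) for code in codebook]
        # argmax over codebook indices
        units.append(max(range(len(codebook)), key=neg_dists.__getitem__))
    return units


# Toy data (hypothetical): 3 frames of 2-dim features, 2 codebook entries.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.4]]
units = assign_units(frames, codebook)
print(units)
```

With real features, the same computation would normally be done in one batched tensor operation instead of Python loops, but the resulting indices should be the same.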