auspicious3000 / contentvec

speech self-supervised representations

Uploaded 500-class pretrained models may be incorrect (not trained) #10

Closed TylorShine closed 1 year ago

TylorShine commented 1 year ago

Hello, thanks for your great work.

As the title says, the 500-class pre-trained models were probably not trained.

Details

I tried extracting features using the pre-trained 500-class legacy model (checkpoint_best_legacy_500.pt).
When I plotted the extracted features and compared them to those from the HuBERT base model, they appeared to be identical.

Here is an example, using the same input sound for both:

HuBERT (official fairseq pre-trained hubert_base.pt):
[embedding plot: embeddings-1682615961-hubert-0-hubert_base]

ContentVec (checkpoint_best_legacy_500.pt):
[embedding plot: embeddings-1682615935-hubert-0_contentvec] (the plot title says "hubert", but it is actually the output of checkpoint_best_legacy_500.pt)
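For reference, here is a minimal sketch of this kind of feature comparison (not my exact code; it assumes fairseq is installed, and the checkpoint paths, the example.wav file, and the output layer are just placeholders):

```python
import torch
import soundfile as sf
from fairseq import checkpoint_utils

def extract_features(ckpt_path, wav):
    # Load a HuBERT-style checkpoint the standard fairseq way
    models, _, _ = checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
    model = models[0].eval()
    with torch.no_grad():
        # HubertModel.extract_features returns (features, padding_mask)
        feats, _ = model.extract_features(source=wav, padding_mask=None, output_layer=12)
    return feats

audio, sr = sf.read("example.wav")  # assumed 16 kHz mono
wav = torch.from_numpy(audio).float().unsqueeze(0)  # shape (1, T)

feats_hubert = extract_features("hubert_base.pt", wav)
feats_cv = extract_features("checkpoint_best_legacy_500.pt", wav)
print(torch.allclose(feats_hubert, feats_cv, atol=1e-5))
```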

Also, I wrote a script to compare the models' state_dicts (gist). According to its output, hubert_base.pt and checkpoint_best_legacy_500.pt are identical.
The same goes for checkpoint_best_500.pt: its additional keys are presumably correct for (pre-)training ContentVec, but the values of the keys it shares with hubert_base.pt are identical.
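(The gist is linked above; the following is only a sketch of that kind of state_dict comparison, not the gist itself. It assumes fairseq-style checkpoints that store their weights under the "model" key.)

```python
import torch

# Sketch of a state_dict comparison between two checkpoints
# (not the linked gist itself; paths are placeholders).
sd_a = torch.load("hubert_base.pt", map_location="cpu")["model"]
sd_b = torch.load("checkpoint_best_legacy_500.pt", map_location="cpu")["model"]

shared = sorted(set(sd_a) & set(sd_b))
only_b = sorted(set(sd_b) - set(sd_a))  # e.g. extra keys used for (pre-)training

for key in shared:
    if sd_a[key].shape != sd_b[key].shape or not torch.equal(sd_a[key], sd_b[key]):
        print("Differ Some Values!", key)

print(f"{len(only_b)} keys exist only in the second checkpoint")
```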

Just to be sure, I also checked the 100-class pre-trained models, named checkpoint_best_{legacy_|}100.pt, which seem to have been trained correctly.

If a correct one exists, could you please upload it?

Regards.

auspicious3000 commented 1 year ago

I used your code to compare the hubert_base and checkpoint_best_legacy_500, and your code printed "Differ Some Values!" all over the screen.

TylorShine commented 1 year ago

Thanks for the fast reply.

I'm sorry, you are right. Conclusion: the checkpoints do differ after all, and my earlier comparison was mistaken.

Again, thank you for your wonderful work. I'll take a break.

auspicious3000 commented 1 year ago

So far, this is the only official repo for contentvec. We have not uploaded our models to other places.

TylorShine commented 1 year ago

OK, I understand. Thank you for giving me your time!

Have a nice day!