collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License
3.54k stars 185 forks

Outdated RQBottleneckTransformer model #118

Closed Subuday closed 3 months ago

Subuday commented 3 months ago

It looks like the architecture of RQBottleneckTransformer has changed, but the model has not been retrained or reuploaded. Trying to load it with vq_model = vq_stoks.RQBottleneckTransformer.load_model(ref="collabora/whisperspeech:whisper-vq-stoks-medium-en+pl.model").cuda() leads to this error:

Error(s) in loading state_dict for RQBottleneckTransformer:
    Missing key(s) in state_dict: "rq.project_in.weight", "rq.project_in.bias", "rq.project_out.weight", "rq.project_out.bias". 
    Unexpected key(s) in state_dict: "rq.layers.0.project_in.weight", "rq.layers.0.project_in.bias", "rq.layers.0.project_out.weight", "rq.layers.0.project_out.bias".

Subuday commented 3 months ago

Okay, I was using an incorrect version of vector_quantize_pytorch. The correct one is specified in the settings.ini file.
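
For anyone hitting the same error, a quick way to confirm the mismatch is to compare the installed package version with the requirement listed in settings.ini. The sketch below is only an illustration: it assumes an nbdev-style settings.ini with a [DEFAULT] section and a space-separated requirements entry, and the helper names pinned_requirement and installed_version are made up for this example.

```python
import configparser
import re
from importlib import metadata


def pinned_requirement(ini_text, package):
    """Return the requirement string for `package` found in settings.ini text, or None.

    Assumption: dependencies are a space-separated list under DEFAULT/requirements,
    as in nbdev-style projects. Adjust the key if the file is laid out differently.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    requirements = parser["DEFAULT"].get("requirements", "").split()
    wanted = package.lower().replace("_", "-")
    for req in requirements:
        # Strip any version specifier, extras, or markers to get the bare name.
        name = re.split(r"[<>=!~\[;]", req, maxsplit=1)[0]
        if name.lower().replace("_", "-") == wanted:
            return req
    return None


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling pinned_requirement(open("settings.ini").read(), "vector_quantize_pytorch") from the repository root and comparing the result with installed_version("vector_quantize_pytorch") should show whether the two disagree.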

jpc commented 3 months ago

Yeah, that's unfortunate. It would probably make sense to update the checkpoint and use the newest version of vector_quantize_pytorch since AFAIR the math did not change at all, just the layer names.
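
If only the names moved, updating the checkpoint could be as small as renaming four keys before loading. Here is a rough sketch based purely on the key names in the error message above. It assumes the tensors themselves are unchanged (my recollection, not verified), and remap_rq_keys is a hypothetical helper, not something that exists in WhisperSpeech.

```python
from collections import OrderedDict

# Key names taken from the error message in this issue:
# the checkpoint has "rq.layers.0.project_*", the model expects "rq.project_*".
OLD_PREFIX = "rq.layers.0."
NEW_PREFIX = "rq."
MOVED = (
    "project_in.weight",
    "project_in.bias",
    "project_out.weight",
    "project_out.bias",
)


def remap_rq_keys(state_dict):
    """Return a copy of `state_dict` with the four projection keys renamed.

    Assumption: only the names differ, not the shapes or meaning of the tensors.
    All other keys (including other entries under rq.layers.0.) are left alone.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        suffix = key[len(OLD_PREFIX):]
        if key.startswith(OLD_PREFIX) and suffix in MOVED:
            key = NEW_PREFIX + suffix
        remapped[key] = value
    return remapped
```

The function only touches the mapping's keys, so it works on a plain dict as well as on a PyTorch state dict. The remapped dict would then go through load_state_dict with the default strict=True, so any remaining mismatch still surfaces as an error.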

jpc commented 3 months ago

Maybe we could do it when we start working on new languages, @zoq?