Zain-Jiang / Speech-Editing-Toolkit

It's a repository for implementations of neural speech editing algorithms.

VCTK checkpoint #14

Open danablend opened 9 months ago

danablend commented 9 months ago

Thanks for the great work on this repository, really useful!

Is there a VCTK checkpoint available, for use with UK-accented speakers?

Again thanks for this repository!

Zain-Jiang commented 8 months ago

The VCTK checkpoint is available at the following link: https://drive.google.com/drive/folders/1L8k18QdtN6ew_i-6FjoJyfQCCdvIbSUv?usp=sharing

If you encounter any problems during use, please feel free to contact us.

danablend commented 8 months ago

Thank you very much!

danablend commented 8 months ago

Hey, just opening this up again.

Would you be able to provide the MFA results from your VCTK run, if you still have them? Loading the checkpoint currently fails with:

RuntimeError: Error(s) in loading state_dict for GaussianDiffusion: size mismatch for fs.encoder.embed_tokens.weight: copying a param with shape torch.Size([76, 192]) from checkpoint, the shape in current model is torch.Size([80, 192]).

I'm guessing the number of phonemes differs between LibriTTS and VCTK. I'm using the LibriTTS MFA results you provided in one of your previous responses, so the instantiated model gets an 80-entry phoneme embedding while the checkpoint holds a 76-entry one, and the encoder's embedding layer no longer matches :-)
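A quick way to confirm which parameters disagree is to compare shapes by name before loading. The sketch below is not code from this repository: `find_shape_mismatches` is a hypothetical helper, and the two shape dictionaries are filled in by hand from the error message. With PyTorch, such a dictionary can be built from any state dict with `{k: tuple(v.shape) for k, v in state_dict.items()}`; how this toolkit nests the state dict inside its checkpoint file is not shown here.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (ckpt_shape, model_shape)} for parameters that exist
    in both mappings but have different shapes."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message; the first dimension is the
# number of phoneme tokens, the second the embedding width.
ckpt_shapes = {"fs.encoder.embed_tokens.weight": (76, 192)}
model_shapes = {"fs.encoder.embed_tokens.weight": (80, 192)}

mismatches = find_shape_mismatches(ckpt_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If only the token embedding shows up, the phoneme inventory used to build the model differs from the one used at training time, which points at the MFA output rather than at the model configuration.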

If you don't have these anymore, no problem, I can always download VCTK and generate them locally with MFA.

Thanks! :-)

Linghuxc commented 7 months ago

@danablend Have you solved this problem? I hit the same error when I continue training on LibriTTS from my own trained VCTK checkpoint. The embedding size ("num_embedding") does not match because the shape of "fs.encoder.embed_tokens.weight" changes, and I do not know how to fix it. Thank you!

@Zain-Jiang Hi, have you encountered this problem before? RuntimeError: Error(s) in loading state_dict for GaussianDiffusion: size mismatch for fs.encoder.embed_tokens.weight: copying a param with shape torch.Size([76, 192]) from checkpoint, the shape in current model is torch.Size([80, 192]).

I wonder if you can help us, thank you!

Linghuxc commented 7 months ago

It just occurred to me: do we have to process all the data together (LibriTTS + VCTK), by which I mean running both through MFA as one set, instead of training on one dataset separately and then on the next? @Zain-Jiang
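To illustrate why processing the two corpora separately could produce incompatible embedding tables, here is a minimal sketch. It assumes the phoneme vocabulary is derived from whichever phonemes appear in the aligned data, which is an assumption about the preprocessing and not confirmed in this thread. The phoneme sets are made up and `build_phone_vocab` is a hypothetical helper.

```python
def build_phone_vocab(*phone_sets):
    """Map each phoneme in the union of the given sets to a stable index."""
    merged = sorted(set().union(*phone_sets))
    return {phone: idx for idx, phone in enumerate(merged)}


# Made-up phoneme inventories, only to show the effect.
libritts_phones = {"AA1", "AH0", "B", "K", "T"}
vctk_phones = {"AA1", "AH0", "B", "OY2"}

vocab_libritts = build_phone_vocab(libritts_phones)
vocab_vctk = build_phone_vocab(vctk_phones)
vocab_joint = build_phone_vocab(libritts_phones, vctk_phones)

# Separate vocabularies differ in size, so embedding tables built from
# them cannot be loaded into each other; the joint one covers both.
print(len(vocab_libritts), len(vocab_vctk), len(vocab_joint))
```

Under that assumption, a single vocabulary built from both corpora would give the same embedding size for either dataset, at the cost of re-running preprocessing and retraining from a checkpoint that uses the joint table.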

I can also share my VCTK MFA results with you. Do you still need them? @danablend

danablend commented 7 months ago

That is a great insight, @Linghuxc. Would you be able to share your MFA files?

Linghuxc commented 7 months ago

@danablend
I'm very sorry for replying so late. This week I tried to upload my data/processed and data/binary results to the cloud, but they were too large for the storage space I have, and my attempts on several platforms all failed.

I'm really sorry to keep you waiting so long. I was still trying to upload them today.