auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

https://arxiv.org/abs/1905.05879

MIT License

976 stars, 207 forks

issues

How to use this repo for just testing?

#67 sandeshnaroju opened 3 years ago
11
UserWarning: Using a target size (torch.Size([4, 1, 128, 80])) that is different to the input size (torch.Size([4, 128, 80])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

#66 BenjaminChua closed 3 years ago
1
How to generate the pretrained model/checkpoint files?

#65 KevinHua closed 3 years ago
3
autovc for singing voice transfer

#64 lstrgar closed 3 years ago
1
Model after train location

#63 swel4ik closed 3 years ago
1
How is the content loss implemented?

#62 ahrzb closed 3 years ago
2
question about training loss and inference performance

#61 zzw922cn opened 3 years ago
6
Speaker embedding training

#60 ghost closed 3 years ago
1
About the trained vocoder

#59 Baliii closed 3 years ago
1
Very slow inference

#58 samialsindi closed 3 years ago
1
unable to apply voice conversion to long files using my trained speaker embedding

#57 xanguera opened 3 years ago
4
What are the differences between the model hosted in this repo and the true model?

#56 ngulya closed 3 years ago
4
confusion about model update

#55 jayzhu02 opened 3 years ago
8
reconstruction loss won't decrease

#54 billy800413 opened 3 years ago
1
Making zero-shot model

#53 sbkim052 opened 3 years ago
1
If the speaker embedding is not added to the encoder input, will it affect model performance?

#52 qq547276542 opened 3 years ago
1
Refactor model

#51 narumiruna closed 3 years ago
0
F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

#50 tebin opened 4 years ago
9
Issues with conversion of VCTK speakers using pre-trained model

#49 gustxsr opened 4 years ago
9
Is the speech in each batch cropped to a fixed length of time during training?

#48 qq547276542 opened 4 years ago
4
Why is the batch size set to 2?

#47 qq547276542 closed 4 years ago
1
does it work? didn't even try testing

#46 marshonhuckleberry closed 4 years ago
0
If the speaker embedding is a one-hot code, what's the difference between your work and previous VAE-based voice conversion?

#45 Georgehappy1 closed 4 years ago
1
Follow-up work available for viewing

#44 auspicious3000 opened 4 years ago
7
The complete training code may be sent through email upon special request for non-commercial purposes.

#43 auspicious3000 closed 3 years ago
1
Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count

#42 j2l closed 4 years ago
0
kernel died

#41 j2l closed 4 years ago
1
Adapt to tensorflow 2.x

#40 Barbany closed 4 years ago
0
Tools to split VCTK audio

#39 YichongLeng closed 4 years ago
11
The complete training code may be sent through email upon special request for non-commercial purposes.

#38 auspicious3000 closed 4 years ago
11
Why does the quality on the demo page differ from that in your new paper?

#36 azraelkuan closed 4 years ago
3
Question concerning loss function in the paper

#35 xSeanliux closed 4 years ago
0
How do you generate the speaker embedding?

#34 hsiehjackson opened 4 years ago
3
Bad conversion quality after retraining

#33 kvnsq closed 4 years ago
21
Dataset Size for Training

#32 jacob-mink closed 4 years ago
2
Demos don't work

#31 Nintorac closed 4 years ago
0
Why do the original speaker embeddings need to be concatenated with the original speaker's spectrogram?

#30 nkcdy opened 4 years ago
10
Confusion about the speaker encoder and loss function

#29 andylida opened 4 years ago
10
How to use this project on another dataset?

#28 smalissa opened 4 years ago
4
2000 epochs needed to train?

#27 wotulong closed 4 years ago
2
AutoVC on large-scale data?

#26 iyah4888 opened 4 years ago
2
Hyperparameters for generating mel spectrogram from training .wav files

#25 sroutray opened 4 years ago
3
For those of you who need pre-trained speaker embedding models, Here it is.

#24 auspicious3000 closed 4 years ago
0
Training is too slow

#23 1015720437 closed 4 years ago
3
Downsampling process is different from that described in the paper

#22 light1726 opened 4 years ago
3
Is voice activity detection necessary for wav preprocessing?

#21 WeiLi233 opened 4 years ago
1
Has anyone reproduced the sound quality of the demo page?

#20 WeiLi233 opened 4 years ago
15
Re-training result is not good enough; can you share some advice about hyperparameters?

#19 ZHANG-SHI-CHANG opened 4 years ago
3
Re-training steps?

#18 Husnain08 opened 4 years ago
1
What is the format of the metadata?

#17 1015720437 opened 4 years ago
21
