auspicious3000/autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License
976 stars, 207 forks
Issues (sorted by newest)
#67 How to use this repo just for testing? (sandeshnaroju, opened 3 years ago, 11 comments)
#66 UserWarning: Using a target size (torch.Size([4, 1, 128, 80])) that is different to the input size (torch.Size([4, 128, 80])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. (BenjaminChua, closed 3 years ago, 1 comment)
#65 How to generate the pretrained model/checkpoint files? (KevinHua, closed 3 years ago, 3 comments)
#64 AutoVC for singing voice transfer (lstrgar, closed 3 years ago, 1 comment)
#63 Model location after training (swel4ik, closed 3 years ago, 1 comment)
#62 How is content loss implemented? (ahrzb, closed 3 years ago, 2 comments)
#61 Question about training loss and inference performance (zzw922cn, opened 3 years ago, 6 comments)
#60 Speaker embedding training (ghost, closed 3 years ago, 1 comment)
#59 About the trained vocoder (Baliii, closed 3 years ago, 1 comment)
#58 Very slow inference (samialsindi, closed 3 years ago, 1 comment)
#57 Unable to apply voice conversion to long files using my trained speaker embedding (xanguera, opened 3 years ago, 4 comments)
#56 What are the differences between the model hosted in this repo and the true model? (ngulya, closed 3 years ago, 4 comments)
#55 Confusion about model update (jayzhu02, opened 3 years ago, 8 comments)
#54 Reconstruction loss won't decrease (billy800413, opened 3 years ago, 1 comment)
#53 Making zero-shot model (sbkim052, opened 3 years ago, 1 comment)
#52 If the speaker embedding is not added to the encoder input, will it affect model performance? (qq547276542, opened 3 years ago, 1 comment)
#51 Refactor model (narumiruna, closed 3 years ago, 0 comments)
#50 F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder (tebin, opened 4 years ago, 9 comments)
#49 Issues with conversion of VCTK speakers using pre-trained model (gustxsr, opened 4 years ago, 9 comments)
#48 Is the speech in each batch cropped to a fixed length during training? (qq547276542, opened 4 years ago, 4 comments)
#47 Why is the batch size set to 2? (qq547276542, closed 4 years ago, 1 comment)
#46 Does it work? Didn't even try testing (marshonhuckleberry, closed 4 years ago, 0 comments)
#45 If the speaker embedding is a one-hot code, what's the difference between your work and previous VAE-based voice conversion? (Georgehappy1, closed 4 years ago, 1 comment)
#44 Follow-up work available for viewing (auspicious3000, opened 4 years ago, 7 comments)
#43 The complete training code may be sent through email upon special request for non-commercial purposes. (auspicious3000, closed 3 years ago, 1 comment)
#42 Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count (j2l, closed 4 years ago, 0 comments)
#41 Kernel died (j2l, closed 4 years ago, 1 comment)
#40 Adapt to TensorFlow 2.x (Barbany, closed 4 years ago, 0 comments)
#39 Tools to split VCTK audio (YichongLeng, closed 4 years ago, 11 comments)
#38 The complete training code may be sent through email upon special request for non-commercial purposes. (auspicious3000, closed 4 years ago, 11 comments)
#36 Why does the quality on the demo page differ from your new paper? (azraelkuan, closed 4 years ago, 3 comments)
#35 Question concerning loss function in the paper (xSeanliux, closed 4 years ago, 0 comments)
#34 How do you generate the speaker embedding? (hsiehjackson, opened 4 years ago, 3 comments)
#33 Bad conversion quality after retraining (kvnsq, closed 4 years ago, 21 comments)
#32 Dataset Size for Training (jacob-mink, closed 4 years ago, 2 comments)
#31 Demos don't work (Nintorac, closed 4 years ago, 0 comments)
#30 Why are the original speaker embeddings concatenated with the original speaker spectrogram? (nkcdy, opened 4 years ago, 10 comments)
#29 Confusion about the speaker encoder and loss function (andylida, opened 4 years ago, 10 comments)
#28 How to use this project on another dataset? (smalissa, opened 4 years ago, 4 comments)
#27 2000 epochs needed to train? (wotulong, closed 4 years ago, 2 comments)
#26 AutoVC on large-scale data? (iyah4888, opened 4 years ago, 2 comments)
#25 Hyperparameters for generating mel spectrogram from training .wav files (sroutray, opened 4 years ago, 3 comments)
#24 For those of you who need pre-trained speaker embedding models, here it is. (auspicious3000, closed 4 years ago, 0 comments)
#23 Training is too slow (1015720437, closed 4 years ago, 3 comments)
#22 Downsampling process is different from that described in the paper (light1726, opened 4 years ago, 3 comments)
#21 Is voice activity detection necessary for wav preprocessing? (WeiLi233, opened 4 years ago, 1 comment)
#20 Has anyone reproduced the sound quality on the demo page? (WeiLi233, opened 4 years ago, 15 comments)
#19 Re-training results are not good enough; can you share some advice on hyperparameters? (ZHANG-SHI-CHANG, opened 4 years ago, 3 comments)
#18 Re-training steps? (Husnain08, opened 4 years ago, 1 comment)
#17 What is the format of the metadata? (1015720437, opened 4 years ago, 21 comments)