PlayVoice / whisper-vits-svc

Core Engine of Singing Voice Conversion & Singing Voice Clone
https://huggingface.co/spaces/maxmax20160403/sovits5.0
MIT License

Something wrong with the Decoder. #137

Closed bukhalmae145 closed 3 months ago

bukhalmae145 commented 9 months ago

I adapted the Korean HuBERT model to Grad-SVC and trained the model, but the exported audio file sounds weird and the generated decoder image looks weird too. [attached: decoder output image]

yeorinhieut commented 9 months ago

At the very least, you need to describe what you changed and how you changed it so we can figure out what the problem is. In most cases, it's a user-created problem.

MaxMax2016 commented 9 months ago

@bukhalmae145 I think your diffusion is not trained.

full_epochs: 500
fast_epochs: 100

The diffusion model only starts training after fast_epochs have finished.
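
For reference, here is a minimal sketch of how such a full_epochs / fast_epochs split typically gates diffusion training. The function and variable names are illustrative assumptions, not Grad-SVC's actual code:

```python
# Minimal sketch of a two-stage schedule, assuming the trainer switches stages
# by epoch count. None of these names are taken from the Grad-SVC source.
FULL_EPOCHS = 500   # full_epochs in the config
FAST_EPOCHS = 100   # fast_epochs in the config

def train_fast_step(epoch):
    # Placeholder: optimize only the encoder/prior/mel objectives here.
    print(f"epoch {epoch}: fast stage, diffusion weights untouched")

def train_diffusion_step(epoch):
    # Placeholder: the diffusion decoder is optimized only in this stage.
    print(f"epoch {epoch}: diffusion stage")

for epoch in range(1, FULL_EPOCHS + 1):
    if epoch <= FAST_EPOCHS:
        train_fast_step(epoch)
    else:
        # Diffusion training begins only after FAST_EPOCHS; stopping earlier
        # leaves the diffusion decoder at its random initialization, which
        # would produce the noisy output described in this issue.
        train_diffusion_step(epoch)
```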

MaxMax2016 commented 9 months ago

@bukhalmae145 I recommend finding a partner who knows deep learning and can help you in person.

bukhalmae145 commented 8 months ago

> At the very least, you need to describe what you changed and how you changed it so we can figure out what the problem is. In most cases, it's a user-created problem.

https://github.com/PlayVoice/so-vits-svc-5.0/issues/130

I made the same changes as in the issue above and training runs, but the result is strange... I can only hear something like pink noise in the generated wav file. I'm a layman in this field and I'm not familiar with GitHub culture either... If it's rude of me to keep asking questions, I'll stop.

bukhalmae145 commented 8 months ago

> @bukhalmae145 I think your diffusion is not trained.
>
> full_epochs: 500
> fast_epochs: 100
>
> The diffusion model only starts training after fast_epochs have finished.

I just found out that leaving the pretrain: line blank in base.yaml causes this issue. But I remember you told me in a previous discussion that I should leave it blank. Is there any way to build a pretrained model based on the Korean HuBERT? https://github.com/PlayVoice/so-vits-svc-5.0/issues/130
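
As an aside, the reason a blank pretrain: entry changes behaviour can be illustrated with a hedged sketch of checkpoint loading. This is not Grad-SVC's exact code; the helper name and the assumed "model" key in the checkpoint are illustrative:

```python
import torch

def maybe_load_pretrain(model: torch.nn.Module, pretrain_path: str) -> None:
    """Hypothetical warm-start helper; not taken from the Grad-SVC source."""
    if not pretrain_path:
        # A blank `pretrain:` in base.yaml means no warm start: every module,
        # including the decoder, starts from random weights and needs far more
        # training before it produces clean audio.
        print("pretrain is blank, training from scratch")
        return
    checkpoint = torch.load(pretrain_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)  # assumed checkpoint layout
    # strict=False skips missing or unexpected keys, so modules that were
    # swapped out (e.g. a different content encoder) keep their own init.
    model.load_state_dict(state_dict, strict=False)
```

Note that a checkpoint built around a different content encoder can still fail to load for layers whose shapes changed, which is a separate problem from the blank pretrain: entry.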

bukhalmae145 commented 8 months ago

> @bukhalmae145 I think your diffusion is not trained.
>
> full_epochs: 500
> fast_epochs: 100
>
> The diffusion model only starts training after fast_epochs have finished.

I found out that it shows the following error when diffusion training starts on my Mac, no matter which HuBERT model I use:

```
/AppleInternal/Library/BuildRoots/495c257e-668e-11ee-93ce-926038f30c31/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:4352: failed assertion `destination kernel width and filter kernel width mismatch'
[1]    4044 abort      python gvc_trainer.py

~/Music/Grad-SVC-20230930 ....... ABRT | took 7s | Grad-SVC py | at 22:57:12

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3
```

MaxMax2016 commented 8 months ago

It's hard for me to identify your problem, sorry.

bukhalmae145 commented 8 months ago

> It's hard for me to identify your problem, sorry.

I appreciate your kindness!! But is there any way of making a pretrained model based on the Korean HuBERT model to replace the original gvc.pretrained.pth?

bukhalmae145 commented 8 months ago

> It's hard for me to identify your problem, sorry.

I finally found out that line 38 of ./grad_extend/train.py, `hps.grad.dec_dim, hps.grad.beta_min, hps.grad.beta_max, hps.grad.pe_scale).to('mps')`, causes this issue. It works fine on the CPU, but I think the MPS device is not compatible with the code.
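
If the hard-coded .to('mps') is indeed the trigger, one hedged workaround is to make the device selectable and fall back to CPU. The helper and the environment variable below are hypothetical, not part of Grad-SVC:

```python
import os
import torch

def pick_device(prefer_mps: bool = True) -> torch.device:
    """Hypothetical device picker; GVC_FORCE_CPU is an invented escape hatch."""
    force_cpu = os.environ.get("GVC_FORCE_CPU", "0") == "1"
    if prefer_mps and not force_cpu and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
# In ./grad_extend/train.py the model would then be moved with .to(device)
# instead of a hard-coded .to('mps').
print(device)
```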

bukhalmae145 commented 8 months ago

> It's hard for me to identify your problem, sorry.

Other losses (prior_loss, mel_loss, etc.) work fine with loss.backward(), but only diff_loss raises an error when loss.backward() is called (in ./grad_extend/train.py).
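
A small debugging sketch that may help pin this down. The loss names are taken from the comment above; the helper itself is hypothetical. Because the MPS assertion aborts the whole process rather than raising a Python exception, the last loss name printed before the crash identifies the offending graph:

```python
from typing import Dict
import torch

def backward_each(losses: Dict[str, torch.Tensor]) -> None:
    """Back-propagate each loss separately to see which one crashes on MPS."""
    for name, loss in losses.items():
        print(f"running backward for {name} ...", flush=True)
        # retain_graph keeps shared graph parts alive for the remaining losses;
        # gradients accumulate just as they would for the summed loss.
        loss.backward(retain_graph=True)
        print(f"{name}: backward OK", flush=True)

# Hypothetical usage inside the training step of ./grad_extend/train.py:
# backward_each({"prior_loss": prior_loss, "mel_loss": mel_loss, "diff_loss": diff_loss})
```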