effusiveperiscope / so-vits-svc

so-vits-svc
MIT License

Code execution hangs at final step - No audio is being output for local version #16

Status: Closed (closed by cody151 1 year ago)

cody151 commented 1 year ago

Hi,

Windows 10 64-bit, latest VS Code

For some reason, when I click the Convert button, it just hangs with this message and nothing happens (the audio file is already located inside the directory):

DEBUG:Comm:handle_msg[609c2fa108444e609fabd887585be7f1]({'header': {'date': datetime.datetime(2023, 3, 31, 6, 20, 11, 101000, tzinfo=tzutc()), 'msg_id': '85773ff6-72be-4fc8-a49e-d92d19a5ac5e', 'msg_type': 'comm_msg', 'session': '7943363c-b292-451a-9069-014f9ec0a314', 'username': '3018a1bf-807d-497f-a07a-3f5fdee228f9', 'version': '5.2'}, 'msg_id': '85773ff6-72be-4fc8-a49e-d92d19a5ac5e', 'msg_type': 'comm_msg', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': '609c2fa108444e609fabd887585be7f1', 'data': {'method': 'update', 'state': {'value': 0}, 'buffer_paths': []}}, 'buffers': []}) DEBUG:Comm:handle_msg[39b91528e45244ee97b310d6192a831a]({'header': {'date': datetime.datetime(2023, 3, 31, 6, 20, 11, 340000, tzinfo=tzutc()), 'msg_id': 'f529ea36-d0c1-41c8-bdcb-c068fe31c791', 'msg_type': 'comm_msg', 'session': '7943363c-b292-451a-9069-014f9ec0a314', 'username': '3018a1bf-807d-497f-a07a-3f5fdee228f9', 'version': '5.2'}, 'msg_id': 'f529ea36-d0c1-41c8-bdcb-c068fe31c791', 'msg_type': 'comm_msg', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': '39b91528e45244ee97b310d6192a831a', 'data': {'method': 'custom', 'content': {'event': 'click'}}}, 'buffers': []}) E:\aivocals\bin\so-vits-svc-eff-4.0\models\ak193\G_299840.pth True load model(s) from hubert/checkpoint_best_legacy_500.pt DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client. 
DEBUG:h5py._conv:Creating converter from 7 to 5 DEBUG:h5py._conv:Creating converter from 5 to 7 DEBUG:h5py._conv:Creating converter from 7 to 5 DEBUG:h5py._conv:Creating converter from 5 to 7 INFO:fairseq.tasks.text_to_speech:Please install tensorboardX: pip install tensorboardX INFO:fairseq.tasks.hubert_pretraining:current directory is E:\bin\so-vits-svc-eff-4.0 INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False} INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] 4 + [(512,2,2)] 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 
'fp16': False} load INFO:root:Loaded checkpoint 'models\ak193\G_299840.pth' (iteration 3019)

//

It hangs here and does nothing. I think the code execution has also stopped; it just isn't generating any sound output.

cody151 commented 1 year ago

Please see the full output below:

DEBUG:Comm:handle_msg[68e7f381a92946439604b57298656804]({'header': {'date': datetime.datetime(2023, 3, 31, 6, 40, 4, 347000, tzinfo=tzutc()), 'msg_id': '16af36f7-cb3f-4045-af95-03f20185b43e', 'msg_type': 'comm_msg', 'session': '828b6d9b-bf81-4fdf-8d97-d41fb9c65b1f', 'username': 'de5a8c61-3906-423f-8846-cd5c3070726a', 'version': '5.2'}, 'msg_id': '16af36f7-cb3f-4045-af95-03f20185b43e', 'msg_type': 'comm_msg', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': '68e7f381a92946439604b57298656804', 'data': {'method': 'update', 'state': {'value': 0}, 'buffer_paths': []}}, 'buffers': []}) DEBUG:Comm:handle_msg[a05b24d23200458f98f81da994d5ec65]({'header': {'date': datetime.datetime(2023, 3, 31, 6, 40, 36, 534000, tzinfo=tzutc()), 'msg_id': '1182dabc-3114-4fa4-94b0-b46247fc7100', 'msg_type': 'comm_msg', 'session': '828b6d9b-bf81-4fdf-8d97-d41fb9c65b1f', 'username': 'de5a8c61-3906-423f-8846-cd5c3070726a', 'version': '5.2'}, 'msg_id': '1182dabc-3114-4fa4-94b0-b46247fc7100', 'msg_type': 'comm_msg', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': 'a05b24d23200458f98f81da994d5ec65', 'data': {'method': 'custom', 'content': {'event': 'click'}}}, 'buffers': []}) E:\kkals\bin\so-vits-svc-eff-4.0\models\ak193\G_299840.pth True load model(s) from hubert/checkpoint_best_legacy_500.pt DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client. 
DEBUG:h5py._conv:Creating converter from 7 to 5 DEBUG:h5py._conv:Creating converter from 5 to 7 DEBUG:h5py._conv:Creating converter from 7 to 5 DEBUG:h5py._conv:Creating converter from 5 to 7 INFO:fairseq.tasks.text_to_speech:Please install tensorboardX: pip install tensorboardX INFO:fairseq.tasks.hubert_pretraining:current directory is E:\kkals\bin\so-vits-svc-eff-4.0 INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False} INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] 4 + [(512,2,2)] 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 
'fp16': False} load INFO:root:Loaded checkpoint 'models\ak193\G_299840.pth' (iteration 3019) DEBUG:Comm:handle_msg[a05b24d23200458f98f81da994d5ec65]({'header': {'date': datetime.datetime(2023, 3, 31, 6, 42, 56, 58000, tzinfo=tzutc()), 'msg_id': '8d7c9459-9e6f-4f68-a70d-224e09afafed', 'msg_type': 'comm_msg', 'session': '828b6d9b-bf81-4fdf-8d97-d41fb9c65b1f', 'username': 'de5a8c61-3906-423f-8846-cd5c3070726a', 'version': '5.2'}, 'msg_id': '8d7c9459-9e6f-4f68-a70d-224e09afafed', 'msg_type': 'comm_msg', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': 'a05b24d23200458f98f81da994d5ec65', 'data': {'method': 'custom', 'content': {'event': 'click'}}}, 'buffers': []}) E:\kkals\bin\so-vits-svc-eff-4.0\models\ak193\G_299840.pth True load model(s) from hubert/checkpoint_best_legacy_500.pt INFO:fairseq.tasks.hubert_pretraining:current directory is E:\kkals\bin\so-vits-svc-eff-4.0 INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False} INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] 4 + [(512,2,2)] 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 
'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False} load INFO:root:Loaded checkpoint 'models\ak193\G_299840.pth' (iteration 3019) DEBUG:Comm:handle_msg[a05b24d23200458f98f81da994d5ec65]({'header': {'date': datetime.datetime(2023, 3, 31, 6, 48, 20, 831000, tzinfo=tzutc()), 'msg_id': '21e078b0-8616-4244-8f48-dfef824115ee', 'msg_type': 'comm_msg', 'session': '828b6d9b-bf81-4fdf-8d97-d41fb9c65b1f', 'username': 'de5a8c61-3906-423f-8846-cd5c3070726a', 'version': '5.2'}, 'msg_id': '21e078b0-8616-4244-8f48-dfef824115ee', 'msg_type': 'comm_msg', 'parent_header': {}, 'metadata': {}, 'content': {'comm_id': 'a05b24d23200458f98f81da994d5ec65', 'data': {'method': 'custom', 'content': {'event': 'click'}}}, 'buffers': []}) E:\kkals\bin\so-vits-svc-eff-4.0\models\ak193\G_299840.pth True load model(s) from hubert/checkpoint_best_legacy_500.pt INFO:fairseq.tasks.hubert_pretraining:current directory is E:\kkals\bin\so-vits-svc-eff-4.0 INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False} INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 
'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] 4 + [(512,2,2)] 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False} load INFO:root:Loaded checkpoint 'models\ak193\G_299840.pth' (iteration 3019)

effusiveperiscope commented 1 year ago

Is this with the notebook or the GUI? If you are running the notebook locally with VSCode, could you tell me how you set it up?

cody151 commented 1 year ago

> Is this with the notebook or the GUI? If you are running the notebook locally with VSCode, could you tell me how you set it up?

The Jupyter notebook. Git and all dependencies appear to be installed with the correct versions, and the audio file is pasted into the so-vits-svc folder. I don't know why it just stops, with no error message after clicking Convert.

effusiveperiscope commented 1 year ago

Without being able to reproduce it on my end, I would guess that the last cell is unable to find the audio file (see the variable "input_filepaths" near the bottom).
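One way to test that guess is to check the paths before clicking Convert. The snippet below is only a rough sketch, not code from the notebook: `input_filepaths` is the variable name mentioned above, while the helper `missing_inputs` and the sample file name are made up for illustration.

```python
from pathlib import Path


def missing_inputs(input_filepaths):
    """Return the entries of input_filepaths that do not point to an existing file."""
    return [p for p in input_filepaths if not Path(p).is_file()]


# Hypothetical value; in the notebook this list is built in the last cell.
input_filepaths = ["example_input.wav"]

# Relative paths are resolved against the current working directory,
# so printing it helps spot a wrong base folder.
print("Working directory:", Path.cwd())
for path in missing_inputs(input_filepaths):
    print("Not found:", path)
```

If the list itself turns out to be empty, the conversion step probably has nothing to process, which would match a run that stops without an error.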

cody151 commented 1 year ago

> Without being able to reproduce it on my end I would guess that in the last cell it is unable to find the audio file (under variable "input_filepaths" near the bottom).

Thanks for the guidance. It turns out that when running locally you have to change the /contents/ folder path in the code, which I forgot to do. Solved, thanks.
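For anyone hitting the same thing, here is a minimal sketch of the idea, assuming the notebook collects its input files from a hard-coded folder. The helper `collect_audio`, the `*.wav` pattern, and the commented-out folder path are illustrative assumptions, not the notebook's actual code. The point is to fail loudly when the folder does not exist instead of silently ending up with no files.

```python
from pathlib import Path


def collect_audio(base_dir, pattern="*.wav"):
    """List audio files under base_dir, raising if the folder itself is missing."""
    base = Path(base_dir)
    if not base.is_dir():
        # A missing folder would otherwise just produce an empty list.
        raise FileNotFoundError(f"Input folder does not exist: {base}")
    return sorted(str(p) for p in base.glob(pattern))


# Hypothetical local usage: point this at your own checkout instead of
# the hosted-notebook folder.
# input_filepaths = collect_audio(r"C:\path\to\so-vits-svc")
```

Building every path from one base folder means only a single line needs changing when moving between the hosted notebook and a local install.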