RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License
24.49k stars 3.61k forks

gui_v1 killed when trying to use Pulseaudio Nodes #1007

Open veprogames opened 1 year ago

veprogames commented 1 year ago

Description

Platform

6.4.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 11 Aug 2023 11:03:36 +0000 x86_64 unknown unknown GNU/Linux

Log

$ python gui_v1.py        
Found GPU NVIDIA GeForce RTX 2060 SUPER
input device:15:ROCCAT Khan AIMO Analog Stereo (JACK Audio Connection Kit)
output device:17:myvirtualmic Audio/Source/Sink/Virtual sink (JACK Audio Connection Kit)
using_cuda:True
index search enabled
2023-08-14 19:15:19 | INFO | fairseq.tasks.hubert_pretraining | current directory is [...]
2023-08-14 19:15:19 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-08-14 19:15:19 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
gin_channels: 256 self.spk_embed_dim: 109
<All keys matched successfully>
Audio block passed.
Audio block passed.
Audio block passed.
zsh: killed     python gui_v1.py

veprogames commented 1 year ago

The process is not killed when I use dtype="int32" instead of "float32" at https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/71116f07e91f7874f755a75e1b37754b2086bd70/gui_v1.py#L430

I haven't gotten any audio output so far, though.

veprogames commented 1 year ago

OK, I've done more testing, and it seems that integer sample formats are more conventional on Linux than float formats. The dtype change above seems to take care of that on the stream side, but a cast would be required here, because this code expects float: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/71116f07e91f7874f755a75e1b37754b2086bd70/gui_v1.py#L514
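To illustrate the kind of cast meant here, below is a minimal sketch that decodes signed 32-bit little-endian PCM (the s32le format) into floats in [-1.0, 1.0) and back. It uses only the standard library; the function names s32le_to_float and float_to_s32le are made up for this example and are not taken from gui_v1.py.

```python
import struct

INT32_FULL_SCALE = 2 ** 31  # magnitude of the most negative s32 sample


def s32le_to_float(raw: bytes) -> list[float]:
    """Decode little-endian signed 32-bit PCM bytes into floats in [-1.0, 1.0)."""
    if len(raw) % 4:
        raise ValueError("s32le data must be a multiple of 4 bytes")
    count = len(raw) // 4
    samples = struct.unpack(f"<{count}i", raw)
    return [s / INT32_FULL_SCALE for s in samples]


def float_to_s32le(samples: list[float]) -> bytes:
    """Encode floats back into s32le, clipping to the valid range first."""
    clipped = (max(-1.0, min(s, 1.0)) for s in samples)
    # +1.0 would map to 2**31, which does not fit in int32, so cap it.
    ints = [min(int(s * INT32_FULL_SCALE), INT32_FULL_SCALE - 1) for s in clipped]
    return struct.pack(f"<{len(ints)}i", *ints)
```

If the audio blocks arrive as NumPy arrays, the same idea would presumably be block.astype(np.float32) / 2**31 on the way in, with the clipping and scaling reversed on the way out. I have not checked how this interacts with the rest of gui_v1.py.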

My setup uses modprobe snd-aloop to create ALSA loopback devices.

For example, recording in Audacity fails because it expects an int stream. arecord works, though, because I can tell it to expect a float stream.

veprogames commented 1 year ago

I might have found a way!

Try this:

$ sudo modprobe snd-aloop
$ pactl load-module module-remap-source source_name="MyIntMicS" master="hw:3,1" format="s32le" remix=no source_properties=device.name='Microphone In (int)'
$ pactl load-module module-virtual-source source_name="MyIntMic" master="MyIntMicS.monitor" # this step can maybe be omitted

Replace the 3 in hw:3,1 above with your ALSA loopback card number.

In gui_v1, select your main device as input (e.g. hw:2,0) and hw:x,0 as output, where x is your ALSA loopback card number.
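To look up the loopback card number, here is a small sketch that scans the text of /proc/asound/cards for cards whose driver is reported as Loopback. It assumes the usual layout of that file (one header line per card such as " 3 [Loopback       ]: Loopback - Loopback"), and find_loopback_cards is a made-up helper name.

```python
import re

# Assumed header line format, one per card:
#   " 3 [Loopback       ]: Loopback - Loopback"
# groups: card number, card id, driver name, short description
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(\S+)\s*\]:\s+(\S+)\s+-\s+(.*)$")


def find_loopback_cards(cards_text: str) -> list[int]:
    """Return the card numbers whose driver field is 'Loopback'."""
    cards = []
    for line in cards_text.splitlines():
        match = CARD_LINE.match(line)
        if match and match.group(3) == "Loopback":
            cards.append(int(match.group(1)))
    return cards
```

On a machine with snd-aloop loaded, calling find_loopback_cards(open("/proc/asound/cards").read()) should give the number to put in place of the 3 in hw:3,1. Running aplay -l and reading the card number off the Loopback entry works too.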

veprogames commented 1 year ago

This seems to have stopped working :/ arecord -D hw:3,1 -c2 -f FLOAT_LE still works, though. However, pactl lists the physical microphone's state as (null) instead of RUNNING while conversion is active, so there is no sound. That's a problem on my end.

I also attempted this:

$ pactl load-module module-remap-source source_name="micasfloat" master='alsa_input.usb-ROCCAT_ROCCAT_Khan_AIMO-00.analog-stereo' format="float32le" remix=no source_properties=device.description='MicrophoneAsFloat'

$ pactl load-module module-null-sink sink_name="rvcsink" format="float32le" sink_properties=device.description="RVCSink"
$ python gui_v1.py                                                                                                                                     
Found GPU NVIDIA GeForce RTX 2060 SUPER
input device:19:MicrophoneAsFloat (JACK Audio Connection Kit)
output device:21:RVCSink (JACK Audio Connection Kit)
using_cuda:True
index search enabled
2023-08-18 12:47:58 | INFO | fairseq.tasks.hubert_pretraining | current directory is [...]
2023-08-18 12:47:58 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-08-18 12:47:58 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
gin_channels: 256 self.spk_embed_dim: 109
<All keys matched successfully>
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
Audio block passed.
zsh: killed     python gui_v1.py
$ pactl list short | grep float32le
536870913   module-remap-source source_name=micasfloat master=alsa_input.usb-ROCCAT_ROCCAT_Khan_AIMO-00.analog-stereo format=float32le remix=no source_properties=device.description=MicrophoneAsFloat
536870914   module-null-sink    sink_name=rvcsink format=float32le sink_properties=device.description=RVCSink   
92  rvcsink PipeWire    float32le 2ch 48000Hz   SUSPENDED
78  micasfloat  PipeWire    float32le 2ch 48000Hz   SUSPENDED
92  rvcsink.monitor PipeWire    float32le 2ch 48000Hz   SUSPENDED
79  53  -   PipeWire    float32le 2ch 48000Hz
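To check device states without reading the listing by eye, here is a rough sketch that parses the output of pactl list short sources (or sinks) into a name-to-state mapping. It assumes each device line has five tab-separated fields (index, name, driver, sample spec, state). parse_short_states is a made-up helper name.

```python
def parse_short_states(output: str) -> dict[str, str]:
    """Map device name to state for 'pactl list short sources' or 'sinks' output.

    Assumes five tab-separated fields per line: index, name, driver,
    sample spec, state. Lines with a different field count are skipped.
    """
    states = {}
    for line in output.splitlines():
        fields = line.rstrip().split("\t")
        if len(fields) == 5:
            _index, name, _driver, _spec, state = fields
            states[name] = state
    return states


def not_running(states: dict[str, str]) -> list[str]:
    """Return the names of devices whose state is anything other than RUNNING."""
    return [name for name, state in states.items() if state != "RUNNING"]
```

While the conversion is active, both micasfloat and rvcsink would be expected to show RUNNING. In the listing above they are SUSPENDED, so not_running would report both.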