Not working out of the box or from source

SirBitesalot commented 10 months ago

The program is not working. I tried both the installed version and running from source. Out of the box I get the following error:

Traceback (most recent call last):
  File "F:\rvcservice\RVC-inference\inferrvc\service.py", line 19, in <module>
    paudio1=tester(aud,f0_up_key=6,output_device='cpu',output_volume=RVC.MATCH_ORIGINAL,index_rate=.75)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\rvcservice\RVC-inference\inferrvc\modules.py", line 303, in __call__
    return self.run(audio,f0_up_key,f0_method,index_rate,filter_radius,protect,output_device,output_volume,f0_spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\rvcservice\RVC-inference\inferrvc\modules.py", line 379, in run
    lufsout=self._LOUDOUTPUT(audio_opt.unsqueeze(0))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\torchaudio\transforms\_transforms.py", line 1334, in forward
    return F.loudness(wavefrom, self.sample_rate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\torchaudio\functional\functional.py", line 1686, in loudness
    gate_samples = int(round(gate_duration * sample_rate))
                             ~~~~~~~~~~~~~~^~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

If I change output_volume from RVC.MATCH_ORIGINAL to RVC.NO_CHANGE I get:

Traceback (most recent call last):
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 1328, in _check_dtype
    return _ffi_types[dtype]
           ~~~~~~~~~~^^^^^^^
KeyError: 'float16'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\rvcservice\RVC-inference\inferrvc\service.py", line 23, in <module>
    sf.write('result.wav',paudio1,44100)
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 345, in write
    f.write(data)
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 1020, in write
    written = self._array_io('write', data, len(data))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 1341, in _array_io
    ctype = self._check_dtype(array.dtype.name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 1330, in _check_dtype
    raise ValueError("dtype must be one of {0!r} and not {1!r}".format(
ValueError: dtype must be one of ['float32', 'float64', 'int16', 'int32'] and not 'float16'

If I change the device from cpu to cuda I get:

(rvcservice) F:\rvcservice\RVC-inference\inferrvc>python service.py
2023-12-28 01:41:32 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-12-28 01:41:33 | INFO | configs.config | Found GPU NVIDIA GeForce RTX 4090
2023-12-28 01:41:33 | INFO | configs.config | Selecting device:cuda:0, is_half:True
Model: test, Index: test
Paths models/test.pth indexes/test.index
2023-12-28 01:41:34 | INFO | fairseq.tasks.hubert_pretraining | current directory is F:\rvcservice\RVC-inference\inferrvc
2023-12-28 01:41:34 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-12-28 01:41:34 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
2023-12-28 01:41:36 | INFO | pipeline | Loading rmvpe model,C:\Users\SirBitesalot\.cache\huggingface\hub\models--lj1995--VoiceConversionWebUI\snapshots\88e42f0cb3662ddc0dd263a4814206ce96d53214\rmvpe.pt
Traceback (most recent call last):
  File "F:\rvcservice\RVC-inference\inferrvc\service.py", line 23, in <module>
    sf.write('result.wav',paudio1,44100)
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 338, in write
    data = np.asarray(data)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\SirBitesalot\AppData\Roaming\Python\Python311\site-packages\torch\_tensor.py", line 970, in __array__
    return self.numpy()
           ^^^^^^^^^^^^
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I use Python 3.11.

Can you provide detailed information on how this is meant to be used?

CircuitCM commented 10 months ago

Errors 1 and 2 happen because I haven't had a chance to run tests on CPU only, just CUDA, so the fact that it got that far without an issue is good. For now, everything should work correctly if you use the GPU rather than CPU only.

For error 2, it will work if you cast the audio array to a float32 torch or numpy type.

For error 3, it will work if you call paudio1.to('cpu') or set output_device='cpu'. output_device only specifies where the output tensor will go, not which device runs the models.
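A minimal sketch of the two workarounds above (errors 2 and 3) combined into one step before saving. The helper name `to_writable_array` is hypothetical and not part of inferrvc; it assumes the output is either a torch tensor or a NumPy array, and detects a tensor by duck typing so the sketch itself only needs NumPy.

```python
import numpy as np

def to_writable_array(audio):
    """Return audio as a float32 NumPy array in host memory.

    Hypothetical helper, not part of inferrvc.
    """
    # Duck-typed check for a torch.Tensor: detach it, copy it to the CPU,
    # and upcast from float16 before converting to NumPy.
    if hasattr(audio, "detach") and hasattr(audio, "cpu"):
        audio = audio.detach().cpu().float().numpy()
    # soundfile accepts float32/float64/int16/int32, but not float16.
    return np.asarray(audio, dtype=np.float32)

# Stand-in for a half-precision model output.
half = np.linspace(-1.0, 1.0, 8, dtype=np.float16)
out = to_writable_array(half)
print(out.dtype)  # float32
```

With this in place, the write call from the script in this thread would become `sf.write('result.wav', to_writable_array(paudio1), 44100)`.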

It might also be helpful to make a separate project and use this library as a dependency instead of as a fork. There probably won't be any issues with where your service.py file is located, but I only tested running the code in a file outside of the inferrvc module. If you'd like the import to be easier (e.g. putting it in a requirements.txt), I can put it on PyPI.

lmk if any of these help.

SirBitesalot commented 10 months ago

Hey, thank you very much for your response. Casting the audio array to float32 does indeed let me save the result. It seems I now have an issue with mismatched sample rates, as the result differs quite drastically from the expected length and pitch. Changing the sample rate it is saved with does not solve this. I will see if I can figure out what is happening. If I get it working on CPU I will update you.

CircuitCM commented 10 months ago

You can pass the file path as input instead of the tensor, and it will pick up the right sample rate for the original audio. If you want to pass the tensor instead, set paudio1.frequency=samplerate. That adds a frequency attribute to the tensor's dict; the model will read it and match sample rates correctly.

I see, it's actually my fault; I didn't have it in the documentation. After calling aud,sr = load_torchaudio('path/to/audio.wav'), do aud.frequency=sr. Then it should work. In the next update I will make load_torchaudio do that automatically.
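To illustrate why a missing sample rate tag changes both length and pitch, here is a small arithmetic sketch. It is not inferrvc code; the function name and the example rates are assumptions chosen for illustration. It computes what happens when samples recorded at one rate are interpreted at another.

```python
import math

def mismatch_effect(true_rate, assumed_rate):
    """Effect of treating audio sampled at true_rate as if it were assumed_rate.

    Returns (duration_factor, pitch_shift_semitones).
    """
    # The same number of samples played at a higher rate finishes sooner.
    duration_factor = true_rate / assumed_rate
    # Every frequency scales by assumed_rate / true_rate; 12 semitones per octave.
    pitch_shift = 12 * math.log2(assumed_rate / true_rate)
    return duration_factor, pitch_shift

# Hypothetical example: a 48 kHz recording treated as 44.1 kHz.
factor, shift = mismatch_effect(true_rate=48000, assumed_rate=44100)
print(f"duration x{factor:.3f}, pitch {shift:+.2f} semitones")
```

Because the error is introduced before the model runs, changing only the rate passed when saving the file cannot undo it, which matches the behaviour reported above.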

CircuitCM commented 10 months ago

Updated load_torchaudio; the sample rate mismatch should now be fixed without changing your code. Just reinstall with pip.

SirBitesalot commented 9 months ago

Hey, sorry for the late reply. The load_torchaudio update did indeed fix the sample rate issue. I can now successfully run the inference if I use RVC.NO_CHANGE and disable indexing. With indexing enabled I get this error:

H:\anaconda3\envs\rvcservice\Lib\site-packages\inferrvc\pipeline.py:219: RuntimeWarning: invalid value encountered in divide
  weight /= weight.sum(axis=1, keepdims=True)
Traceback (most recent call last):
  File "F:\rvcservice\test.py", line 16, in <module>
    paudio1=genji(aud,f0_up_key=0,output_device='cpu',output_volume=RVC.NO_CHANGE,index_rate=1)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\anaconda3\envs\rvcservice\Lib\site-packages\inferrvc\modules.py", line 303, in __call__
    return self.run(audio,f0_up_key,f0_method,index_rate,filter_radius,protect,output_device,output_volume,f0_spec)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\anaconda3\envs\rvcservice\Lib\site-packages\inferrvc\modules.py", line 359, in run
    audio_opt = self.pipeline.pipeline(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\anaconda3\envs\rvcservice\Lib\site-packages\inferrvc\pipeline.py", line 380, in pipeline
    self.vc(
  File "H:\anaconda3\envs\rvcservice\Lib\site-packages\inferrvc\pipeline.py", line 220, in vc
    npy = np.sum(big_npy[ix] * np.expand_dims(weight, axis=2), axis=1)
IndexError: index -1 is out of bounds for axis 0 with size 0

CircuitCM commented 9 months ago

Just to make sure, did you check that genji.index_path is correct? I'll take a look at this, thanks.
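One quick way to run that check is sketched below. The helper `check_index_file` is hypothetical and not part of inferrvc, and the path shown is taken from the log output earlier in this thread. It only verifies that the file exists and is non-empty; it does not validate the index contents. The "size 0" in the traceback may point to an index that loaded with no entries, but that is a guess rather than a confirmed diagnosis.

```python
from pathlib import Path

def check_index_file(index_path):
    """Return a list of problems found with the index file (empty if none).

    Hypothetical helper, not part of inferrvc.
    """
    problems = []
    path = Path(index_path)
    if not path.is_file():
        problems.append(f"{path} does not exist or is not a file")
    elif path.stat().st_size == 0:
        problems.append(f"{path} is empty")
    return problems

# In the thread's script this would be check_index_file(genji.index_path).
print(check_index_file("indexes/test.index"))
```

An empty list means the path itself looks fine, so the next thing to inspect would be how the index was built.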

CircuitCM / RVC-inference