CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

I get an error at "Apply the preprocessing: normalize volume and shorten long silences" #1217

Open bobwatcherx opened 1 year ago

bobwatcherx commented 1 year ago

I tried Real-Time Voice Cloning in Colab and provided sound sample files from my Google Drive. At the end of the process I get an error like this:

This is the notebook: https://colab.research.google.com/drive/1F_WiadJ_ibYITjoJHev3BhoKIq6CBMd9?authuser=1

TypeError                                 Traceback (most recent call last)

<ipython-input-14-0e7ddba313e5> in <cell line: 2>()
      1 in_fpath = Path("/content/gdrive/MyDrive/wowo.wav")
----> 2 reprocessed_wav = encoder.preprocess_wav(in_fpath)
      3 original_wav, sampling_rate = librosa.load(in_fpath)
      4 preprocessed_wav = encoder.preprocess_wav(original_wav, sampling_rate)
      5 embed = encoder.embed_utterance(preprocessed_wav)

/content/Real-Time-Voice-Cloning/encoder/audio.py in preprocess_wav(fpath_or_wav, source_sr, normalize, trim_silence)
     40     # Resample the wav if needed
     41     if source_sr is not None and source_sr != sampling_rate:
---> 42         wav = librosa.resample(wav, source_sr, sampling_rate)
     43 
     44     # Apply the preprocessing: normalize volume and shorten long silences

TypeError: resample() takes 1 positional argument but 3 were given
raws84 commented 1 year ago

I think it's a librosa version issue. Install the following in Colab before running the code and it should work:

!pip install unidecode
!pip install webrtcvad
!pip install librosa==0.8.1
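For context, librosa 0.10.0 made the sample-rate arguments of resample() keyword-only, which is why the positional call in encoder/audio.py now raises this TypeError and why pinning 0.8.1 avoids it. If you would rather keep a newer librosa, a minimal sketch of the alternative (editing encoder/audio.py yourself, not an official patch) is to pass the rates as keywords, which both old and new versions accept:

# In preprocess_wav() in encoder/audio.py: keyword arguments work on
# librosa 0.8.x as well as librosa >= 0.10.
if source_sr is not None and source_sr != sampling_rate:
    wav = librosa.resample(wav, orig_sr=source_sr, target_sr=sampling_rate)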

theClawsmos commented 12 months ago

Hi bobwatcherx. Based on my reading of the error:

The error message suggests that there is an issue with how the resample() function from the library, most likely librosa, is being called: more arguments are being passed than the function accepts (similar to onibaken's issue #1234).

In my experience, to resolve this error you need to make sure you are passing the correct arguments, and the correct number of arguments, to the resample() function. Based on the traceback, the problematic line is in the preprocess_wav() function in audio.py.

To help you further, I would need to see the snippet from the file where preprocess_wav() is defined, most likely audio.py. If you can provide that code, it would be very helpful.
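In the meantime, you can confirm which librosa version the notebook is actually running and whether its resample() still accepts positional sample rates. A quick check (just a sketch, run it in a separate Colab cell) would be:

# Print the installed librosa version and the signature of resample();
# versions 0.10 and later only accept the sample rates as keyword arguments.
import inspect
import librosa

print(librosa.__version__)
print(inspect.signature(librosa.resample))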

Cheers,

theClawsmos

ampriHYP commented 7 months ago

Hi, I have the same error. The code is this:

#@title Record or Upload
#@markdown * Either record audio from microphone or upload audio from file (.mp3 or .wav)

!pip install --upgrade librosa

SAMPLE_RATE = 22050
record_or_upload = "Upload (.mp3 or .wav)" #@param ["Record", "Upload (.mp3 or .wav)"]
record_seconds = 10 #@param {type:"number", min:1, max:10, step:1}

embedding = None

def _compute_embedding(audio):
    display(Audio(audio, rate=SAMPLE_RATE, autoplay=True))
    global embedding
    embedding = None
    embedding = encoder.embed_utterance(encoder.preprocess_wav(audio, SAMPLE_RATE))  # in this line it is the error

def _record_audio(b):
    clear_output()
    audio = record_audio(record_seconds, sample_rate=SAMPLE_RATE)
    _compute_embedding(audio)

def _upload_audio(b):
    clear_output()
    audio = upload_audio(sample_rate=SAMPLE_RATE)
    _compute_embedding(audio)

if record_or_upload == "Record":
    button = widgets.Button(description="Record Your Voice")
    button.on_click(_record_audio)
    display(button)
else:
    button = widgets.Button(description="Upload Voice File")
    button.on_click(_upload_audio)
    _upload_audio("")
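One thing that stands out in this cell: !pip install --upgrade librosa installs librosa >= 0.10, which is exactly the range where the positional resample() call inside encoder/audio.py stops working. A minimal change, assuming the rest of the cell stays as it is, would be to pin the older version instead of upgrading:

# Replace the upgrade line above with a pin, so the positional resample()
# call inside encoder/audio.py keeps working.
!pip install librosa==0.8.1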

ampriHYP commented 7 months ago

and the function preprocess_wav in audio.py:

def preprocess_wav(fpath_or_wav: Union[str, Path, np.ndarray],
                   source_sr: Optional[int] = None,
                   normalize: Optional[bool] = True,
                   trim_silence: Optional[bool] = True):
    """
    Applies the preprocessing operations used in training the Speaker Encoder to a waveform
    either on disk or in memory. The waveform will be resampled to match the data hyperparameters.

    :param fpath_or_wav: either a filepath to an audio file (many extensions are supported, not
    just .wav), either the waveform as a numpy array of floats.
    :param source_sr: if passing an audio waveform, the sampling rate of the waveform before
    preprocessing. After preprocessing, the waveform's sampling rate will match the data
    hyperparameters. If passing a filepath, the sampling rate will be automatically detected and
    this argument will be ignored.
    """
    # Load the wav from disk if needed
    if isinstance(fpath_or_wav, str) or isinstance(fpath_or_wav, Path):
        wav, source_sr = librosa.load(str(fpath_or_wav), sr=None)
    else:
        wav = fpath_or_wav

    # Resample the wav if needed
    if source_sr is not None and source_sr != sampling_rate:
        wav = librosa.resample(wav, source_sr, sampling_rate)

    # Apply the preprocessing: normalize volume and shorten long silences
    if normalize:
        wav = normalize_volume(wav, audio_norm_target_dBFS, increase_only=True)
    if webrtcvad and trim_silence:
        wav = trim_long_silences(wav)

    return wav
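Since this pasted function still contains the positional call wav = librosa.resample(wav, source_sr, sampling_rate), here is a small standalone check (an illustrative sketch, not code from this repo) that reproduces the TypeError on librosa >= 0.10 and shows that the keyword form works on both old and new versions:

# Illustrative reproduction; assumes only numpy and librosa are installed.
import numpy as np
import librosa

wav = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz

try:
    # Old positional style, as used in encoder/audio.py
    librosa.resample(wav, 16000, 22050)
    print("positional call OK (librosa < 0.10)")
except TypeError as e:
    print("positional call failed:", e)

# Keyword arguments are accepted by both librosa 0.8.x and >= 0.10
out = librosa.resample(wav, orig_sr=16000, target_sr=22050)
print("keyword call OK, resampled length:", len(out))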