Open bobwatcherx opened 1 year ago
I think it's a librosa version issue. Install the following in Colab before running the code and it should work:
!pip install unidecode
!pip install webrtcvad
!pip install librosa==0.8.1
Hi bobwatcherx. Based on my reading of the error:
The error message suggests a problem with how the resample() function is called, probably the one from librosa. It seems that you are passing more arguments than the function accepts (similar to onibaken's issue #1234).
To resolve this error, make sure you pass the correct arguments, and the correct number of them, to resample(). Based on the traceback, the problematic line is in the preprocess_wav() function in the audio.py file.
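To illustrate the kind of mismatch I mean, here is a minimal sketch in plain Python. The resample() below is a hypothetical stand-in, not librosa's function; it only assumes that everything after the first parameter is keyword-only, which is how an "extra arguments" error can appear even though the values themselves are fine.

```python
# Hypothetical stand-in: parameters after `y` are keyword-only (note the `*`).
def resample(y, *, orig_sr, target_sr):
    return (len(y), orig_sr, target_sr)

wav = [0.0, 0.1, 0.2]

# Positional style: rejected, because only `y` may be passed by position.
try:
    resample(wav, 22050, 16000)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Keyword style: accepted.
keyword_result = resample(wav, orig_sr=22050, target_sr=16000)

print(positional_error)
print(keyword_result)
```

If the real traceback ends in a TypeError about positional arguments, naming the sample-rate arguments in the call is the first thing I would try.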
To help further, I would need to see the code from the file where preprocess_wav() is defined, most likely audio.py. If you can post that code, it would be very helpful.
Cheers,
theClawsmos
Hi, I have the same error. This is the code:
!pip install --upgrade librosa

SAMPLE_RATE = 22050
record_or_upload = "Upload (.mp3 or .wav)" #@param ["Record", "Upload (.mp3 or .wav)"]
record_seconds = 10 #@param {type:"number", min:1, max:10, step:1}

embedding = None

def _compute_embedding(audio):
    display(Audio(audio, rate=SAMPLE_RATE, autoplay=True))
    global embedding
    embedding = None
    embedding = encoder.embed_utterance(encoder.preprocess_wav(audio, SAMPLE_RATE))  # the error is raised on this line

def _record_audio(b):
    clear_output()
    audio = record_audio(record_seconds, sample_rate=SAMPLE_RATE)
    _compute_embedding(audio)

def _upload_audio(b):
    clear_output()
    audio = upload_audio(sample_rate=SAMPLE_RATE)
    _compute_embedding(audio)

if record_or_upload == "Record":
    button = widgets.Button(description="Record Your Voice")
    button.on_click(_record_audio)
    display(button)
else:
    _upload_audio("")
And the function preprocess_wav in audio.py:
def preprocess_wav(fpath_or_wav: Union[str, Path, np.ndarray], source_sr: Optional[int] = None,
                   normalize: Optional[bool] = True, trim_silence: Optional[bool] = True):
    """
    Applies the preprocessing operations used in training the Speaker Encoder to a waveform
    either on disk or in memory. The waveform will be resampled to match the data hyperparameters.

    :param fpath_or_wav: either a filepath to an audio file (many extensions are supported, not
    just .wav), either the waveform as a numpy array of floats.
    :param source_sr: if passing an audio waveform, the sampling rate of the waveform before
    preprocessing. After preprocessing, the waveform's sampling rate will match the data
    hyperparameters. If passing a filepath, the sampling rate will be automatically detected and
    this argument will be ignored.
    """
    # Load the wav from disk if needed
    if isinstance(fpath_or_wav, str) or isinstance(fpath_or_wav, Path):
        wav, source_sr = librosa.load(str(fpath_or_wav), sr=None)
    else:
        wav = fpath_or_wav

    # Resample the wav if needed
    if source_sr is not None and source_sr != sampling_rate:
        wav = librosa.resample(wav, source_sr, sampling_rate)

    # Apply the preprocessing: normalize volume and shorten long silences
    if normalize:
        wav = normalize_volume(wav, audio_norm_target_dBFS, increase_only=True)
    if webrtcvad and trim_silence:
        wav = trim_long_silences(wav)

    return wav
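A possible fix for the resample step in preprocess_wav, as a minimal sketch. It assumes the installed librosa only accepts orig_sr and target_sr as keyword arguments, which is the case in newer releases and would explain why `!pip install --upgrade librosa` triggers the error. The helper name resample_wav and its resample_fn parameter are hypothetical; they are not part of audio.py and exist only so the sketch can be tried without librosa installed.

```python
def resample_wav(wav, source_sr, target_sr, resample_fn=None):
    """Resample wav from source_sr to target_sr, naming the rate arguments.

    resample_wav and resample_fn are hypothetical names used for this sketch;
    by default the call is forwarded to librosa.resample.
    """
    # Nothing to do when the source rate is unknown or already matches
    if source_sr is None or source_sr == target_sr:
        return wav
    if resample_fn is None:
        import librosa  # imported lazily so the sketch loads without librosa
        resample_fn = librosa.resample
    # Keyword arguments instead of resample(wav, source_sr, target_sr)
    return resample_fn(wav, orig_sr=source_sr, target_sr=target_sr)
```

In audio.py itself the equivalent change would be a one-line edit: `wav = librosa.resample(wav, orig_sr=source_sr, target_sr=sampling_rate)`. The keyword names orig_sr and target_sr are also accepted by librosa 0.8.1, so this form should work whether or not the version is pinned.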
I tried Real-Time Voice Cloning in Colab, providing sound sample files from my Google Drive. At the end of the process, I get an error like this.
This is the file: https://colab.research.google.com/drive/1F_WiadJ_ibYITjoJHev3BhoKIq6CBMd9?authuser=1