KAIST-MACLab / PyTSMod

An open-source Python library for audio time-scale modification.
GNU General Public License v3.0

[BUG] tdpsola does not work properly for low beta values #33

Open Looki2000 opened 1 month ago

Looki2000 commented 1 month ago

Describe the bug

When changing the pitch of a voice with tsm.tdpsola using a low beta factor, the pitch stays the same and clicking artifacts are audible. There are pitch-shifting plugins that use TD-PSOLA and allow even lower pitch factors, so I don't think this is a limitation of the algorithm.

To Reproduce

Code to reproduce the behavior:

import numpy as np
import librosa
import librosa.display  # explicit import; older librosa versions do not load it automatically
import soundfile as sf
import matplotlib.pyplot as plt
import pytsmod as tsm

n_fft = 1024
hop_length_factor = 4

file_path = "audio.flac"

print("Loading audio file...")
audio, sr = librosa.load(file_path, sr=None, mono=True)
print(sr)

hop_length = n_fft // hop_length_factor

print("pyin")
f0, _, _ = librosa.pyin(
    audio,
    sr=sr,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    frame_length=n_fft,
    hop_length=hop_length,
)

mask = np.isnan(f0)

# linearly interpolate pitch in place of nans
f0[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), f0[~mask])

audio_stft = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)

# convert f0 from Hz to a (fractional) FFT bin index so it lines up with the spectrogram rows
f0_stft = f0 * n_fft / sr

# plot spectrogram and f0
spect = librosa.amplitude_to_db(np.abs(audio_stft), ref=np.max)
fig, ax = plt.subplots()
img = librosa.display.specshow(spect, x_axis="time", ax=ax, sr=sr, hop_length=hop_length)
fig.colorbar(img, ax=ax, format="%2.f")

ax.plot(librosa.times_like(f0_stft, sr=sr, hop_length=hop_length), f0_stft, label="f0", color="cyan")
plt.show()

audio = tsm.tdpsola(audio, sr, f0, beta=0.5, p_hop_size=hop_length, p_win_size=n_fft)

sf.write("tdpsola test.wav", audio, sr)
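
One way to check the "pitch stays the same" observation numerically is to compare a rough f0 estimate of a voiced frame before and after the call. The sketch below is not part of the original report: `estimate_f0_autocorr` is a hypothetical helper that uses only NumPy and a plain autocorrelation peak, so it is only meaningful on clearly voiced, single-pitch frames. The default search range (65 Hz to 1000 Hz) is an assumption chosen for voice.

```python
import numpy as np


def estimate_f0_autocorr(frame, sr, fmin=65.0, fmax=1000.0):
    """Rough f0 estimate (Hz) of one voiced frame from its autocorrelation peak.

    Hypothetical helper for illustration; not part of pytsmod or librosa.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC so it does not dominate the peak

    # Full autocorrelation; keep non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

    # Restrict the search to lags that correspond to [fmin, fmax].
    lag_min = max(1, int(sr / fmax))
    lag_max = min(int(sr / fmin), len(acf) - 1)

    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sr / lag
```

For beta=0.5, a correctly shifted voiced frame should give roughly half the estimate of the matching input frame; a ratio near 1 would match the behaviour reported here.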


seyong92 commented 1 month ago

Thank you for your report. The PSOLA in pytsmod is a re-implementation of the MATLAB implementation from the book DAFX: Digital Audio Effects.

It isn't easy to improve the algorithm without a reference, so if you know of any reference code or working examples, please share them with me.

I cannot promise that it can be fixed in the near future, but I will check it.

Looki2000 commented 1 month ago

I think this is exactly the same problem as with this implementation: https://dsp.stackexchange.com/questions/61687/problem-using-pitch-shifting-with-td-psola-and-formant-preservation

PSOLA should not try to fill gaps for low pitches. It should leave blank spaces. It sounds very unintuitive, but it's just a limitation of the algorithm itself. I don't have any reference code because I haven't found any correct implementation yet lol
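
A minimal sketch of the "leave blank spaces" idea, under strong simplifying assumptions: the input has a constant, known pitch period, and the pitch marks sit at multiples of that period. `psola_constant_pitch` is a hypothetical name for illustration; it is neither pytsmod's tdpsola nor the DAFX code. Each grain is two source periods long and is re-placed at a spacing of period / beta, with no attempt to fill the space in between.

```python
import numpy as np


def psola_constant_pitch(x, period, beta):
    """Toy TD-PSOLA pitch shift for a signal with a constant pitch period.

    x      : 1-D signal, at least 2 * period + 1 samples long
    period : source pitch period in samples (assumed known and constant)
    beta   : pitch factor (0.5 = one octave down)
    """
    x = np.asarray(x, dtype=float)
    out_period = int(round(period / beta))   # spacing of the target pitch marks
    window = np.hanning(2 * period + 1)      # Hann grain spanning two source periods
    y = np.zeros_like(x)

    # Last source pitch mark that still leaves room for a full grain.
    max_mark = ((len(x) - period - 1) // period) * period

    for t_out in range(period, len(x) - period, out_period):
        # Nearest source pitch mark (marks assumed at multiples of `period`).
        t_src = int(round(t_out / period)) * period
        t_src = min(max(t_src, period), max_mark)

        grain = x[t_src - period:t_src + period + 1] * window
        # Overlap-add at the target mark; samples no grain reaches stay zero.
        y[t_out - period:t_out + period + 1] += grain
    return y
```

With beta = 1 the Hann grains overlap by half and sum back to the input. With beta = 0.25 the grains are four periods apart while each spans only two, so the output contains stretches of exact silence between grains.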