go2chayan closed this issue 4 years ago
Hi, I think librosa is the problem; its internal functions seem to have been updated.
I would recommend switching to something else for loading data, e.g. scipy.
It seems that Python 2.7.15, Keras 2.2.4, Tensorflow 1.8.0, scikit-learn==0.16.1, and librosa==0.3.1 work together; hopefully that is the same setup as the one used in the paper.
I think switching away from librosa to scipy can be risky, as the two differ in how they convert audio to numpy arrays depending on the format of the file (32-bit PCM etc.).
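To make that risk concrete: `scipy.io.wavfile.read` returns integer arrays for integer PCM files (for example `int16` for 16-bit WAV), whereas `librosa.load` returns floating-point samples scaled to [-1, 1]. The sketch below is one way to bring scipy's output onto the same scale. The helper name `pcm_to_float` is hypothetical and not part of either library or of this repo.

```python
import numpy as np


def pcm_to_float(audio):
    """Scale PCM samples to float32 in [-1, 1), roughly matching librosa's range.

    Hypothetical helper: assumes `audio` has the dtype that
    scipy.io.wavfile.read produced (uint8, int16, int32 or float32).
    """
    audio = np.asarray(audio)
    if audio.dtype == np.uint8:
        # 8-bit WAV is unsigned and centred on 128.
        return (audio.astype(np.float32) - 128.0) / 128.0
    if np.issubdtype(audio.dtype, np.integer):
        # Signed PCM: divide by 2**(bits - 1), e.g. 32768 for int16.
        scale = float(2 ** (8 * audio.dtype.itemsize - 1))
        return (audio / scale).astype(np.float32)
    # Float WAV files are assumed to be in [-1, 1] already.
    return audio.astype(np.float32)
```

Calling this on the array returned by `scipy.io.wavfile.read` before computing the spectrogram keeps the input range close to what a librosa-based pipeline would have seen. It does not address resampling or channel layout.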
Ah, but the librosa data loader is incredibly slow here for some reason, taking about 1 second per short .wav clip.
import time as timelib

import numpy as np
import scipy.io.wavfile
from scipy import interpolate


def load_wav(vid_path, sr, mode='train'):
    t1 = timelib.time()
    # wav, sr_ret = librosa.load(vid_path, sr=sr)
    sr_ret, old_audio = scipy.io.wavfile.read(vid_path)
    if sr_ret != sr:
        # Linearly interpolate onto a time grid sampled at the target rate `sr`.
        # float() keeps this a true division under Python 2.
        duration = float(old_audio.shape[0]) / sr_ret
        time_old = np.linspace(0, duration, old_audio.shape[0])
        time_new = np.linspace(0, duration,
                               int(old_audio.shape[0] * sr / sr_ret))
        interpolator = interpolate.interp1d(time_old, old_audio.T)
        wav = interpolator(time_new).T
    else:
        # Already at the target rate; without this branch `wav` was undefined.
        wav = old_audio
    # print("finish loading wav", timelib.time() - t1)
    if mode == 'train':
        # Double the clip and reverse it with probability 0.3.
        extended_wav = np.append(wav, wav)
        if np.random.random() < 0.3:
            extended_wav = extended_wav[::-1]
        return extended_wav
    else:
        extended_wav = np.append(wav, wav[::-1])
        return extended_wav
This seems to fix it: loading now takes about 0.003 seconds per clip, and the returned audio is still at the requested sample rate "sr", as it would be with librosa.
Unfortunately this had issues reading my particular WAV files, saying it could not read certain chunks, so I then tried the following, which seemed to work:
import librosa
import numpy as np
import soundfile as sf


def load_data(path, win_length=400, sr=16000, hop_length=160, n_fft=512, spec_len=250, mode='train'):
    # print("starting loading a datum")
    # t1 = timelib.time()
    wav = load_wav(path, sr=sr, mode=mode)
    linear_spect = lin_spectogram_from_wav(wav, hop_length, win_length, n_fft)
    mag, _ = librosa.magphase(linear_spect)  # magnitude
    mag_T = mag.T
    freq, time = mag_T.shape
    if mode == 'train':
        if time > spec_len:
            # Random crop of spec_len frames along the time axis.
            randtime = np.random.randint(0, time - spec_len)
            spec_mag = mag_T[:, randtime:randtime + spec_len]
        else:
            # Zero-pad short clips up to spec_len frames.
            spec_mag = np.pad(mag_T, ((0, 0), (0, spec_len - time)), 'constant')
    else:
        spec_mag = mag_T
    # preprocessing, subtract mean, divided by time-wise var
    mu = np.mean(spec_mag, 0, keepdims=True)
    std = np.std(spec_mag, 0, keepdims=True)
    # print("finished loading a datum", timelib.time() - t1)
    return (spec_mag - mu) / (std + 1e-5)
But this ran into NaN division issues on the interpolation step, so I ended up just accepting variable sample rates with a simple wav, sr_ret = sf.read(vid_path)
and hoping it doesn't break anything.
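One possible cause of those NaNs, offered as a guess rather than a diagnosis: under Python 2, `old_audio.shape[0] / sr_ret` is an integer division, so any clip shorter than one second gets `duration == 0` and every point on the time axis collapses to zero, which makes the interpolation divide by zero. The sketch below avoids the time axis entirely by interpolating over sample indices. `resample_linear` is a hypothetical name, it handles mono (1-D) audio only, and linear interpolation applies no anti-aliasing filter.

```python
import numpy as np


def resample_linear(audio, sr_in, sr_out):
    """Linearly resample 1-D audio from sr_in to sr_out (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float64)
    n_in = audio.shape[0]
    # float() forces true division on both Python 2 and Python 3.
    n_out = int(round(n_in * float(sr_out) / sr_in))
    if n_in == 0 or n_out == 0:
        return np.zeros(0, dtype=np.float64)
    # Place the output samples on the input index axis, endpoints aligned.
    x_new = np.linspace(0.0, n_in - 1, n_out)
    return np.interp(x_new, np.arange(n_in), audio)
```

Because the grid is built from sample counts rather than seconds, sub-second clips behave the same as long ones.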
I found that resampling with an FFT-based method is too time-consuming. I finally used scipy.io.wavfile.read followed by resample_poly for faster processing. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.resample_poly.html
Interesting! Could you please share the code snippet that uses resample_poly for audio resampling?
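For reference, a minimal sketch of what polyphase resampling with `scipy.signal.resample_poly` might look like. This is not the commenter's actual code. The wrapper name `resample_to` and the 16000 Hz default are assumptions chosen to match the `sr=16000` used elsewhere in this thread.

```python
import numpy as np
from scipy.signal import resample_poly

try:
    from math import gcd        # Python 3.5+
except ImportError:
    from fractions import gcd   # Python 2


def resample_to(audio, sr_in, sr_out=16000):
    """Resample `audio` along axis 0 from sr_in to sr_out (hypothetical wrapper)."""
    if sr_in == sr_out:
        return audio
    # Reduce the rate ratio to the smallest integer up/down factors,
    # e.g. 44100 -> 16000 becomes up=160, down=441.
    g = gcd(int(sr_in), int(sr_out))
    up, down = int(sr_out) // g, int(sr_in) // g
    return resample_poly(np.asarray(audio, dtype=np.float32), up, down, axis=0)
```

It could be applied to the array returned by `scipy.io.wavfile.read` or `sf.read` before the spectrogram step. Unlike plain linear interpolation, `resample_poly` applies a low-pass FIR filter, so downsampling is protected against aliasing.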
While trying to run the code, I used librosa 0.4.2 because that's the latest one matching with other specified dependencies (Python 2.7.15, Keras 2.2.4, Tensorflow 1.8.0). But it is showing the following error:
So, I'm wondering if I'm installing the correct librosa version or if there is anything that I didn't get correctly. Would you please help?