Single stereo file implementation

djo-koconi commented 4 years ago

Hi, thank you for the great work.

Any hints on how to test the algorithm on a single stereo file? Notebooks show batch editing and I am failing to test it on a single stereo input.

Thank you.

boeddeker commented 4 years ago

Hi, what do you mean with batch editing?

Stereo input is a special case of multiple channels. Stereo data is often saved in a single file. With more than 2 channels, each channel is usually saved in its own file.

In our examples you will usually find something like

y = np.array([
    soundfile.read(f'path/to/audio/file_channel{i}.wav')[0]  # [0]: audio data only, no sample rate
    for i in range(channels)
])  # shape: channels x samples

In the case of a single stereo file, be careful: soundfile puts the channels on the last dimension, so do not forget the transpose:

y = np.array(soundfile.read('path/to/stereo/file.wav').T)  # shape: channels x samples

djo-koconi commented 4 years ago

Thank you very much for your help. I know my question might be too basic for you. I am just trying to test dereverberation on a single stereo file.

When I load the file as you suggested, it is loaded as:

y = np.array(soundfile.read('wav.wav')).T

y

array([array([ 0.00061035,  0.00033569, -0.0012207 , ..., -0.00296021,
       -0.00274658,  0.00268555]),
       16000], dtype=object)

So trying to compute the STFT raises a data type error: Y = stft(y, 512, 128)

TypeError: invalid data type for einsum

boeddeker commented 4 years ago

Ah, sorry. My example was wrong. I always forget that soundfile.read returns the audio data and the sample rate (I use a wrapper around soundfile to get rid of the sample rate).

Could you try the following? After reading the file, take only the audio data with [0]:

y = np.array(soundfile.read('wav.wav')[0]).T

Then the dtype of y should be np.float64 instead of object (when the dtype is np.float64, it is not shown in the repr).

>>> y
array([ 0.00061035,  0.00033569, -0.0012207 , ..., -0.00296021,
       -0.00274658,  0.00268555])
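
The loading step can be wrapped so that mono and stereo files both end up as channels x samples. This is only a minimal sketch: the helper names `to_channels_first` and `load_channels_first` are made up for illustration and are not part of nara_wpe or soundfile. It assumes soundfile's usual layout, (samples,) for mono and (samples, channels) for multichannel files.

```python
import numpy as np


def to_channels_first(data):
    """Return audio as a float64 array of shape (channels, samples).

    `data` is expected in soundfile's layout: (samples,) for mono or
    (samples, channels) for multichannel files.
    """
    data = np.asarray(data, dtype=np.float64)
    if data.ndim == 1:
        data = data[:, np.newaxis]  # mono: add a channel axis of length 1
    if data.ndim != 2:
        raise ValueError(f'expected 1-D or 2-D audio, got shape {data.shape}')
    return data.T  # (samples, channels) -> (channels, samples)


def load_channels_first(path):
    """Read one audio file; return (channels x samples array, sample rate)."""
    import soundfile  # imported here so the helper above works without it

    # soundfile.read returns a tuple: (audio data, sample rate).
    data, sample_rate = soundfile.read(path)
    return to_channels_first(data), sample_rate
```

With this, `y, sample_rate = load_channels_first('wav.wav')` should give `y.shape == (2, samples)` for a stereo file and `(1, samples)` for a mono file.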

djo-koconi commented 4 years ago

Thank you very much. I mistakenly thought that the array should contain some additional data.

Could you recommend some parameters for stereo file testing?

I ran the wpe function on Y = stft(y, 512, 128) with default parameters, and it ran for a really long time.

In the end, istft output only silence.

boeddeker commented 4 years ago

> Thank you very much. I mistakenly thought that the array should contain some additional data.

The array should contain only the data and no meta information like the sample rate.

> In the end, istft output only silence.

This is a sign that the input shape to WPE was wrong. When the channels are swapped with the frequency or time dimension, the output will get close to zero and WPE gets very slow.

The input shape to WPE should be (..., D, T) where D is the number of channels and T the number of STFT frames. In your case, the shape should be (257, 2, T).
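
A rough sketch of how the layout could be checked before calling WPE. It assumes the STFT output has the shape (channels, frames, frequencies), which is what the asserts in the fixed example later in this thread check, and that an FFT size of 512 gives 257 frequency bins. The function names are hypothetical helpers, not nara_wpe API.

```python
import numpy as np


def stft_to_wpe_layout(Y):
    """Move an STFT of shape (D, T, F) to the (F, D, T) layout."""
    Y = np.asarray(Y)
    if Y.ndim != 3:
        raise ValueError(f'expected 3 dimensions, got shape {Y.shape}')
    return Y.transpose(2, 0, 1)


def wpe_to_stft_layout(Z):
    """Inverse of stft_to_wpe_layout: (F, D, T) -> (D, T, F)."""
    return np.asarray(Z).transpose(1, 2, 0)


def check_wpe_input(Y, num_channels, num_frequencies):
    """Raise ValueError if Y does not look like (F, D, T)."""
    if Y.ndim != 3:
        raise ValueError(f'expected 3 dimensions, got shape {Y.shape}')
    if Y.shape[0] != num_frequencies or Y.shape[1] != num_channels:
        raise ValueError(
            f'expected ({num_frequencies}, {num_channels}, T), got {Y.shape}'
        )
```

For a stereo file, `check_wpe_input(stft_to_wpe_layout(Y), num_channels=2, num_frequencies=257)` should pass silently. If it raises, the channels are probably on the wrong axis.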

djo-koconi commented 4 years ago

Thank you so much Christoph.

It must be that I am doing the reshape wrong. I am trying to run it on a mono file now.

My code is as follows:

y = np.array(sf.read('reverbereted-wav.wav')[0]).T
Y = stft(y,512,128)
Yr=Y.reshape(257,1,len(Y))
Z=wpe(Yr)
z=istft(Z.T,512,128)
sf.write...

Any hint would be highly appreciated.

boeddeker commented 4 years ago

Be careful when you use reshape. It is very rare that you actually want a reshape. Here you want to transpose the array.

Here is your fixed example with many asserts:

import numpy as np
import soundfile as sf
from nara_wpe.utils import stft, istft
from nara_wpe.wpe import wpe

# sf.read returns (data, sample_rate); keep only the data and add a channel axis.
y = np.array([sf.read('reverbereted-wav.wav')[0]])
assert y.ndim == 2, y.shape
assert y.shape[0] == 1, y.shape  # channels
Y = stft(y, 512, 128)  # shape: channels x frames x frequencies
assert Y.ndim == 3, Y.shape
assert Y.shape[-1] == 257, Y.shape  # frequencies
assert Y.shape[0] == 1, Y.shape  # channels
Yr = Y.transpose(2, 0, 1)  # shape: frequencies x channels x frames, as wpe expects
assert Yr.ndim == 3, Yr.shape
assert Yr.shape[0] == 257, Yr.shape  # frequencies
assert Yr.shape[-2] == 1, Yr.shape  # channels
Z = wpe(Yr)
Z = Z.transpose(1, 2, 0)  # back to channels x frames x frequencies for istft
assert Z.ndim == 3, Z.shape
assert Z.shape[-1] == 257, Z.shape  # frequencies
assert Z.shape[0] == 1, Z.shape  # channels
z = istft(Z, 512, 128)  # shape: channels x samples

djo-koconi commented 4 years ago

Thank you so much for the code correction Christoph.

After testing on a few reverberated mono files, I find that the algorithm's output is identical to the input. The reverberation is not reduced even though it is clearly present.

Could this be fixed by some parameter tuning? I tried increasing the number of iterations, but the output does not change. The inputs are 16 kHz mono files.

boeddeker commented 4 years ago

The tuning of the parameters depends on the actual data. We selected default values that worked fine in our case.

Common behavior of WPE:

- WPE is more effective with more channels.
- You can increase the number of parameters (i.e. taps) when you have fewer channels.
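
One way to see whether a setting changes anything is to measure how far the WPE output is from its input. The sketch below is only an illustration: `relative_change_db` and `sweep_taps` are hypothetical helpers, the taps values are arbitrary starting points rather than recommendations, and the keyword names `taps`, `delay` and `iterations` are assumed to match the `wpe` function of the installed nara_wpe version.

```python
import numpy as np


def relative_change_db(before, after):
    """Energy of (before - after) relative to the energy of before, in dB."""
    before = np.asarray(before)
    after = np.asarray(after)
    diff_energy = np.sum(np.abs(before - after) ** 2)
    ref_energy = np.sum(np.abs(before) ** 2)
    # The small constant avoids log(0) when the output equals the input.
    return 10 * np.log10((diff_energy + 1e-20) / (ref_energy + 1e-20))


def sweep_taps(Y, taps_values=(5, 10, 20), delay=3, iterations=3):
    """Run wpe on Y (frequencies x channels x frames) for several taps values.

    Returns a dict mapping each taps value to the relative change in dB.
    """
    from nara_wpe.wpe import wpe  # imported here so the metric works without it

    return {
        taps: relative_change_db(
            Y, wpe(Y, taps=taps, delay=delay, iterations=iterations)
        )
        for taps in taps_values
    }
```

A strongly negative value means the output is almost identical to the input. The number only shows that something was changed, not that the result sounds better, so listening to the output is still needed.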

djo-koconi commented 4 years ago

Thank you so much.

Yes, the stereo input results are noticeably better.

Kind regards

fgnt / nara_wpe

Single stereo file implementation #42