fgnt / nara_wpe

Different implementations of "Weighted Prediction Error" for speech dereverberation
MIT License
490 stars 165 forks

Some questions about the Online version? #49

Closed: imwjhi closed this issue 2 years ago

imwjhi commented 3 years ago

I ran into some questions while using nara_wpe. After reading the paper "NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing", I wanted to try nara_wpe for dereverberation, so I followed the IPython notebooks in nara_wpe/examples.
According to Table 1 in the paper, there is no Block-Online version of the Numpy implementation, while the TensorFlow implementation does have a Block-Online version.
In the notebooks I found the Frame-Online code in WPE_Numpy_online.ipynb, but WPE_Tensorflow_online.ipynb also contains only a Frame-Online version; I could not find the Block-Online one. (I also read the nara_wpe source code and found this comment in wpe.py, class OnlineWPE, method _get_prediction: "# TODO: Only block shift of 1 works." When the block shift is 1, is Block-Online equivalent to Frame-Online?) So my first question is: how do I enable the Block-Online version?
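A toy segmentation sketch of the block-shift question (my own illustration, not nara_wpe code): a block shift of one frame alone does not make Block-Online equal to Frame-Online; the blocks still overlap. Only when the block length is also a single frame does the block scheme degenerate to frame-by-frame processing.

```python
import numpy as np

def split_into_blocks(frames, block_length, block_shift):
    """Split a frame sequence (axis 0 = time) into possibly overlapping blocks."""
    T = frames.shape[0]
    starts = range(0, T - block_length + 1, block_shift)
    return [frames[s:s + block_length] for s in starts]

frames = np.arange(6)  # stand-in for 6 STFT frames

# block_length=2, block_shift=1 gives overlapping two-frame blocks.
two_frame_blocks = split_into_blocks(frames, block_length=2, block_shift=1)
assert len(two_frame_blocks) == 5

# block_length=1, block_shift=1: every "block" is a single frame, so the
# block-online scheme degenerates to frame-by-frame (frame-online) processing.
single_frame_blocks = split_into_blocks(frames, block_length=1, block_shift=1)
assert len(single_frame_blocks) == 6
assert all(len(b) == 1 for b in single_frame_blocks)
```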

Then I tested the Offline and Frame-Online versions of the Numpy implementation on my test set (simulated 8-channel wavs), recorded the time cost, and computed the word error rate (WER). My results are as follows:

Unprocessed reverberant wav: WER = 9.97. Offline: WER = 7.10, RealTimeFactor = 1.7. Frame-Online: WER = 12.39, RealTimeFactor = 9.87.

The results show that the Frame-Online version has both a higher WER and a higher RealTimeFactor. So my second question is: is this result reasonable, and if so, why is the Frame-Online version so slow?

boeddeker commented 3 years ago

Thank you for your interest.

When you search for the tensorflow code, you can find it here: https://github.com/fgnt/nara_wpe/blob/master/nara_wpe/tf_wpe.py . The numpy code is in the file that you mentioned: https://github.com/fgnt/nara_wpe/blob/master/nara_wpe/wpe.py .

So my first question is: how do I enable the Block-Online version?

If you want to use the block-online code, I recommend the tensorflow implementation; @jheymann85 mainly used that code and it was used for some of his publications. One drawback of the tensorflow code is a bug with complex numbers in TensorFlow 1.13+, so we recommend TensorFlow 1.12.0.

Unprocessed reverberant wav: WER = 9.97. Offline: WER = 7.10, RealTimeFactor = 1.7.

This sounds reasonable. If you change nara_wpe.wpe.wpe to nara_wpe.wpe.wpe_v8, the RealTimeFactor may improve (v8 is a loopy implementation, and I have observed that it is often significantly faster with a smaller memory footprint).
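A hedged sketch of the suggested swap. The (frequency, channel, time) layout follows the WPE_Numpy_offline notebook; the exact wpe_v8 keyword arguments should be checked against wpe.py, and the import guard keeps the sketch runnable even without nara_wpe installed.

```python
import numpy as np

try:
    # wpe_v8 is the loopy variant mentioned above; nara_wpe may not be installed.
    from nara_wpe.wpe import wpe_v8
except ImportError:
    wpe_v8 = None

# Y: STFT of the multichannel recording, shape (frequency, channel, time),
# following the layout used in the WPE_Numpy_offline notebook.
F, D, T = 4, 8, 100
rng = np.random.default_rng(0)
Y = rng.standard_normal((F, D, T)) + 1j * rng.standard_normal((F, D, T))

if wpe_v8 is not None:
    # Intended as a drop-in replacement for nara_wpe.wpe.wpe(Y, ...).
    Z = wpe_v8(Y, taps=10, delay=3, iterations=3)
    assert Z.shape == Y.shape
```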

Frame-Online: WER=12.39,

Actually, I have never tested an online version of WPE myself. @jheymann85 ran the online WPE experiments, but all of them were done with tensorflow. The numpy version was not heavily tested.

the RealTimeFactor=9.87.

This is easily explained: writing for loops in Python is slow, hence the code is slow. The tensorflow code has the advantage that it runs compiled code.
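The effect can be seen in miniature (an illustrative benchmark of my own, unrelated to the WPE internals): the same per-channel accumulation written as a Python frame loop versus a single vectorized NumPy call gives identical results, but the vectorized version runs in compiled code.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 100_000))  # (channels, frames)

# Frame-by-frame Python loop, the access pattern a frame-online recursion forces.
t0 = time.perf_counter()
acc_loop = np.zeros(8)
for t in range(X.shape[1]):
    acc_loop += X[:, t] ** 2
loop_time = time.perf_counter() - t0

# The same accumulation as one vectorized call: identical result, compiled loop.
t0 = time.perf_counter()
acc_vec = (X ** 2).sum(axis=1)
vec_time = time.perf_counter() - t0

assert np.allclose(acc_loop, acc_vec)
```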

imwjhi commented 3 years ago

Thank you for your reply. I found the BlockOnline code in nara_wpe/tf_wpe.py. To process a single wav, I think I should use the block_wpe_step function. Part of my program is as follows:

import tensorflow as tf
from nara_wpe.tf_wpe import block_wpe_step, get_power_inverse

with tf.Session() as session:
    # X: STFT of the observed signal, shape (C, T, F); the graph expects (F, C, T)
    X_tf = tf.placeholder(tf.complex128, shape=(frequency_bins, channels, None))
    inverse_power_tf = get_power_inverse(X_tf)
    result = block_wpe_step(
        X_tf, inverse_power_tf,
        block_length_in_seconds=block_size1,
        forgetting_factor=forgetting_factor1,
        fft_shift=frame_shift1)
    feed_dict1 = {X_tf: X.transpose(2, 0, 1)}  # (C, T, F) -> (F, C, T)
    # Output has shape (F, C, T); transpose back to (C, T, F)
    Y = session.run(result, feed_dict=feed_dict1)
    Y = Y.transpose(1, 2, 0)

Is my code correct?

After that, my program runs successfully and produces a dereverberated result. The tests show that the BlockOnline version is faster than the FrameOnline one.

But I have some new problems. After reading the code of the block_wpe_step function, I realized that there are at least two tunable parameters: block_length_in_seconds and forgetting_factor. I tested both and found:

  1. The larger the forgetting_factor, the better the dereverberation.
  2. In the processed wavs, noise is introduced between adjacent blocks (see the green arrow in the picture). The smaller the block_length_in_seconds (2 -> 1.5 -> 1 -> 0.5 -> 0.32), the more obvious the noise.
  3. If the first block contains only noise, the output tends to blow up (see the blue circle in the picture).

Do you think these results are reasonable?

(spectrogram screenshot attached)
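Observation 1 is at least consistent with how a forgetting factor usually behaves: assuming the common recursive update r_new = lam * r_old + (1 - lam) * observation (a toy model of my own, not the block_wpe_step internals), a larger lam averages the statistics over a longer history (effective memory roughly 1 / (1 - lam)), which gives more reliable estimates but slower adaptation.

```python
def exp_average(observations, lam):
    """Exponentially weighted recursive average with forgetting factor lam."""
    r = observations[0]
    for x in observations[1:]:
        r = lam * r + (1.0 - lam) * x
    return r

# The statistics jump from 0 to 1 at block 1; how far has the estimate
# adapted after 49 further blocks?
obs = [0.0] + [1.0] * 49

fast = exp_average(obs, lam=0.7)   # short memory: adapts almost completely
slow = exp_average(obs, lam=0.99)  # long memory: still far from the new value

assert fast > 0.99
assert slow < 0.5
```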

boeddeker commented 3 years ago

Yes, those spectrograms don't look good. To debug this, you should perhaps start by comparing against the numpy offline WPE implementation. That code is heavily tested, and offline WPE is well known to work, while the online and block-online variants may struggle to do better than doing nothing.

Here are a few things you could try:

boeddeker commented 2 years ago

Closing because of missing feedback.