fgnt / nara_wpe

Different implementations of "Weighted Prediction Error" for speech dereverberation
MIT License
490 stars 165 forks

Some questions about the Online version? #49

Closed: imwjhi closed this issue 2 years ago

imwjhi commented 3 years ago

I ran into some questions while using nara_wpe. After reading the paper "NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing", I wanted to try nara_wpe for dereverberation, so I followed the IPython notebooks in nara_wpe/examples.
According to Table 1 in the paper, there is no Block-Online version of the Numpy implementation, while the TensorFlow implementation does have a Block-Online version.
In the notebooks I found the Frame-Online code in WPE_Numpy_online.ipynb, but WPE_Tensorflow_online.ipynb also contains only a Frame-Online version; I could not find the Block-Online one. (I also read the nara_wpe source code and found this comment in wpe.py, class OnlineWPE, method _get_prediction: "# TODO: Only block shift of 1 works." When the block shift is 1, is Block-Online equivalent to Frame-Online?) So my first question is: how do I enable the Block-Online version?
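A toy segmentation sketch of the block-shift question (my own illustration, not nara_wpe code): a block shift of one frame alone does not make Block-Online equal to Frame-Online; the blocks still overlap. Only when the block length is also a single frame does the block scheme degenerate to frame-by-frame processing.

```python
import numpy as np

def split_into_blocks(frames, block_length, block_shift):
    """Split a frame sequence (axis 0 = time) into possibly overlapping blocks."""
    T = frames.shape[0]
    starts = range(0, T - block_length + 1, block_shift)
    return [frames[s:s + block_length] for s in starts]

frames = np.arange(6)  # stand-in for 6 STFT frames

# block_length=2, block_shift=1 gives overlapping two-frame blocks.
two_frame_blocks = split_into_blocks(frames, block_length=2, block_shift=1)
assert len(two_frame_blocks) == 5

# block_length=1, block_shift=1: every "block" is a single frame, so the
# block-online scheme degenerates to frame-by-frame (frame-online) processing.
single_frame_blocks = split_into_blocks(frames, block_length=1, block_shift=1)
assert len(single_frame_blocks) == 6
assert all(len(b) == 1 for b in single_frame_blocks)
```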

Then I tested the Offline and Frame-Online versions of the Numpy implementation on my test set (simulated 8-channel wavs), recorded the time cost, and computed the word error rate (WER). My results are as follows:

Unprocessed reverberant wav: WER = 9.97. Offline: WER = 7.10, RealTimeFactor = 1.7. Frame-Online: WER = 12.39, RealTimeFactor = 9.87.

The results show that the Frame-Online version has both a higher WER and a higher RealTimeFactor. So my second question is: is this result reasonable, and if so, why is the Frame-Online version so slow?

boeddeker commented 3 years ago

Thank you for your interest.

When you search for the tensorflow code, you can find it here: https://github.com/fgnt/nara_wpe/blob/master/nara_wpe/tf_wpe.py . The numpy code is in the file that you mentioned: https://github.com/fgnt/nara_wpe/blob/master/nara_wpe/wpe.py .

So my first question is: how do I enable the Block-Online version?

If you want to use the block-online code, I recommend the tensorflow implementation; @jheymann85 mainly used that code and it was used for some of his publications. One drawback of the tensorflow code is a bug with complex numbers in TensorFlow 1.13+, so we recommend TensorFlow 1.12.0.

Unprocessed reverberant wav: WER = 9.97. Offline: WER = 7.10, RealTimeFactor = 1.7.

This sounds reasonable. If you change nara_wpe.wpe.wpe to nara_wpe.wpe.wpe_v8, the RealTimeFactor may improve (v8 is a loopy implementation, and I have observed that it is often significantly faster with a smaller memory footprint).
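A hedged sketch of the suggested swap. The (frequency, channel, time) layout follows the WPE_Numpy_offline notebook; the exact wpe_v8 keyword arguments should be checked against wpe.py, and the import guard keeps the sketch runnable even without nara_wpe installed.

```python
import numpy as np

try:
    # wpe_v8 is the loopy variant mentioned above; nara_wpe may not be installed.
    from nara_wpe.wpe import wpe_v8
except ImportError:
    wpe_v8 = None

# Y: STFT of the multichannel recording, shape (frequency, channel, time),
# following the layout used in the WPE_Numpy_offline notebook.
F, D, T = 4, 8, 100
rng = np.random.default_rng(0)
Y = rng.standard_normal((F, D, T)) + 1j * rng.standard_normal((F, D, T))

if wpe_v8 is not None:
    # Intended as a drop-in replacement for nara_wpe.wpe.wpe(Y, ...).
    Z = wpe_v8(Y, taps=10, delay=3, iterations=3)
    assert Z.shape == Y.shape
```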

Frame-Online: WER=12.39,

Actually, I have never tested an online version of WPE myself. @jheymann85 ran the online WPE experiments, but all of them were done with tensorflow. The numpy version was not heavily tested.

the RealTimeFactor=9.87.

This is easily explained: writing for loops in Python is slow, hence the code is slow. The tensorflow code has the advantage that it runs compiled code.
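The effect can be seen in miniature (an illustrative benchmark of my own, unrelated to the WPE internals): the same per-channel accumulation written as a Python frame loop versus a single vectorized NumPy call gives identical results, but the vectorized version runs in compiled code.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 100_000))  # (channels, frames)

# Frame-by-frame Python loop, the access pattern a frame-online recursion forces.
t0 = time.perf_counter()
acc_loop = np.zeros(8)
for t in range(X.shape[1]):
    acc_loop += X[:, t] ** 2
loop_time = time.perf_counter() - t0

# The same accumulation as one vectorized call: identical result, compiled loop.
t0 = time.perf_counter()
acc_vec = (X ** 2).sum(axis=1)
vec_time = time.perf_counter() - t0

assert np.allclose(acc_loop, acc_vec)
```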

imwjhi commented 3 years ago

Thank you for your reply. I found the BlockOnline code in nara_wpe/tf_wpe.py. To process a single wav, I think I should use the block_wpe_step function. Part of my program is as follows:

import tensorflow as tf
from nara_wpe.tf_wpe import block_wpe_step, get_power_inverse

with tf.Session() as session:
    # X: STFT of the observed signal, shape (C, T, F); the graph expects (F, C, T)
    X_tf = tf.placeholder(tf.complex128, shape=(frequency_bins, channels, None))
    inverse_power_tf = get_power_inverse(X_tf)
    result = block_wpe_step(
        X_tf, inverse_power_tf,
        block_length_in_seconds=block_size1,
        forgetting_factor=forgetting_factor1,
        fft_shift=frame_shift1)
    feed_dict1 = {X_tf: X.transpose(2, 0, 1)}  # (C, T, F) -> (F, C, T)
    # Output has shape (F, C, T); transpose back to (C, T, F)
    Y = session.run(result, feed_dict=feed_dict1)
    Y = Y.transpose(1, 2, 0)

Is my code correct?

After that, my program runs successfully and produces a dereverberated result. The tests show that the BlockOnline version is faster than the FrameOnline one.

But I have some new problems. After reading the code of the block_wpe_step function, I realized that there are at least two tunable parameters: block_length_in_seconds and forgetting_factor. I tested both and found:

  1. The larger the forgetting_factor, the better the dereverberation.
  2. In the processed wavs, noise is introduced between adjacent blocks (see the green arrow in the picture). The smaller the block_length_in_seconds (2 -> 1.5 -> 1 -> 0.5 -> 0.32), the more obvious the noise.
  3. If the first block contains only noise, the output tends to blow up (see the blue circle in the picture).

Do you think these results are reasonable?

(spectrogram screenshot attached)
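Observation 1 is at least consistent with how a forgetting factor usually behaves: assuming the common recursive update r_new = lam * r_old + (1 - lam) * observation (a toy model of my own, not the block_wpe_step internals), a larger lam averages the statistics over a longer history (effective memory roughly 1 / (1 - lam)), which gives more reliable estimates but slower adaptation.

```python
def exp_average(observations, lam):
    """Exponentially weighted recursive average with forgetting factor lam."""
    r = observations[0]
    for x in observations[1:]:
        r = lam * r + (1.0 - lam) * x
    return r

# The statistics jump from 0 to 1 at block 1; how far has the estimate
# adapted after 49 further blocks?
obs = [0.0] + [1.0] * 49

fast = exp_average(obs, lam=0.7)   # short memory: adapts almost completely
slow = exp_average(obs, lam=0.99)  # long memory: still far from the new value

assert fast > 0.99
assert slow < 0.5
```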

boeddeker commented 3 years ago

Yes, those spectrograms don't look good. To debug this, you should perhaps start by comparing against the numpy offline WPE implementation. That code is heavily tested, and offline WPE is well known to work, while the online and block-online variants may struggle to do better than doing nothing.

Here are a few things you could try:

boeddeker commented 2 years ago

Closing because of missing feedback.