Closed imwjhi closed 2 years ago
Thank you for your interest.
If you are looking for the TensorFlow code, you can find it here: https://github.com/fgnt/nara_wpe/blob/master/nara_wpe/tf_wpe.py . The NumPy code is in the file that you mentioned: https://github.com/fgnt/nara_wpe/blob/master/nara_wpe/wpe.py .
So my first question is: how do I enable the Block-Online version?
If you want to use the block-online code, I recommend the TensorFlow code. @jheymann85 mainly used this code, and it was used for some of his publications. One drawback of the TensorFlow code is that there is a bug with complex numbers in 1.13+, so we recommend using TensorFlow 1.12.0.
Unprocessed reverberant wav: WER=9.97. Offline: WER=7.10, the RealTimeFactor=1.7.
This sounds reasonable. If you change nara_wpe.wpe.wpe to nara_wpe.wpe.wpe_v8, the RealTimeFactor may improve (v8 is a loopy implementation, and I observed that it is often significantly faster with a smaller memory footprint).
Frame-Online: WER=12.39,
Actually, I have never tested an online version of WPE. @jheymann85 did the experiments with online WPE, but all of them were done with TensorFlow. The NumPy version was not heavily tested.
the RealTimeFactor=9.87.
This is easily explained: for loops written in Python are slow, and the frame-online NumPy code relies on them. The TensorFlow code has the advantage here that it uses compiled code.
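To make the point about interpreter overhead concrete, here is a small stand-alone sketch. It is not nara_wpe code: it only compares a hand-written Python loop with the built-in `sum`, which is implemented in C, as a stand-in for "compiled code". The function name `python_loop_sum` is made up for this illustration, and the measured timings will vary between machines.

```python
import timeit


def python_loop_sum(values):
    """Add the values one by one in an explicit Python-level loop."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [float(i) for i in range(100_000)]

# Both approaches compute the same result ...
loop_result = python_loop_sum(values)
builtin_result = sum(values)

# ... but the explicit loop pays interpreter overhead on every iteration.
loop_seconds = timeit.timeit(lambda: python_loop_sum(values), number=20)
builtin_seconds = timeit.timeit(lambda: sum(values), number=20)

print(f"python loop: {loop_seconds:.4f} s, built-in sum: {builtin_seconds:.4f} s")
```

The same effect, multiplied by one iteration per frame and frequency bin, is a plausible reason for the high RealTimeFactor of the frame-online NumPy code.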
Thank you for your reply. I found the BlockOnline code in nara_wpe/tf_wpe.py. In order to handle a single wav, I think I should use the block_wpe_step function. Part of the code of my program is as follows:
import tensorflow as tf
from nara_wpe.tf_wpe import block_wpe_step, get_power_inverse

with tf.Session() as session:
    # Input: X, shape = (F, C, T)
    X_tf = tf.placeholder(tf.complex128, shape=(frequency_bins, channels, None))
    inverse_power_tf = get_power_inverse(X_tf)
    result = block_wpe_step(X_tf, inverse_power_tf, block_length_in_seconds=block_size1, forgetting_factor=forgetting_factor1, fft_shift=frame_shift1)
    feed_dict1 = {X_tf: X.transpose(2, 0, 1)}
    # Output: (F, C, T)
    Y = session.run(result, feed_dict=feed_dict1)
    Y = Y.transpose(1, 2, 0)
Is my code correct?
After that, my program can run successfully and get the result of de-reverberation. The test results show that the speed of BlockOnline is faster than that of FrameOnline.
But I have some new problems: After reading the code for the block_wpe_step function, I realized that there were at least two tunable parameters: block_length_in_seconds and forgetting_factor. I tested these two parameters and found that:
Do you think these results are reasonable?
Yes, those spectrograms don't look good. To debug this, you could start by comparing against the NumPy offline WPE implementation. That code is heavily tested and offline WPE is well known to work, while the online and block-online variants may struggle to be better than doing nothing.
Here are a few things you could try:
- get_power_inverse may not be good enough. You could run the offline WPE, take the estimated inverse power from it, and see whether the result improves.
- If you change (1. - forgetting_factor) * correlation_matrix_tm1 + forgetting_factor * correlation_matrix to (1. - forgetting_factor) * correlation_matrix_tm1 + correlation_matrix, you get better behavior at the beginning of the utterance, but this equation is unstable for long sequences (this can be solved with a counter).
Closing due to missing feedback.
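The two correlation-matrix updates mentioned above can be compared on a toy example. The sketch below is not taken from nara_wpe: it replaces the correlation matrix with a single scalar, all function names are hypothetical, and the third variant is only one possible reading of "can be solved with a counter" (dividing by the accumulated weight, which depends on the number of updates so far).

```python
def update_scaled(values, forgetting_factor):
    """Original form: (1 - f) * old + f * new, starting from zero."""
    estimate = 0.0
    history = []
    for value in values:
        estimate = (1.0 - forgetting_factor) * estimate + forgetting_factor * value
        history.append(estimate)
    return history


def update_unscaled(values, forgetting_factor):
    """Suggested form: (1 - f) * old + new (the new term is not scaled)."""
    estimate = 0.0
    history = []
    for value in values:
        estimate = (1.0 - forgetting_factor) * estimate + value
        history.append(estimate)
    return history


def update_counter_normalized(values, forgetting_factor):
    """Unscaled form divided by the accumulated weight (assumed counter fix)."""
    accumulator = 0.0
    history = []
    for step, value in enumerate(values, start=1):
        accumulator = (1.0 - forgetting_factor) * accumulator + value
        # Sum of the weights after `step` updates: sum_{k=0}^{step-1} (1 - f)^k
        weight_sum = (1.0 - (1.0 - forgetting_factor) ** step) / forgetting_factor
        history.append(accumulator / weight_sum)
    return history


constant_input = [2.0] * 200
scaled = update_scaled(constant_input, forgetting_factor=0.1)
unscaled = update_unscaled(constant_input, forgetting_factor=0.1)
normalized = update_counter_normalized(constant_input, forgetting_factor=0.1)

print(scaled[0], unscaled[0], normalized[0])
print(scaled[-1], unscaled[-1], normalized[-1])
```

For a constant input, the scaled form starts far below the true value and only approaches it after many steps, the unscaled form starts at the right value but its scale drifts toward value / forgetting_factor, and the normalized form stays at the input value throughout. Whether this matches the fix the maintainers had in mind is an assumption.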
I encountered some problems when using nara_wpe. After reading the paper "NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing", I wanted to try nara_wpe for dereverberation, so I learned how to use it from the IPython notebooks in nara_wpe/examples.
According to Table 1 in the paper, there is no Block-Online version of the NumPy implementation, while the TensorFlow implementation does have a Block-Online version.
When I read the notebooks, I saw the Frame-Online version of the code in WPE_Numpy_online.ipynb, but WPE_Tensorflow_online.ipynb also contains only the Frame-Online version, and I cannot find the Block-Online version. (I also read the source code of nara_wpe and found a comment in "wpe.py -> class OnlineWPE -> def _get_prediction": # TODO: Only block shift of 1 works. I wonder: when block shift=1, is Block-Online equivalent to Frame-Online?) So my first question is: how do I enable the Block-Online version?
Then I tested my code on the test set (simulated 8-channel wavs) with the Offline and Frame-Online versions of the NumPy implementation, recorded the processing time, and calculated the word error rate (WER). My results are as follows:
Unprocessed reverberant wav: WER=9.97. Offline: WER=7.10, the RealTimeFactor=1.7. Frame-Online: WER=12.39, the RealTimeFactor=9.87.
The results show that both the WER and the RealTimeFactor of Frame-Online are high. So my second question is: is this result reasonable, and if so, why is the Frame-Online version so slow?
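For reference, the RealTimeFactor used here is assumed to follow the usual definition: processing time divided by the duration of the processed audio, so values above 1 mean slower than real time. A minimal sketch of how it could be measured is shown below; the helper names are hypothetical and not part of nara_wpe.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_real_time_factor(process, audio_seconds):
    """Time a zero-argument callable and relate it to the audio duration."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# A 10 s recording that takes 17 s to process has a real-time factor of 1.7.
print(real_time_factor(processing_seconds=17.0, audio_seconds=10.0))
```

In practice, `process` would wrap the dereverberation call for one utterance, and `audio_seconds` would be the number of samples divided by the sampling rate.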