fgnt / nara_wpe

Different implementations of "Weighted Prediction Error" for speech dereverberation
MIT License

Add iterations to frame-online / comparison with block-online #38

Closed Sciss closed 5 years ago

Sciss commented 5 years ago

Hi there, I finally got the notebooks working, thank you very much for this great repository. I have two related questions:

LukasDrude commented 5 years ago

I think I need a bit more context to really answer this question. First of all, once you iterate more than once over the entire utterance (or audio file) you should not use the frame-online approach. That one is (in theory) only meaningful if you do not want to look into the future or into the past (i.e. when you have maximum latency constraints).

If you have files of limited length (e.g. below 20 s) you may simply want to use the offline (batch) mode.

Do you need to use TensorFlow? Otherwise it might be easier to use the NumPy implementation.

If you have very long audio files in mind, I would either consider the block-online approach as mentioned in the paper or define a variant of that.

But in summary, I may need more context information.

Sciss commented 5 years ago

Sorry, here is a bit more context: I'm translating the algorithm into my own signal processing framework FScape (in plain Scala). This is designed so everything can work with arbitrary-length input, e.g. an hour of sound, so I want to avoid an offline requirement where possible (not everything may fit into memory). Therefore, I like the frame-step algorithm. Since it's not straightforward to reintroduce the multiple iterations you run in offline / batch mode, I wonder whether this wouldn't be more or less the same as cascading the algorithm. So if iterations = 3, would that not be approximated by OnlineWPE(OnlineWPE(OnlineWPE(in, params)))?

boeddeker commented 5 years ago

I do not know whether anyone has tried to use more than one iteration for online WPE. You indirectly get some iterations when you process frame by frame; that is why only one iteration is used.
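One way to read the remark about indirect iterations: in a frame-online scheme the filter is refined a little at every frame by a recursive least-squares update, so it keeps improving without a second pass over the signal. The following is a minimal single-frequency-bin sketch under that assumption, not the nara_wpe implementation. The name `online_wpe_single_bin`, the forgetting factor `alpha`, and the choice of the current observed frame's power as the weight are illustrative assumptions.

```python
import numpy as np

def online_wpe_single_bin(frames, D, taps=5, delay=2, alpha=0.99, eps=1e-10):
    """Consume frames of shape (D,) one at a time and yield filtered frames.

    Hypothetical sketch of a recursive (RLS-style) WPE update for one
    frequency bin. Requires delay >= 1.
    """
    K = D * taps
    G = np.zeros((K, D), dtype=complex)       # prediction filter
    inv_cov = np.eye(K, dtype=complex)        # inverse weighted covariance
    # history[j] holds the frame observed j + 1 steps ago.
    history = np.zeros((delay + taps - 1, D), dtype=complex)

    for y in frames:
        y = np.asarray(y, dtype=complex)
        # Stack the frames that are `delay` .. `delay + taps - 1` steps old.
        y_tilde = history[delay - 1:].reshape(-1)
        # Filter with the coefficients learned so far (a priori estimate).
        x = y - G.conj().T @ y_tilde
        # Assumption: weight by the power of the current observed frame.
        power = max(float(np.mean(np.abs(y) ** 2)), eps)
        # Standard RLS gain and inverse-covariance update.
        num = inv_cov @ y_tilde
        gain = num / (alpha * power + np.vdot(y_tilde, num))
        inv_cov = (inv_cov - np.outer(gain, y_tilde.conj()) @ inv_cov) / alpha
        # One small refinement of the filter per frame.
        G = G + np.outer(gain, x.conj())
        history = np.roll(history, 1, axis=0)
        history[0] = y
        yield x

# Synthetic reverberant test signal for one frequency bin (illustrative only).
rng = np.random.default_rng(0)
D, T = 2, 400
s = rng.uniform(0.1, 1.0, T) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
h = rng.standard_normal((D, 6)) * 0.6 ** np.arange(6)
Y = np.stack([np.convolve(s, h[d])[:T] for d in range(D)])

X = np.array(list(online_wpe_single_bin(Y.T, D)))
```

The state is only `G`, `inv_cov`, and a short history buffer, so memory use does not grow with the length of the input.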

Calling OnlineWPE(OnlineWPE(OnlineWPE(in, params))) is different from setting iterations to 3: when you cascade, you convolve the signal with multiple filters in sequence. If I remember correctly, I once tried this with offline WPE, and the result was better when I didn't repeat WPE. One problem is that the objective of WPE is to produce the zero signal. The filter design limits WPE, and dereverberation is then a side effect. If you repeat WPE infinitely many times, you will get the zero signal.
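To make the difference concrete, here is a small NumPy sketch of batch WPE for a single frequency bin, written for illustration and not taken from nara_wpe. The helper names `stack_delayed` and `wpe_single_bin` and the synthetic test signal are assumptions. With `iterations=3`, the power estimate is refined three times but each new filter is applied to the original observation; in the cascade, every stage estimates a new filter on the output of the previous stage.

```python
import numpy as np

def stack_delayed(Y, taps, delay):
    """Stack `taps` delayed copies of Y (D, T) into a (D * taps, T) matrix."""
    D, T = Y.shape
    Y_tilde = np.zeros((D * taps, T), dtype=Y.dtype)
    for k in range(taps):
        shift = delay + k
        Y_tilde[k * D:(k + 1) * D, shift:] = Y[:, :T - shift]
    return Y_tilde

def wpe_single_bin(Y, taps=5, delay=2, iterations=3, eps=1e-10):
    """Batch WPE for one frequency bin; Y has shape (D, T)."""
    Y_tilde = stack_delayed(Y, taps, delay)
    X = Y
    for _ in range(iterations):
        # Per-frame power of the current estimate, averaged over channels.
        power = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)
        weighted = Y_tilde / power
        R = weighted @ Y_tilde.conj().T      # weighted correlation matrix
        P = weighted @ Y.conj().T            # weighted cross-correlation
        G = np.linalg.solve(R + eps * np.eye(R.shape[0]), P)
        # The filter is always applied to the ORIGINAL observation Y.
        X = Y - G.conj().T @ Y_tilde
    return X

# Synthetic reverberant signal with a little sensor noise (illustrative only).
rng = np.random.default_rng(0)
D, T = 2, 400
s = rng.uniform(0.1, 1.0, T) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
h = rng.standard_normal((D, 6)) * 0.6 ** np.arange(6)
Y = np.stack([np.convolve(s, h[d])[:T] for d in range(D)])
Y = Y + 0.01 * (rng.standard_normal((D, T)) + 1j * rng.standard_normal((D, T)))

# One filter, power estimate refined three times.
X_iter = wpe_single_bin(Y, iterations=3)

# Three filters applied one after another.
X_cascade = Y
for _ in range(3):
    X_cascade = wpe_single_bin(X_cascade, iterations=1)
```

The two results are not the same signal: the cascade keeps predicting and subtracting from an already filtered input, which matches the concern that repeated application pushes the output towards zero.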

If you run into problems with too high memory consumption, you could try applying WPE independently to each frequency.
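A rough sketch of why the per-frequency loop helps, assuming an implementation that materialises the stacked delayed observations in complex128. All numbers are assumptions chosen for illustration (257 bins, 8 channels, 10 taps, one hour at 16 kHz with a frame shift of 128 samples), and `stacked_bytes` and `apply_per_frequency` are hypothetical helpers, not part of nara_wpe.

```python
import numpy as np

def stacked_bytes(num_bins, channels, taps, frames, itemsize=16):
    """Bytes needed to hold the stacked delayed observations (complex128)."""
    return num_bins * channels * taps * frames * itemsize

def apply_per_frequency(Y, process_bin):
    """Run `process_bin` on one (D, T) slice at a time; Y has shape (F, D, T)."""
    out = np.empty_like(Y)
    for f in range(Y.shape[0]):
        out[f] = process_bin(Y[f])
    return out

# One hour at 16 kHz with a frame shift of 128 samples (assumed values).
frames = 3600 * 16000 // 128
all_bins = stacked_bytes(257, 8, 10, frames)
one_bin = stacked_bytes(1, 8, 10, frames)

# Stand-in for a real single-bin WPE routine, only to exercise the loop.
rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 2, 50)) + 1j * rng.standard_normal((4, 2, 50))
out = apply_per_frequency(Y, lambda y: 0.5 * y)
```

Under these assumed numbers, the stacked matrix shrinks from roughly 148 GB for all bins at once to 576 MB for a single bin.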

Sciss commented 5 years ago

Thank you very much for the explanation! I'll report once I have my implementation working :)