fgnt / pb_chime5

Speech enhancement system for the CHiME-5 dinner party scenario
MIT License

A question in class GSS in core_chime6.py #21

Open elissopp opened 2 years ago

elissopp commented 2 years ago

In line 198 of file core_chime6.py, method predict is used with a parameter source_activity_mask

affiliation = cur.predict(
    Obs.T[f, ...],
    source_activity_mask=source_active_mask[f, ..., :T]
)

But in the definition of the object cur (as well as class CACGMM in pb_bss/distribution/cacgmm.py), the method predict doesn't have this parameter. Simply changing predict to _predict doesn't help.

I would appreciate an answer whenever you have time.

boeddeker commented 2 years ago

Thank you for reporting this.

My local version of pb_bss contains some changes that are not published. I forgot to publish them because I always tested the code with iterations_post == 1, so that code path was never executed in the tests.

The difference between predict and _predict is that the first one is convenient to use and has the same interface across all distributions. The second is for internal use: it skips the overhead computations (e.g. normalizing and transposing the input) and returns states that are specific to the model distribution.
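To illustrate the split between a convenient public method and a lean internal one, here is a minimal sketch. It assumes a toy model; the class `ToyMixtureModel`, its `directions` attribute and the scoring rule are hypothetical and are not the pb_bss implementation of CACGMM.

```python
import numpy as np


class ToyMixtureModel:
    """Hypothetical toy model, only to illustrate predict vs. _predict."""

    def __init__(self, weights, directions):
        self.weights = np.asarray(weights, dtype=float)          # (K,)
        self.directions = np.asarray(directions, dtype=complex)  # (K, D)

    def _predict(self, y):
        # Internal: expects unit-norm frames with shape (T, D).
        # Returns the affiliations and a model-specific state (raw scores).
        scores = self.weights[:, None] * np.abs(self.directions.conj() @ y.T) ** 2
        affiliation = scores / np.maximum(scores.sum(axis=0, keepdims=True), 1e-12)
        return affiliation, scores

    def predict(self, observation):
        # Public: accepts shape (D, T) with arbitrary scale, then
        # transposes and normalizes before calling the internal method.
        y = np.asarray(observation, dtype=complex).T
        y = y / np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), 1e-12)
        affiliation, _ = self._predict(y)
        return affiliation  # (K, T)


model = ToyMixtureModel(weights=[0.5, 0.5], directions=[[1, 0], [0, 1]])
obs = np.array([[3.0, 0.0], [0.0, 2.0]])  # (D, T)
aff = model.predict(obs)
```

In this sketch, calling `_predict` directly with unnormalized or untransposed input would give different results, which is one reason why swapping `predict` for `_predict` is not a drop-in change.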

I will fix this in https://github.com/fgnt/pb_bss/pull/34. In the meantime, you could change iterations_post from 0 to 1 (i.e. remove the source activity constraint in the final a posteriori/affiliation estimation). I don't remember how this parameter affected the performance, but at least it had no negative effect.

elissopp commented 2 years ago

Thanks a lot for your quick reply and clear answer. I will change it as you suggested.

Besides, I have a small question about the GSS code. According to your paper, GSS avoids the permutation problem by using oracle time annotations, while a cACGMM on its own needs an extra permutation alignment step. But comparing the classes GSS and CACGMM, GSS calls the method predict directly, without the time annotations. So is the permutation problem solved in the fit procedure through the initialization? To be honest, I'm not very familiar with the code of mixture models QAQ.

Thank you again for answering my question

elissopp commented 2 years ago

I mean calling the method predict without source_activity_mask, like

affiliation = cur.predict(
    Obs.T[f, ...]
)

In this case, does the permutation problem still exist?

boeddeker commented 2 years ago

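There are two ideas for obtaining a solution that is most likely permutation free: a permutation-free initialization, and the source activity constraint during the EM iterations.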

I said most likely because the activity patterns of the speakers and the always-active noise must be sufficiently different, and the speakers must have different spatial properties (e.g. a large enough angle between the speakers from the array perspective). Note: I don't know how to define or measure "sufficiently different". I only know that they have to be different.

While it could be enough to start with a permutation-free initialization, I observed that the EM algorithm sometimes fails to stay permutation free. Maybe this is caused by activity patterns that are too similar, or by speakers that are spatially too similar.
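A permutation-free initialization from activity annotations can be sketched as follows. This is an illustration under assumptions, not the pb_chime5 code: the function name `init_affiliation_from_activity` is hypothetical, and the activity is assumed to be a 0/1 array of shape (K, T) whose last row is the always-active noise class.

```python
import numpy as np


def init_affiliation_from_activity(activity, eps=1e-10):
    """Turn a (K, T) 0/1 activity array into initial affiliations.

    Each frame is shared equally among the classes that are annotated
    as active, so class k is tied to annotation row k from the start.
    """
    activity = np.asarray(activity, dtype=float)
    return activity / np.maximum(activity.sum(axis=0, keepdims=True), eps)


activity = np.array([
    [1, 1, 0, 0],  # speaker 1
    [0, 1, 1, 0],  # speaker 2
    [1, 1, 1, 1],  # noise, assumed to be always active
])
init = init_affiliation_from_activity(activity)
```

Because row k of the initialization follows the annotation of speaker k, the class order is fixed at the start. As noted above, the EM iterations may still drift away from it if the activity patterns or spatial properties are too similar.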

Once the model has converged, it is unlikely that the permutation will change, so the constraint is no longer necessary. If you set iterations_post to 1, only one E-step (predict) is executed, and if a permutation happens at that point, you would also get issues in the subsequent beamforming.
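The effect of the source activity constraint in an E-step can be sketched with plain NumPy. This is a simplified illustration, assuming that per-class likelihoods of shape (K, T) are already available; the function `e_step` and its arguments are hypothetical and do not reproduce the pb_bss API.

```python
import numpy as np


def e_step(likelihood, weights, source_activity_mask=None, eps=1e-10):
    """Compute class posteriors (affiliations) of shape (K, T).

    If a 0/1 source_activity_mask of shape (K, T) is given, classes that
    are annotated as inactive in a frame get zero posterior there.
    """
    posterior = np.asarray(weights, dtype=float)[:, None] * np.asarray(likelihood, dtype=float)
    if source_activity_mask is not None:
        posterior = posterior * np.asarray(source_activity_mask, dtype=float)
    return posterior / np.maximum(posterior.sum(axis=0, keepdims=True), eps)


likelihood = np.array([
    [0.9, 0.2],  # speaker 1
    [0.1, 0.8],  # speaker 2
    [0.5, 0.5],  # noise
])
weights = np.full(3, 1 / 3)
mask = np.array([
    [0, 1],  # speaker 1 is annotated as silent in frame 0
    [1, 1],
    [1, 1],
])

unconstrained = e_step(likelihood, weights)
constrained = e_step(likelihood, weights, source_activity_mask=mask)
```

In frame 0 the unconstrained posterior favors speaker 1 although the annotation says that speaker is silent, while the constrained posterior assigns the frame to the remaining classes. With a converged model the two results usually agree, which is why a final unconstrained E-step tends to be harmless.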

elissopp commented 2 years ago

Thank you for your reply. I think I understand a lot now.

Hope everything goes well in your future research.

boeddeker commented 2 years ago

You are welcome.