Open elissopp opened 2 years ago
Thank you for reporting this.
My local version of pb_bss has some changes that are not published.
I forgot to publish them because, when testing the code, I always used iterations_post == 1, hence that code was never executed in the test.
The difference between predict and _predict is that the first one is convenient to use and has the same interface across all distributions.
The second is for internal use: it skips overhead computations (e.g. normalizing and transposing the input) and returns states that are specific to the model distribution.
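The split between a public predict and an internal _predict can be illustrated with a minimal sketch. Everything here is an assumption for illustration: ToyMixture, its softmax-style scoring, and the array layouts are hypothetical and are not the pb_bss implementation.

```python
import numpy as np


class ToyMixture:
    """Hypothetical stand-in for a mixture model; not the pb_bss code."""

    def __init__(self, means):
        # means: (K, D) class centroids
        self.means = np.asarray(means, dtype=float)

    def predict(self, y):
        """Public entry point: takes (D, T) input, returns affiliations (K, T)."""
        y = np.asarray(y, dtype=float).T  # transpose to (T, D)
        norm = np.linalg.norm(y, axis=-1, keepdims=True)
        y = y / np.maximum(norm, 1e-10)  # normalize each frame
        affiliation, _ = self._predict(y)
        return affiliation

    def _predict(self, y):
        """Internal: expects normalized (T, D) input, also returns model state."""
        similarity = self.means @ y.T  # (K, T)
        logits = similarity - similarity.max(axis=0, keepdims=True)
        affiliation = np.exp(logits)
        affiliation /= affiliation.sum(axis=0, keepdims=True)
        return affiliation, similarity


model = ToyMixture(means=[[1.0, 0.0], [0.0, 1.0]])
obs = np.array([[3.0, 0.1, 2.0], [0.2, 4.0, 2.0]])  # (D=2, T=3)
aff = model.predict(obs)
```

In this sketch the caller of predict never has to know about normalization or the extra return value, while code inside the model can call _predict directly and avoid repeating the preprocessing.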
I will fix this in https://github.com/fgnt/pb_bss/pull/34 .
In the meantime, you could change iterations_post from 0 to 1 (i.e. remove the source activity constraint in the final a posteriori/affiliation estimation). I forget how the performance changed with this parameter, but at least it had no negative effect.
Thanks a lot for your rapid reply and clear answer. I will change it as you suggested.
Besides, I have a small question about the GSS code. As I understand from your paper, GSS avoids the permutation problem by using oracle time annotations, whereas a standalone cACGMM needs an extra permutation alignment. But comparing the classes GSS and CACGMM, GSS directly uses the method predict without time annotations. So is the permutation problem solved in the fit procedure through the initialization? In fact, I'm not very familiar with the mixture model code QAQ.
Thank you again for answering my question.
I mean calling the method predict without passing source_activity_mask, like
affiliation = cur.predict(
    Obs.T[f, ...]
)
In this case, does the permutation problem still exist?
There are two ideas that will most likely produce a permutation-free solution:
I say most likely because the activity patterns of the speakers and of the always-active noise must be sufficiently different, and the speakers must have different spatial properties (e.g. a large enough angle between the speakers from the array's perspective). Note: I don't know how to define or measure "sufficiently different". I only know that they have to be different.
While it could be enough to start with a permutation-free initialization, I observed that the EM algorithm sometimes has trouble keeping it permutation free. This may be caused by activity patterns that are too similar, or by speakers that are spatially too similar.
Once the model has converged, it is unlikely that the permutation will change, so the constraint is no longer necessary.
If you set iterations_post to 1, only one E-step (predict) is executed, and if a permutation happens at that point, you would also get issues in the subsequent beamforming.
Thank you for your reply. I think I understand a lot now.
Hope everything goes well in your future research.
You are welcome.
In line 198 of file core_chime6.py, the method predict is called with a parameter source_activity_mask. But in the definition of the object cur (as well as the class CACGMM in pb_bss/distribution/cacgmm.py), the method predict doesn't have this parameter. Simply changing predict to _predict doesn't help. Thanks a lot if you can answer when you are free.