Xiaobin-Rong / gtcrn

The official implementation of GTCRN, an ultra-lite speech enhancement model.
MIT License

GRNN representation rearrangement/group size #31

Closed: ercandogu-elevear closed this issue 3 months ago

ercandogu-elevear commented 3 months ago

Hello,

First of all, thank you so much for sharing the code. I am trying to understand the representation rearrangement in the grouped RNN block. The original (reference) paper describes it as a reshape -> transpose -> reshape operation. In your code, as far as I understand, this is done by making the hidden states contiguous. Could you explain how that works, i.e. how it replaces those operations?
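For reference, here is a minimal NumPy sketch of the reshape -> transpose -> reshape rearrangement described in the GRNN paper. It is not code from the GTCRN repository. The helper names `rearrange_channels` and `rearrange_perm` and the flat `(frames, channels)` layout are hypothetical, chosen for illustration. The sketch shows two things. First, the rearrangement is just a fixed permutation of the feature channels. Second, making an array contiguous (`np.ascontiguousarray` here, which is analogous to PyTorch's `.contiguous()`) changes only its memory layout, not its values.

```python
import numpy as np

def rearrange_channels(x: np.ndarray, groups: int) -> np.ndarray:
    """Hypothetical helper: interleave the channels of `groups` groups.

    x has shape (frames, channels); channels must be divisible by groups.
    """
    frames, channels = x.shape
    assert channels % groups == 0
    # reshape -> transpose -> reshape, as described for GRNN
    y = x.reshape(frames, groups, channels // groups)
    y = y.transpose(0, 2, 1)
    return y.reshape(frames, channels)

def rearrange_perm(channels: int, groups: int) -> np.ndarray:
    """Hypothetical helper: index vector p with rearrange_channels(x, groups) == x[:, p]."""
    return np.arange(channels).reshape(groups, channels // groups).T.reshape(-1)

x = np.arange(2 * 8, dtype=float).reshape(2, 8)
p = rearrange_perm(8, groups=2)
print(p)  # [0 4 1 5 2 6 3 7]
assert np.array_equal(rearrange_channels(x, 2), x[:, p])

# Making a transposed view contiguous copies it into row-major memory,
# but the values it holds are unchanged: no channel is moved.
t = x.T
c = np.ascontiguousarray(t)
print(t.flags["C_CONTIGUOUS"], c.flags["C_CONTIGUOUS"])  # False True
assert np.array_equal(t, c)
```

So a contiguity call on its own cannot stand in for the rearrangement. The reply below explains why the explicit rearrangement can be dropped altogether.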

I ask because I tried increasing the group size to further reduce complexity, but performance dropped by more than I would like. Is this also why you chose a group size of 2? Or did you experiment with other group sizes as well?
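To show why more groups reduce complexity, here is a rough parameter count. It is a sketch under assumptions: each group is a single-layer, unidirectional GRU with PyTorch `nn.GRU` parameterization (input weights, recurrent weights, and two bias vectors for each of the 3 gates), and the input and hidden features are split evenly across groups. The helper names are hypothetical and the sizes are illustrative, not GTCRN's actual configuration.

```python
def gru_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one GRU layer in nn.GRU style:
    weight_ih (3H x I), weight_hh (3H x H), bias_ih (3H), bias_hh (3H)."""
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)

def grouped_gru_params(input_size: int, hidden_size: int, groups: int) -> int:
    """Hypothetical helper: parameters when the features are split evenly
    into `groups` independent GRUs."""
    assert input_size % groups == 0 and hidden_size % groups == 0
    return groups * gru_params(input_size // groups, hidden_size // groups)

for g in (1, 2, 4, 8):
    print(g, grouped_gru_params(16, 16, g))
# 1 1632
# 2 864
# 4 480
# 8 288
```

The weight-matrix terms shrink by a factor of g, while the bias terms stay constant. At the same time, each group's recurrence sees only 1/g of the input features.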

Thanks in advance.

Xiaobin-Rong commented 3 months ago

Hi, the original GRNN code implements an explicit representation rearrangement operation. However, we found this operation unnecessary in the dual-path GRNN (DPGRNN), because a grouped RNN layer is always followed by a fully connected (FC) layer, which can handle the representation rearrangement on its own. We therefore omitted the explicit rearrangement. I have elaborated on this in the README.
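The argument can be checked numerically with a small NumPy sketch. This is not taken from the GTCRN code, and the shapes and variable names are illustrative assumptions. Because the rearrangement is a fixed permutation of the channels, an FC layer applied after it computes the same function as an FC layer with column-permuted weights applied without it. Since those weights are learned anyway, training can arrive at the permuted version directly.

```python
import numpy as np

rng = np.random.default_rng(0)
frames, channels, groups, out_dim = 5, 8, 2, 6

# Stand-in for the concatenated grouped-RNN outputs: shape (frames, channels).
h = rng.standard_normal((frames, channels))

# Explicit rearrangement (reshape -> transpose -> reshape) and its index-permutation form.
perm = np.arange(channels).reshape(groups, channels // groups).T.reshape(-1)
h_rearranged = h.reshape(frames, groups, channels // groups).transpose(0, 2, 1).reshape(frames, channels)
assert np.array_equal(h_rearranged, h[:, perm])

# FC layer y = h W^T + b applied after the explicit rearrangement.
W = rng.standard_normal((out_dim, channels))
b = rng.standard_normal(out_dim)
y_with_shuffle = h_rearranged @ W.T + b

# Same mapping with no rearrangement: move the permutation into the FC weight columns.
W_absorbed = np.empty_like(W)
W_absorbed[:, perm] = W
y_without_shuffle = h @ W_absorbed.T + b

print(np.allclose(y_with_shuffle, y_without_shuffle))  # True
```

The two outputs are identical in exact arithmetic. In floating point they agree only up to rounding, which is why the comparison uses `np.allclose`.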

As for the number of groups, I chose 2 because I expected that more groups would degrade performance. However, I have not run a detailed ablation study on the effect of the number of groups.

I hope my explanation clarifies your confusion.

ercandogu-elevear commented 3 months ago

Yes, your explanation helps a lot; I hadn't seen that part of the README. Thank you so much for the quick reply :)