SamsungLabs / SummaryMixing

This repository implements SummaryMixing, a simpler, faster, and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is ready to be used with the SpeechBrain toolkit.

Issue Encoder-only SummaryMixing #5

Closed Adel-Moumen closed 2 months ago

Adel-Moumen commented 3 months ago

Hi,

Thanks for this repo. I have been playing a bit with SummaryMixing and tried to plug it into a CTC-only recipe (https://github.com/speechbrain/speechbrain/blob/develop/recipes/LibriSpeech/ASR/CTC/train.py), which was implemented by Shucong. However, with this recipe one must make sure that num_decoder_layers is equal to 0. Doing so creates an issue with SummaryMixing because of this line: https://github.com/SamsungLabs/SummaryMixing/blob/main/speechbrain/lobes/models/transformer/TransformerASR.py#L388 . Indeed, the forward call tgt = self.custom_tgt_module(tgt) tries to use custom_tgt_module, which is defined here: https://github.com/SamsungLabs/SummaryMixing/blob/main/speechbrain/lobes/models/transformer/TransformerASR.py#L334-L335 , but as you can see this module is only created when num_decoder_layers > 0. One fix is to simply return from the forward right after the encoder, as we did in SpeechBrain here: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/lobes/models/transformer/TransformerASR.py#L350-L352 .
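For illustration, here is a minimal, self-contained sketch of the guard I have in mind. The class, the stand-in encoder/decoder layers, and the tensor shapes are all invented for this example; only the num_decoder_layers == 0 early return mirrors the actual suggestion:

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the actual SummaryMixing TransformerASR code:
# when num_decoder_layers == 0 the decoder-side modules (e.g. custom_tgt_module)
# are never created, so the forward pass should return right after the encoder
# instead of touching them.
class EncoderOnlyTransformerASR(nn.Module):
    def __init__(self, d_model=256, num_decoder_layers=0):
        super().__init__()
        self.num_decoder_layers = num_decoder_layers
        self.encoder = nn.Linear(d_model, d_model)  # stand-in for the real encoder
        if num_decoder_layers > 0:
            # Only built when a decoder exists, mirroring the original code.
            self.custom_tgt_module = nn.Embedding(1000, d_model)
            self.decoder = nn.Linear(d_model, d_model)  # stand-in for the real decoder

    def forward(self, src, tgt=None):
        encoder_out = self.encoder(src)

        # Encoder-only (CTC) case: return before any decoder attribute is accessed.
        if self.num_decoder_layers == 0:
            return encoder_out, None

        tgt = self.custom_tgt_module(tgt)
        return encoder_out, self.decoder(tgt)


# With num_decoder_layers=0 the guard avoids the AttributeError on custom_tgt_module.
model = EncoderOnlyTransformerASR(num_decoder_layers=0)
feats = torch.randn(4, 100, 256)
enc_out, dec_out = model(feats)
```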

Also, I found it a bit hard to use SummaryMixing because of this line: https://github.com/SamsungLabs/SummaryMixing/blob/main/speechbrain/lobes/models/transformer/Branchformer.py#L222 . Indeed, you define the VanillaDNN with an input dimension of local_proj_out_dim + summary_out_dim, but in my experiment with the CTC-only recipe it is "impossible" to define this input dimension externally because there seems to be some extra downsampling somewhere. I had to do some hard-coding to make it work. It may be related only to the CTC-only recipe, but I wanted to let you know in case you have already experienced this issue.
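For context, this is roughly the shape of the problem as I understand it: the combining projection takes the concatenation of the local branch and the summary branch, so its input size is local_proj_out_dim + summary_out_dim, and that value has to match whatever feature size actually reaches the layer. The sketch below is only illustrative; the class and layer names are placeholders, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Hedged sketch of a SummaryMixing-style cell: a per-frame local projection and a
# time-averaged summary vector are concatenated and fed to a combiner whose input
# dimension is local_proj_out_dim + summary_out_dim. If upstream layers change the
# feature size (e.g. extra downsampling), an externally fixed value no longer fits.
class SummaryMixingCell(nn.Module):
    def __init__(self, d_model=256, local_proj_out_dim=256, summary_out_dim=256):
        super().__init__()
        self.local_proj = nn.Linear(d_model, local_proj_out_dim)
        self.summary_proj = nn.Linear(d_model, summary_out_dim)
        # Input size is the sum of both branch widths.
        self.combiner = nn.Linear(local_proj_out_dim + summary_out_dim, d_model)

    def forward(self, x):
        local = self.local_proj(x)                                # per-frame branch
        summary = self.summary_proj(x).mean(dim=1, keepdim=True)  # average over time
        summary = summary.expand(-1, x.size(1), -1)               # broadcast back to T
        return self.combiner(torch.cat([local, summary], dim=-1))


x = torch.randn(2, 50, 256)
out = SummaryMixingCell()(x)  # shape: (2, 50, 256)
```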

Thanks!

Adel

TParcollet commented 3 months ago

@shucongzhang do you remember making a change for your CTC-only recipe? The proposed change looks good to me, though.

shucongzhang commented 2 months ago

@Adel-Moumen Hi Adel, thank you very much for your interest in this. For whatever reason, I do not get any notifications for threads in this repo :( and I only noticed your post now. I'll look at this by the weekend and get back to you.

shucongzhang commented 2 months ago

@Adel-Moumen Hi Adel, for the CTC-only encoder, yes, the num_decoder_layers > 0 requirement is an issue, and I think I did something similar to what you did to make it work. However, regarding local_proj_out_dim + summary_out_dim, I don't remember it causing problems for CTC-only training. If you are still interested in this, could you share where you did the hard-coding? Thank you.