Open Sameep-c opened 8 months ago
Of course we can! A challenging part would be to properly align the tokens from the language model and from Seamless. I am not sure there is code that you can apply out of the box for this, but it is certainly a solvable task.
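That said, I think LM rescoring makes less sense for Seamless than for CTC-based ASR models. A CTC model predicts each output token independently of the others given the audio, so an external LM supplies the linguistic context it lacks, whereas the Seamless text decoder is already an autoregressive transformer language model on its own.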
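If you want to try it anyway, one way to sidestep the token alignment problem is to rescore complete n-best hypotheses at the text level instead of fusing scores token by token. Below is a minimal sketch of that idea, not something taken from the Seamless codebase. The function names, the `lm_weight` and `word_bonus` parameters, the toy LM, and the example scores are all assumptions made for illustration; it also assumes you can obtain an n-best list of detokenized hypotheses with their decoder log-probabilities from beam search.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-in for an external LM: words in this tiny vocabulary
# get a high probability, everything else a low one. With KenLM you would
# pass something like `lambda s: model.score(s, bos=True, eos=True)`
# instead; note that KenLM returns log10 scores, so multiply by
# math.log(10) if the decoder scores are natural logs.
TOY_VOCAB = {"recognize", "speech"}


def toy_lm_score(text: str) -> float:
    """Return a natural-log score for a whole sentence (toy unigram model)."""
    return sum(
        math.log(0.1) if word in TOY_VOCAB else math.log(0.001)
        for word in text.split()
    )


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score_fn: Callable[[str], float],
    lm_weight: float = 0.3,
    word_bonus: float = 0.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, decoder_logprob) pairs with an external LM score.

    combined = decoder_logprob + lm_weight * lm_logprob + word_bonus * n_words
    The LM sees detokenized text, so its tokenization does not need to
    match the tokenization used by the decoder.
    """
    rescored = []
    for text, decoder_logprob in nbest:
        lm_logprob = lm_score_fn(text)
        n_words = len(text.split())
        combined = decoder_logprob + lm_weight * lm_logprob + word_bonus * n_words
        rescored.append((text, combined))
    # Highest combined score first.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


# Made-up n-best list: the decoder slightly prefers the second hypothesis.
nbest = [("recognize speech", -3.0), ("wreck a nice beach", -2.8)]
best_text, best_score = rescore_nbest(nbest, toy_lm_score)[0]
```

In practice `lm_weight` and `word_bonus` would have to be tuned on a held-out set, and given the point above about the decoder already being a language model, the gain may be small unless the external LM is trained on in-domain text.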
Can we use an external LM rescoring model such as KenLM for the text decoder part of Seamless M4T for tasks such as ASR or S2T translation?