MKT-Dataoceanai / CNVSRC2023Baseline

Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)
http://cnceleb.org/competition

how to decode using the hybrid ctc/attention architecture? #1

Open LindgeW opened 11 months ago

LindgeW commented 11 months ago

Hi,

Could you please give the implementation details and an explanation of how decoding works with the hybrid CTC/attention architecture? (How are the CTC score and attention score linearly combined to produce the final prediction during decoding?)

sectum1919 commented 11 months ago

You can refer to https://github.com/MKT-Dataoceanai/CNVSRC2023Baseline/blob/91f85db4dcb8f7036d662e1538ce7c8a9c8a7804/espnet/nets/beam_search.py#L295 for details, where we use `weighted_scores += self.weights[k] * scores[k]` to linearly combine the decoder score and the CTC score.
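For illustration, a minimal sketch of that linear combination for a single beam-search expansion step. The names `att_decoder`, `ctc_prefix_scorer`, and `ctc_weight` are hypothetical stand-ins, not the baseline's actual API; the real logic is in the linked `beam_search.py`.

```python
import torch

def joint_next_token_scores(ys, enc_out, att_decoder, ctc_prefix_scorer,
                            ctc_weight: float = 0.3) -> torch.Tensor:
    """Weighted sum of next-token log-probabilities for one partial hypothesis.

    `att_decoder(ys, enc_out)` and `ctc_prefix_scorer(ys, enc_out)` are assumed
    to each return a (vocab_size,) tensor of log-probabilities for extending
    the token prefix `ys` by one token.
    """
    att_scores = att_decoder(ys, enc_out)        # attention decoder log-probs
    ctc_scores = ctc_prefix_scorer(ys, enc_out)  # CTC prefix log-probs
    # Same idea as `weighted_scores += self.weights[k] * scores[k]`:
    # a linear combination of the two scorers, controlled by ctc_weight.
    return (1.0 - ctc_weight) * att_scores + ctc_weight * ctc_scores
```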

LindgeW commented 11 months ago

Is it the one-pass decoding scheme, i.e. the one that computes the probability of each partial hypothesis using both the CTC and attention models?

sectum1919 commented 10 months ago

I don't understand what your specific problem is; can you explain it again?

> Is it the one-pass decoding scheme, i.e. the one that computes the probability of each partial hypothesis using both the CTC and attention models?

LindgeW commented 10 months ago

Sure,

There are actually two decoding strategies for the hybrid CTC/attention approach that combine the CTC and attention-based sequence probabilities during inference, as described in the original paper (https://aclanthology.org/P17-1048.pdf): rescoring and one-pass decoding. If I understand correctly, the decoding code you pointed to corresponds to the one-pass scheme, right?
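For reference, a rough, self-contained sketch of the two strategies from that paper (rescoring vs. one-pass). All helper callables (`ctc_full_logp`, `att_step_logp`, `ctc_prefix_logp`) and parameter names are hypothetical stand-ins for the models' score functions; the baseline's actual one-pass implementation is the linked `beam_search.py`.

```python
from typing import Callable, List, Tuple

def rescoring_decode(nbest: List[Tuple[List[int], float]],
                     ctc_full_logp: Callable[[List[int]], float],
                     lam: float = 0.3) -> List[int]:
    """Rescoring: attention-only beam search runs first, then the full-sequence
    CTC log-probability is added to each finished n-best hypothesis."""
    return max(nbest, key=lambda h: (1 - lam) * h[1] + lam * ctc_full_logp(h[0]))[0]

def one_pass_decode(att_step_logp: Callable[[List[int]], List[float]],
                    ctc_prefix_logp: Callable[[List[int]], List[float]],
                    vocab_size: int, sos: int, eos: int,
                    beam: int = 10, max_len: int = 50, lam: float = 0.3) -> List[int]:
    """One-pass: CTC prefix scores are folded into every beam expansion step,
    so each partial hypothesis already carries a joint CTC/attention score."""
    beams: List[Tuple[List[int], float]] = [([sos], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens[-1] == eos:          # keep finished hypotheses as-is
                candidates.append((tokens, logp))
                continue
            att = att_step_logp(tokens)    # next-token log-probs (len == vocab_size)
            ctc = ctc_prefix_logp(tokens)  # CTC prefix log-probs (len == vocab_size)
            for v in range(vocab_size):
                joint = (1 - lam) * att[v] + lam * ctc[v]
                candidates.append((tokens + [v], logp + joint))
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam]
    return max(beams, key=lambda h: h[1])[0]
```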