athena-team / athena

An open-source implementation of a sequence-to-sequence based speech processing engine
https://athena-team.readthedocs.io
Apache License 2.0

Question for deployment #324

Open · iou2much opened this issue 3 years ago

iou2much commented 3 years ago

In my understanding, after exporting the pb file and using the C++ demo to transcribe, it doesn't use beam_search_decoder or wfst_decoder; it just outputs the transformer decoder result directly. Am I right?

If so, could anyone give some guidance on using beam_search or wfst in deployment mode? Thanks a lot.
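
For reference, a minimal hypothetical sketch (not Athena's actual demo code) of what decoding "directly" amounts to: an argmax greedy loop with no search and no external scorers. `dummy_decoder_step` is a stand-in for one Transformer decoder forward pass; the real decoder would also consume the encoder output.

```python
import tensorflow as tf

def dummy_decoder_step(tokens, vocab_size=50):
    # Hypothetical stand-in for one decoder step: returns per-step
    # log-probabilities of shape [batch, vocab].
    last = tf.cast(tokens[:, -1:], tf.float32)
    logits = -tf.abs(tf.range(vocab_size, dtype=tf.float32)[None, :] - last - 1.0)
    return tf.nn.log_softmax(logits, axis=-1)

def greedy_decode(decoder_step, max_len=10, sos=1, eos=2):
    tokens = tf.constant([[sos]], dtype=tf.int32)        # [1, t]
    for _ in range(max_len):
        log_probs = decoder_step(tokens)                 # [1, vocab]
        next_tok = tf.argmax(log_probs, axis=-1, output_type=tf.int32)
        tokens = tf.concat([tokens, next_tok[:, None]], axis=1)
        if int(next_tok[0]) == eos:                      # stop at end token
            break
    return tokens

print(greedy_decode(dummy_decoder_step).numpy())
```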

Some-random commented 3 years ago

The logit operations (addition, assignment, for loops) for beam_search_decoder and wfst_decoder are written in Python. If you want to perform beam search in C++, there are two ways:

1. Create pbs that capture the network operations (encoder feature extraction; a decoder step that takes the encoder output, previous states, and inputs) and stitch them together with logit operations written in C++.
2. Write the logit operations with TensorFlow ops and freeze the whole graph into one pb.

I believe the second option has already been implemented in the MWER training of Speech Transformer.
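
A minimal sketch of option 2, assuming a toy `decoder_step` in place of the real Transformer decoder (the names below are illustrative, not Athena's API). Because every operation in the search loop is a TensorFlow op, the traced function can be saved with `tf.saved_model.save` and frozen into a single pb:

```python
import tensorflow as tf

def decoder_step(tokens, vocab_size):
    # Toy decoder step: per-beam log-probabilities, shape [beam, vocab].
    last = tf.cast(tokens[:, -1:], tf.float32)
    logits = -tf.abs(tf.range(vocab_size, dtype=tf.float32)[None, :] - last)
    return tf.nn.log_softmax(logits, axis=-1)

@tf.function
def beam_search(beam_size=4, max_len=10, vocab_size=100, sos=1):
    # One live hypothesis at the start; the rest are masked with -1e9.
    tokens = tf.fill([beam_size, 1], sos)                       # [beam, t]
    scores = tf.concat([tf.zeros([1]), tf.fill([beam_size - 1], -1e9)], 0)
    for _ in range(max_len):  # Python loop: unrolled at trace time
        log_probs = decoder_step(tokens, vocab_size)            # [beam, vocab]
        total = scores[:, None] + log_probs                     # [beam, vocab]
        flat = tf.reshape(total, [-1])                          # [beam*vocab]
        scores, idx = tf.math.top_k(flat, k=beam_size)
        beam_idx = idx // vocab_size                            # source hypothesis
        tok_idx = idx % vocab_size                              # new token
        tokens = tf.concat(
            [tf.gather(tokens, beam_idx), tok_idx[:, None]], axis=1)
    return tokens, scores
```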

iou2much commented 3 years ago

> implemented in the MWER training of Speech Transformer

Really? That's great. Let me check it out. Thank you

iou2much commented 3 years ago

Hi @Some-random and @hoyden, I've read the BatchBeamSearchLayer module in the mwer branch, but I still have some questions; could you explain a bit more? In BatchBeamSearchLayer there is no scorer such as CTCScorer or the lm_model scorer. Do I need them in the training stage or the decoding stage? Wouldn't they help performance?

Some-random commented 3 years ago

> Hi @Some-random and @hoyden, I've read the BatchBeamSearchLayer module in the mwer branch, but I still have some questions; could you explain a bit more? In BatchBeamSearchLayer there is no scorer such as CTCScorer or the lm_model scorer. Do I need them in the training stage or the decoding stage? Wouldn't they help performance?

BatchBeamSearchLayer is used in the training stage; CTCScorer and the lm_model scorer are not used there. For the decoding stage, adding these scorers will obviously boost performance, but we haven't provided deployment with language-model and CTC joint decoding yet.
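
For context, joint decoding typically combines the per-step scores by weighted interpolation (shallow fusion). A hedged sketch, where `ctc_weight` and `lm_weight` are hypothetical tuning knobs rather than Athena defaults:

```python
import tensorflow as tf

def combined_score(attn_log_probs, ctc_log_probs, lm_log_probs,
                   ctc_weight=0.3, lm_weight=0.1):
    # All inputs are per-step log-probabilities of shape [beam, vocab].
    # The attention and CTC scores are interpolated, and the external
    # language model is added with its own weight.
    return ((1.0 - ctc_weight) * attn_log_probs
            + ctc_weight * ctc_log_probs
            + lm_weight * lm_log_probs)
```

The combined score would replace the plain `log_probs` inside the beam-search step above, which is why deploying it requires exporting the CTC head and the language model alongside the decoder.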