Closed: sarapapi closed this issue 2 years ago
Yes.
The wait-k baseline model can be trained with the arch "online_audio_transformer_waitk" (in rain/models/waitk_transformer.py). The args main_context and right_context configure our encoder, decoder_blocks_per_token determines the pre-decision step (2 for 320 ms with main_context 16), and decoder_delay_blocks should be set to k * decoder_blocks_per_token.
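The arithmetic above can be sketched as a couple of small helpers (these are illustrative only, not part of the repo; the 10 ms feature-frame shift is an assumption based on the usual fbank setting):

```python
# Sketch of the delay arithmetic described above. Assumes main_context is
# measured in feature frames and each frame covers 10 ms (frame_shift_ms).

def predecision_step_ms(main_context: int, decoder_blocks_per_token: int,
                        frame_shift_ms: int = 10) -> int:
    """Duration of source audio consumed per target token (pre-decision step)."""
    return main_context * frame_shift_ms * decoder_blocks_per_token

def decoder_delay_blocks(k: int, decoder_blocks_per_token: int) -> int:
    """Initial wait of a wait-k policy, measured in encoder blocks."""
    return k * decoder_blocks_per_token

# With main_context=16 and decoder_blocks_per_token=2, one pre-decision
# step is 320 ms, matching the numbers in the reply above.
print(predecision_step_ms(16, 2))   # 320
print(decoder_delay_blocks(3, 2))   # wait-3 -> 6 blocks
```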
For inference, you can run the SimulEval toolkit with the agent rain/simul/speech_waitk.py, as in scripts/seval_waitk3.py. args.step_forecast controls SBS; set it to 0 for the baseline wait-k. args.beam is also needed for beam search once the source has finished.
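As a rough command-line sketch (the flag names other than --agent are guesses derived from the args mentioned above, and the data-file paths are placeholders; check scripts/seval_waitk3.py for the exact invocation):

```shell
# Hypothetical SimulEval run for the wait-k baseline (no SBS).
simuleval \
  --agent rain/simul/speech_waitk.py \
  --source source.wav_list \
  --target target.txt \
  --step-forecast 0 \
  --beam 5
# --step-forecast 0 disables speculative beam search (baseline wait-k);
# --beam enables beam search after the source utterance has finished.
```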
Hi all, is it possible to replicate the baseline in your paper with this repository? In particular, I am referring to the speech part, using wait-k with and without speculative beam search. As far as I understood, the data processing, KD, etc. are the same, so I suppose that only the training and inference scripts differ. Thank you!