conformer transducer timestamp extraction

Khimer commented 1 year ago

Good day, and thanks for your hard work!

I am trying to extract timestamps from the model stt_en_conformer_transducer_large, following https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_with_Transducers.ipynb#scrollTo=xkv_x8NAfpX3

But I'm having difficulty converting the timestamps. The tutorial says: "Note that each timestep here is (roughly) timestep∗total_stride_of_model∗preprocessor.window_stride seconds timestamp". I can extract "timestep" from the hypotheses, but model.preprocessor doesn't have a window_stride field, and I'm not sure which value to use here.

Also, I couldn't figure out where the "total_stride_of_model" value comes from. Do I understand correctly that total_stride_of_model == len(audio) / window_stride?

Thank you!

titu1994 commented 1 year ago

model.cfg.preprocessor has those fields (including window_stride).

titu1994 commented 1 year ago

Total stride is inherently part of the model architecture; there's no place in the config that mentions it. Conformer and Squeezeformer have 4x stride, Citrinet has 8x stride, and QuartzNet and Jasper have 2x stride.
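
For anyone landing here later, the conversion from the tutorial can be sketched roughly as below. This is only an illustration, not NeMo code: the helper name timestep_to_seconds and the TOTAL_STRIDE table are made up for this example (the stride values are the ones listed above), and the window_stride of 0.01 s is an assumed value. Read the real one from model.cfg.preprocessor.window_stride for your checkpoint.

```python
# Total stride per architecture, as listed in this thread.
# This table is a hypothetical helper, not something NeMo exposes.
TOTAL_STRIDE = {
    "conformer": 4,
    "squeezeformer": 4,
    "citrinet": 8,
    "quartznet": 2,
    "jasper": 2,
}


def timestep_to_seconds(timestep, total_stride, window_stride):
    """Approximate start time in seconds of an encoder timestep.

    Implements the tutorial's rule of thumb:
    timestep * total_stride_of_model * preprocessor.window_stride
    """
    return timestep * total_stride * window_stride


# ASSUMPTION: 0.01 s is only a placeholder here; take the actual value
# from model.cfg.preprocessor.window_stride of the loaded model.
window_stride = 0.01

# Example timesteps, standing in for the values found in a hypothesis.
timesteps = [0, 25, 50]
seconds = [
    timestep_to_seconds(t, TOTAL_STRIDE["conformer"], window_stride)
    for t in timesteps
]
print(seconds)
```

The result is approximate, as the tutorial note itself says ("roughly"), so treat it as an estimate rather than an exact boundary.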

Khimer commented 1 year ago

This helped, thanks a lot!

NVIDIA / NeMo

conformer transducer timestamp extraction #5896