Closed rookie0607 closed 5 months ago
Latency should be evaluated for the entire system, taking all components into account. That is, if the front-end separation uses any future information, this needs to be reflected in the word timestamps and in the evaluated latency. For evaluation, we will divide the systems into four categories based on their average latency, with thresholds of 1000 ms, 350 ms, and 150 ms. A non-streaming system can still be submitted and will fall into the category of latency > 1000 ms. This is currently not completely clear from the wording of the rules on the website, and we will update them to emphasise this better.
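To make the four categories concrete, here is a minimal Python sketch of how a system could be bucketed by its average latency. The thresholds come from the comment above. The function name, the category labels, and the handling of values that fall exactly on a threshold are assumptions for illustration, not the official scoring code.

```python
from statistics import mean

# Thresholds stated in this thread, in milliseconds.
THRESHOLDS_MS = (1000.0, 350.0, 150.0)


def latency_category(word_latencies_ms):
    """Return a latency category label for a list of per-word latencies (ms).

    Assumption: a value exactly equal to a threshold falls into the
    lower-latency category. The thread does not specify boundary handling.
    """
    if not word_latencies_ms:
        raise ValueError("need at least one word latency")
    avg = mean(word_latencies_ms)
    high, mid, low = THRESHOLDS_MS
    if avg > high:
        return "> 1000 ms"      # includes non-streaming systems
    if avg > mid:
        return "350-1000 ms"
    if avg > low:
        return "150-350 ms"
    return "<= 150 ms"


if __name__ == "__main__":
    print(latency_category([120.0, 140.0, 100.0]))
    print(latency_category([600000.0, 600000.0]))
```

Because the category depends on the average over all words, a few late words can move an otherwise fast system into a higher-latency category.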
If the CSS module of your system uses the entire recording to generate its outputs, please set the word timestamps, which are used to compute the latency, to the length of the entire recording. Furthermore, if the ASR module of your system is streaming, you may also compute the latency of this module separately and report it in the system description. We are definitely interested in such submissions too. But to keep the rules fair, we will rank systems based on their overall latency.
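As a rough illustration of the timestamp rule above, the following Python sketch overwrites every word's timestamp with the recording length when the front-end is offline. The `Word` class, its field names, and the use of seconds are hypothetical. The actual submission format is defined in the challenge rules, not here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    # Hypothetical container; the real submission format may differ.
    text: str
    emit_time_s: float  # time at which the system emitted the word


def stamp_for_offline_frontend(words, recording_length_s):
    """Return copies of `words` with every timestamp set to the recording length.

    This reflects that an offline CSS front-end has seen the whole recording
    before any word is produced, even if the ASR back-end is streaming.
    """
    return [replace(w, emit_time_s=recording_length_s) for w in words]


if __name__ == "__main__":
    asr_output = [Word("hello", 1.2), Word("world", 2.0)]
    for w in stamp_for_offline_frontend(asr_output, 600.0):
        print(w.text, w.emit_time_s)
```

The original ASR-only timestamps stay untouched in `asr_output`, so they can still be used to report the streaming ASR latency separately in the system description.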
Closing the issue. Feel free to re-open or make a new issue if you have any further questions.
Acknowledgement of the rules of the competition includes the clause: "The submitted system must be streaming, i.e. process the input in chronological order and specify a delay time for each word sent, as detailed in the subsection 'Evaluation'. The system must not use any global information from the recording until it has been processed chronologically. Such global information may include global normalization, non-streaming speaker identification, or journaling. This requirement regarding streaming processing applies to all modalities (audio, visual, accelerometer, gyroscope, etc.)." Do front-end processing modules such as continuous speech separation (CSS) have to be streaming? Is it possible to first run the separation with an offline CSS model and then feed its output into a streaming ASR model for recognition?