GT-KIM closed 1 year ago
In spoken language understanding, the conventional term "pipeline" refers to a system that first generates a transcription with ASR and then passes it to an NLP model to produce the final output. Each module is therefore trained with a different objective. An end-to-end system is a single model trained with a single objective (e.g., NER in this case). As long as you train the model in an end-to-end manner (audio or audio features directly to labels), we regard it as an end-to-end model. Pre-training followed by fine-tuning can also be regarded as an end-to-end system.
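As a rough illustration of the distinction, here is a minimal sketch. It is not code from the SLUE toolkit: every name (`make_pipeline`, `make_end_to_end`, the `toy_*` stubs) and every stub output is hypothetical, and the stubs stand in for trained models only so that the example runs.

```python
from typing import Callable, List, Tuple

Audio = List[float]                # stand-in for a waveform or audio features
Entities = List[Tuple[str, str]]   # (entity text, entity label) pairs


def make_pipeline(asr: Callable[[Audio], str],
                  nlp: Callable[[str], Entities]) -> Callable[[Audio], Entities]:
    """Pipeline: two separately trained modules chained through text."""
    def predict(audio: Audio) -> Entities:
        transcript = asr(audio)    # module 1, trained with an ASR objective
        return nlp(transcript)     # module 2, trained with an NER objective
    return predict


def make_end_to_end(model: Callable[[Audio], Entities]) -> Callable[[Audio], Entities]:
    """End-to-end: one model, trained with one objective, audio to labels."""
    def predict(audio: Audio) -> Entities:
        return model(audio)        # no intermediate transcript hand-off
    return predict


# Hypothetical stubs (assumptions for illustration, not real models).
def toy_asr(audio: Audio) -> str:
    return "flight to boston"


def toy_text_ner(text: str) -> Entities:
    return [("boston", "PLACE")] if "boston" in text else []


def toy_e2e_ner(audio: Audio) -> Entities:
    return [("boston", "PLACE")]


audio = [0.0, 0.1, -0.1]
pipeline_out = make_pipeline(toy_asr, toy_text_ner)(audio)
e2e_out = make_end_to_end(toy_e2e_ner)(audio)
```

Both calls have the same input and output types; what differs is whether a separately trained text stage sits between audio and labels.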
The submission link will be posted soon on this page: https://asappresearch.github.io/slue-toolkit/interspeech2023_submission.html
I understand. Thank you!
Hello, I have two questions about the challenge rules.
What is the difference between pipeline and end2end? In my understanding, the "pipeline" model has two independent neural network models: audio-to-text (ASR) and text-to-entity (NLP). The "E2E" model has one neural network model: audio-to-(entity, word). If my model can estimate the whole transcript and the entity of each word with only one neural network, is it "E2E"? Or, if my model has multiple training steps (pretraining-finetuning-end2end) but only one inference step (end2end), can it be "E2E"? I think that if there is no "token ID to string" stage in the inference step, it can be "E2E", but I'm not sure.
Where can I submit my entries to participate in the challenge? Also, can we see the leaderboard or test results before the deadline?
Thanks