HeChengHui opened this issue 4 months ago
Yes, but it is trained on 10-second chunks and the model is not causal. You would need to use 10-second windows and advance by a fixed stride each time. The latency of the system will depend on the post-processing you do. E.g., if you do overlap-add, the latency will still be 10 seconds; if you only accept the predictions on the new stride region, the latency is equal to the stride.
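A minimal sketch of the second option (only keep predictions for the newest stride), assuming 16 kHz mono input, a 1 s stride, and a model that outputs 10 frames per second. `predict_window`, `SAMPLE_RATE`, `STRIDE_S` and `FRAMES_PER_S` are hypothetical placeholders, not names from the DESED repo.

```python
from typing import Callable, Iterable, Iterator

import numpy as np

# Hypothetical settings (assumptions, not taken from the DESED recipe).
SAMPLE_RATE = 16000  # input sample rate in Hz
WINDOW_S = 10.0      # window length the model was trained on
STRIDE_S = 1.0       # hop between consecutive windows; sets the latency
FRAMES_PER_S = 10    # assumed frame rate of the model's output


def stream_sed(
    chunks: Iterable[np.ndarray],
    predict_window: Callable[[np.ndarray], np.ndarray],
) -> Iterator[np.ndarray]:
    """Run a non-causal 10 s model on an audio stream with a sliding window.

    `chunks` yields mono audio blocks of arbitrary length.
    `predict_window` (hypothetical) maps a window of WINDOW_S seconds of
    samples to a (frames, classes) array of probabilities.
    Yields only the frames covering the newest STRIDE_S seconds.
    """
    win = int(WINDOW_S * SAMPLE_RATE)
    hop = int(STRIDE_S * SAMPLE_RATE)
    new_frames = int(STRIDE_S * FRAMES_PER_S)
    buffer = np.zeros(win, dtype=np.float32)  # zero history before the stream starts
    pending = np.zeros(0, dtype=np.float32)   # samples not yet pushed into the window
    for chunk in chunks:
        pending = np.concatenate([pending, chunk.astype(np.float32)])
        while len(pending) >= hop:
            # Slide the window forward by one stride.
            buffer = np.concatenate([buffer[hop:], pending[:hop]])
            pending = pending[hop:]
            probs = predict_window(buffer)
            # Keep only the predictions for the newly arrived stride.
            yield probs[-new_frames:]
```

Each yielded block describes audio that arrived at most `STRIDE_S` seconds ago (plus compute time), so a smaller stride lowers latency at the cost of one full 10 s forward pass per stride.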
@popcornell I see. Is there any reference code for building this pipeline?
Not really, but in https://github.com/DCASE-REPO/DESED_task/blob/c6bcb45b8b986ccde5c56bb86eefaf9d19b2320c/recipes/dcase2024_task4_baseline/local/sed_trainer_pretrained.py#L1457 we reconstruct long-form predictions from windowed predictions.
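For illustration only, here is a generic overlap-add merge of windowed frame predictions, averaging overlapping frames. This is not the code at the linked line; it assumes every window has the same number of frames and that consecutive windows start `hop_frames` frames apart.

```python
import numpy as np


def overlap_add(window_probs: list, hop_frames: int) -> np.ndarray:
    """Merge per-window frame probabilities into one long-form array.

    window_probs[i] has shape (frames_per_window, classes) and is assumed
    to start at frame i * hop_frames. Overlapping frames are averaged.
    """
    frames, classes = window_probs[0].shape
    # Gaps between windows would leave frames with no prediction at all.
    assert 0 < hop_frames <= frames, "windows must overlap or touch"
    total = hop_frames * (len(window_probs) - 1) + frames
    acc = np.zeros((total, classes))
    count = np.zeros((total, 1))
    for i, probs in enumerate(window_probs):
        start = i * hop_frames
        acc[start:start + frames] += probs
        count[start:start + frames] += 1
    return acc / count
```

This also shows why overlap-add keeps the latency at the full window length: a frame's averaged value is only final once the last window covering it has been computed.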
@popcornell Thank you. Can I check whether SED is the right task to look into for online detection of audio cues in a noisy environment?
What do you mean by audio cues?
Like alarms
Then yeah
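An SED model outputs frame-level probabilities per class, so raising an alarm still needs a decision step. A minimal sketch for one class, assuming a fixed threshold followed by a median filter, in the spirit of common SED post-processing; `detect_events` and its default values are hypothetical, not the DESED baseline's settings.

```python
import numpy as np


def detect_events(
    probs: np.ndarray,
    threshold: float = 0.5,
    median_frames: int = 7,
    frame_s: float = 0.1,
) -> list:
    """Turn one class's frame probabilities into (onset_s, offset_s) events.

    median_frames must be odd. All defaults are illustrative assumptions.
    """
    # Binarize, then median-filter to drop isolated spikes and fill short gaps.
    active = (probs > threshold).astype(np.int8)
    half = median_frames // 2
    padded = np.pad(active, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, median_frames)
    smoothed = np.median(windows, axis=1) > 0.5
    # Rising edges mark onsets, falling edges mark offsets.
    edges = np.diff(np.concatenate([[0], smoothed.astype(np.int8), [0]]))
    onsets = np.flatnonzero(edges == 1)
    offsets = np.flatnonzero(edges == -1)
    return [(on * frame_s, off * frame_s) for on, off in zip(onsets, offsets)]
```

In a noisy environment the median filter trades a few frames of extra delay for fewer false alarms from short bursts.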
If I want to train my own model based on the 2024 task, it looks like I can use the pretrained baseline and pre-compute embeddings of my dataset as a starting point. Then, if I want to run inference on a video clip or a 10 s audio clip, am I supposed to also use this? https://github.com/DCASE-REPO/DESED_task/blob/c6bcb45b8b986ccde5c56bb86eefaf9d19b2320c/recipes/dcase2024_task4_baseline/local/sed_trainer_pretrained.py#L1457
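A minimal sketch of preparing input windows, assuming the model expects 16 kHz mono audio in 10 s windows (check the recipe's config for the actual sample rate). A clip of at most 10 s fits in a single zero-padded window. Longer audio, such as a video's audio track extracted beforehand with a tool like ffmpeg, has to be split into windows whose predictions are then merged back into long-form predictions, which is what the linked code does. `make_windows` is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE = 16000         # assumed input rate; verify against the recipe config
WINDOW = 10 * SAMPLE_RATE   # 10 s window in samples


def make_windows(audio: np.ndarray, hop: int = WINDOW) -> np.ndarray:
    """Split mono audio into fixed 10 s windows, zero-padding the tail.

    Returns an array of shape (n_windows, WINDOW). Audio of 10 s or less
    yields exactly one window that can be fed to the model directly.
    """
    if len(audio) <= WINDOW:
        return np.pad(audio, (0, WINDOW - len(audio)))[None, :]
    # Enough windows (starting every `hop` samples) to cover the whole clip.
    n_windows = int(np.ceil((len(audio) - WINDOW) / hop)) + 1
    total = (n_windows - 1) * hop + WINDOW
    padded = np.pad(audio, (0, total - len(audio)))
    return np.stack([padded[i * hop:i * hop + WINDOW] for i in range(n_windows)])
```

With the default `hop` the windows do not overlap; a smaller `hop` gives overlapping windows whose predictions can then be averaged.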
Is it possible to run the SED baseline in causal mode? I would like to use it on an audio stream to detect certain audio cues in a noisy environment.