DCASE-REPO / DESED_task

Domestic environment sound event detection task
MIT License

Test baseline on audio stream #107

Open HeChengHui opened 4 months ago

HeChengHui commented 4 months ago

Is it possible to run the SED baseline in causal mode? I would like to use it on an audio stream to detect certain audio cues in a noisy environment.

popcornell commented 4 months ago

Yes, but the model is trained on 10-second chunks and is not causal. You would need to run it on 10-second windows and advance by a fixed stride each time. The latency of the system will depend on the post-processing you do. E.g. if you overlap-add the window predictions, the latency is still 10 seconds; if you only keep the prediction for the newly added stride region, the latency equals the stride.
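To make the second option concrete, here is a minimal sketch of the "keep only the newest stride region" strategy. It is not code from this repo: `stream_newest_region`, `predict_window` and `toy_model` are hypothetical names, plain Python lists stand in for audio tensors, and it assumes the model returns frame scores evenly spaced across the window.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence


def stream_newest_region(
    chunks: Iterable[Sequence[float]],
    predict_window: Callable[[List[float]], List[float]],
    window_len: int,
    stride: int,
) -> Iterator[List[float]]:
    """Yield the frame scores covering the newest `stride` samples
    every time the sliding window advances by `stride` samples."""
    # Fixed-size history, zero-padded until enough audio has arrived.
    buffer = deque([0.0] * window_len, maxlen=window_len)
    pending = 0  # samples received since the last prediction
    for chunk in chunks:
        for sample in chunk:
            buffer.append(sample)
            pending += 1
            if pending == stride:
                pending = 0
                scores = predict_window(list(buffer))
                # Assumption: frames are evenly spaced over the window, so
                # the newest stride maps to the last n_new frames.
                n_new = max(1, round(len(scores) * stride / window_len))
                yield scores[-n_new:]


def toy_model(window: List[float]) -> List[float]:
    """Hypothetical stand-in for the SED model: one score per 2 samples."""
    return [(abs(a) + abs(b)) / 2 for a, b in zip(window[0::2], window[1::2])]


# Audio arrives in arbitrary chunk sizes; the window is 8 samples, stride 4.
incoming = [[0, 0, 0], [0, 1, 1, 1, 1]]
outputs = list(stream_newest_region(incoming, toy_model, window_len=8, stride=4))
print(outputs)  # [[0.0, 0.0], [1.0, 1.0]]
```

With a real model the window would be 10 seconds of audio (e.g. 160000 samples if the input is at 16 kHz, which is an assumption here), and the stride sets the latency: each output is emitted as soon as the newest `stride` samples are in.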

HeChengHui commented 4 months ago

@popcornell I see. Is there any code I can reference to build this pipeline?

popcornell commented 4 months ago

Not really, but in https://github.com/DCASE-REPO/DESED_task/blob/c6bcb45b8b986ccde5c56bb86eefaf9d19b2320c/recipes/dcase2024_task4_baseline/local/sed_trainer_pretrained.py#L1457 we reconstruct long-form predictions from windowed predictions.
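For illustration only, one common way to rebuild a long-form prediction from overlapping windows is to average the scores wherever windows overlap. The sketch below is an assumption about the general technique, not a copy of what the linked function does; `stitch_windows` and `hop_frames` are hypothetical names.

```python
from typing import List, Sequence


def stitch_windows(
    window_scores: Sequence[Sequence[float]],
    hop_frames: int,
    total_frames: int,
) -> List[float]:
    """Average per-frame scores from overlapping windows.

    Window i is assumed to start at frame i * hop_frames.
    """
    sums = [0.0] * total_frames
    counts = [0] * total_frames
    for i, scores in enumerate(window_scores):
        start = i * hop_frames
        for j, score in enumerate(scores):
            t = start + j
            if t >= total_frames:
                break  # the last window may run past the end of the recording
            sums[t] += score
            counts[t] += 1
    # Frames that no window covered default to 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Two 4-frame windows with a hop of 2 frames: frames 2 and 3 are covered twice.
stitched = stitch_windows([[1, 1, 1, 1], [0, 0, 0, 0]], hop_frames=2, total_frames=6)
print(stitched)  # [1.0, 1.0, 0.5, 0.5, 0.0, 0.0]
```

Note that averaging needs every window that covers a frame before that frame is final, which is why the overlap-add style keeps the latency at the full window length.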

HeChengHui commented 4 months ago

@popcornell Thank you. Can I check whether SED is the correct task to look into for online detection of audio cues in a noisy environment?

popcornell commented 4 months ago

What do you mean by audio cues ?

HeChengHui commented 4 months ago

Like alarms

popcornell commented 4 months ago

Then yeah

HeChengHui commented 4 months ago

If I want to train my own model based on the 2024 task, it looks like I can use the pretrained baseline and pre-compute embeddings of my dataset as a base. Then, if I want to run inference on a video clip or a 10-second audio clip, am I supposed to also use this: https://github.com/DCASE-REPO/DESED_task/blob/c6bcb45b8b986ccde5c56bb86eefaf9d19b2320c/recipes/dcase2024_task4_baseline/local/sed_trainer_pretrained.py#L1457 ?