fgnt / pb_sed

Paderborn Sound Event Detection
MIT License

Single example evaluation #8

Closed arshinmar closed 2 years ago

arshinmar commented 3 years ago

Hello,

I'd like to begin by thanking you for creating this repo. As a top paper at DCASE 2020, it is very valuable for many detection and recognition applications.

I was wondering if there was any code written for single example evaluation. For example, are there any functions for detection of events in a single audio clip?

This would be very helpful to validate results in a real-world setting with different audio clips.

Currently, it seems that the "get_dataset" function in data.py is primarily suited to DESED and not to single clips.

Any pointers on how best to build such functionality would be much appreciated.

JanekEbb commented 3 years ago

Hi, thanks for your interest in our model, and sorry for the late response. You are right: unfortunately, it is not straightforward to apply the model to other audio.

If you want to apply a trained model to other data, the following steps are needed:

  1. normalize the waveform (currently happens in get_dataset, lines 199-205)
  2. compute the STFT (currently happens in prepare_dataset, lines 267-268)
  3. put together a batch dictionary with two keys: "stft", holding the STFT with shape batch x 1 x timesteps x frequencies x 2 (the last dimension is the real/imaginary part), and "seq_len", holding a list of the original number of time steps for each example in the batch
  4. perform tagging by calling the CRNN (currently lines 141-142 of run_inference.py) and thresholding its output (currently line 163 of run_inference.py)
  5. call the CNN for event detection (currently lines 290-298 of run_inference.py), apply median filtering (currently lines 207-209 of run_inference.py), and threshold the result (currently line 215 of run_inference.py)
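
As a rough illustration of steps 1 to 3 and of the post-processing in step 5, here is a minimal NumPy-only sketch. It does not use pb_sed code: the helper names (`normalize`, `stft`, `make_batch`, `median_filter`, `detect`), the peak normalization scheme, the Hann window, and the frame length, hop, filter width, and threshold values are all assumptions chosen for illustration, and they may differ from what the repository actually does. The CRNN and CNN calls are left out because their interfaces are not shown in this thread.

```python
import numpy as np


def normalize(waveform):
    """Remove the DC offset and scale to unit peak (assumed scheme, not pb_sed's)."""
    waveform = waveform - waveform.mean()
    peak = np.abs(waveform).max()
    return waveform / peak if peak > 0 else waveform


def stft(waveform, frame_length=1024, hop=320):
    """Hann-windowed STFT, returned with shape (timesteps, frequencies, 2).

    frame_length and hop are placeholder values, not the repo's settings.
    """
    if len(waveform) < frame_length:
        waveform = np.pad(waveform, (0, frame_length - len(waveform)))
    n_frames = 1 + (len(waveform) - frame_length) // hop
    window = np.hanning(frame_length)
    frames = np.stack([
        waveform[i * hop:i * hop + frame_length] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=-1)
    # Last dimension holds the real and imaginary parts.
    return np.stack([spectrum.real, spectrum.imag], axis=-1)


def make_batch(waveforms):
    """Build the batch dictionary described in step 3."""
    stfts = [stft(normalize(w)) for w in waveforms]
    seq_len = [s.shape[0] for s in stfts]
    max_len = max(seq_len)
    # Zero-pad every example along the time axis to the longest one.
    padded = np.stack([
        np.pad(s, ((0, max_len - s.shape[0]), (0, 0), (0, 0)))
        for s in stfts
    ])
    # Insert the singleton axis: batch x 1 x timesteps x frequencies x 2.
    return {"stft": padded[:, None].astype(np.float32), "seq_len": seq_len}


def median_filter(scores, width):
    """Median-filter scores of shape (timesteps, classes) along time.

    width must be odd; the edges are padded by repeating the border frames.
    """
    half = width // 2
    padded = np.pad(scores, ((half, half), (0, 0)), mode="edge")
    windows = np.stack([padded[i:i + len(scores)] for i in range(width)])
    return np.median(windows, axis=0)


def detect(scores, threshold=0.5, width=5):
    """Smooth frame-level scores, then threshold them into binary decisions."""
    return median_filter(scores, width) > threshold
```

With two clips of 16000 and 8000 samples, `make_batch` would return an "stft" array of shape 2 x 1 x 47 x 513 x 2 and a "seq_len" of [47, 22] under these placeholder settings. The frame-level scores passed to `detect` would come from the CNN in step 5.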

However, note that I have a few updates in the pipeline which may change some of the interfaces and signatures (and the line numbers above). If you let me know exactly which functionality you would like to have, I would be happy to provide a script or function for applying the models to new data.

arshinmar commented 3 years ago

Hello,

Thanks for the response, and also for explaining the different steps that go into using the model on data.

I am interested in the event detection functionality that uses the CNN. While I am working on my own script to implement the steps above, it would be much appreciated if you could provide a script that applies the models to new data (such as a specific .wav file or a set of .wav files).