aromanusc / SoundQ

Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes (DCASE task3 format)

User/asroman/third party submodules #7

Closed aromanusc closed 11 months ago

aromanusc commented 11 months ago

Bring in SELDnet23 audio-visual model
Adds a bounding box extractor script (bbox_extract.py) that uses Detic: https://github.com/facebookresearch/Detic/tree/main
- To use bbox_extract.py, first copy the file into the Detic/ directory.

aromanusc commented 11 months ago

@baladithyab - this adds a bounding box extraction script using Detic. For the detection models you are bringing up, I think we should follow a similar design pattern and stick with an extraction script.

The main benefit of running a bbox extraction script over all available .mp4 files ahead of time is training efficiency. The original SELDnet23 model runs YOLOX inference on the fly, which will not scale to the models and the amount of data we are about to start using.
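
To make the pattern concrete, here is a rough sketch of what "extract once, reuse during training" could look like. It is not the actual bbox_extract.py: the function names, the one-JSON-file-per-video layout, and the `detect` callable (a stand-in for whatever detector is used, Detic or otherwise) are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical detector interface (an assumption, not Detic's API):
# takes a video path and returns {frame_index: [[x1, y1, x2, y2, score], ...]}.
Detector = Callable[[Path], Dict[int, List[List[float]]]]


def extract_all(video_dir: Path, out_dir: Path, detect: Detector) -> List[Path]:
    """Run `detect` once per .mp4 in video_dir and cache the boxes as JSON."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for video in sorted(video_dir.glob("*.mp4")):
        out_path = out_dir / f"{video.stem}.json"
        if out_path.exists():
            continue  # boxes already extracted for this video, skip the detector
        boxes = detect(video)
        # JSON object keys must be strings, so frame indices are stringified.
        out_path.write_text(json.dumps({str(k): v for k, v in boxes.items()}))
        written.append(out_path)
    return written


def load_boxes(out_dir: Path, video_stem: str) -> Dict[int, List[List[float]]]:
    """Read cached boxes back at training time; no detector call needed."""
    raw = json.loads((out_dir / f"{video_stem}.json").read_text())
    return {int(k): v for k, v in raw.items()}
```

With something like this, the detector runs once per video, and the training data loader only calls `load_boxes`, so swapping in a different detection model means writing another extraction script rather than touching the training loop.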