aromanusc / SoundQ

Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes (DCASE task3 format)

Change the number of channels? #22

Closed: zoey9628 closed this issue 5 months ago

zoey9628 commented 6 months ago

I have successfully generated audio and video data using the audio-visual synthetic data generator, in which the audio data is 32 channels, but the audio used in the DCASE competition is 4 channels. May I ask how to convert the channels? Looking forward to your reply very much.

adrianSRoman commented 6 months ago

This is a great question. Tetrahedral microphone channels (4 channels) from an eigenmike (32 channels) correspond to indexes: [5, 9, 25, 21]
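
A minimal sketch of that indexing, assuming the 32-channel audio is already loaded as a NumPy array of shape (num_samples, 32) and that the indexes above are zero-based. The function name `em32_to_tetra` and the random stand-in signal are illustrative and are not part of SoundQ.

```python
import numpy as np

# Eigenmike channel indexes quoted in this thread (assumed to be zero-based).
TETRA_INDEXES = [5, 9, 25, 21]

def em32_to_tetra(audio: np.ndarray) -> np.ndarray:
    """Select the 4 tetrahedral channels from (num_samples, 32) audio."""
    if audio.ndim != 2 or audio.shape[1] != 32:
        raise ValueError(f"expected shape (num_samples, 32), got {audio.shape}")
    # Fancy indexing keeps the channels in the order listed above.
    return audio[:, TETRA_INDEXES]

# Stand-in for a real recording: 1 second of noise at 24 kHz, 32 channels.
rng = np.random.default_rng(0)
em32_audio = rng.standard_normal((24000, 32))
mic_audio = em32_to_tetra(em32_audio)
print(mic_audio.shape)  # (24000, 4)
```

If the audio is stored channels-first, transpose it before calling the function.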

zoey9628 commented 6 months ago

Thank you for your reply. I'll try it : )

adrianSRoman commented 6 months ago

Awesome. Btw, SpatialScaper is a great tool to synthesize audio too. It supports the METU RIRs plus other real rooms. Perhaps you could integrate the audiovisual data with that library. Just letting you know if you need more variety in RIRs. Good luck!

zoey9628 commented 6 months ago

Thank you for the suggestion. I will try SpatialScaper later. Another question: how did you get the corresponding indexes [5, 9, 25, 21]? I didn't see them in the dataset description.

adrianSRoman commented 5 months ago

Oh, sorry for the late response.

In here https://github.com/aromanusc/SoundQ/blob/ffeec8c4249967f7daecc4250ca838caca06412f/synth_data_gen/audiovisual_synth.py#L207

change em32 to mic. That will automatically generate spatial audio for a 4-channel mic array (mic), so there is no need to index the channels yourself as I mentioned before. It should pick the correct mic format you need.

zoey9628 commented 5 months ago

If I change em32 to mic, 'IR_mic.wav' will be used, but I don't have that file. How do I generate 'IR_mic.wav'? https://github.com/aromanusc/SoundQ/blob/ffeec8c4249967f7daecc4250ca838caca06412f/synth_data_gen/audio_spatializer.py#L34

aromanusc commented 5 months ago

I just updated the remix script for the METU dataset. Please take a look: synth_data_gen/remix_metu_rirs.py

zoey9628 commented 5 months ago

Thanks for the update. I also have a question: why did you pick [6, 10, 26, 22] as the channel ids?

https://github.com/aromanusc/SoundQ/blob/main/synth_data_gen/remix_metu_rirs.py#L16

aromanusc commented 5 months ago

Those are the 4 channels in an Eigenmike that correspond to the channels in a tetrahedral mic array:

Em32: https://mhacoustics.com/sites/default/files/EigenmikeReleaseNotesV15.pdf

Mic: https://www.core-sound.com/products/tetramic
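
The two lists in this thread appear to describe the same capsules: [6, 10, 26, 22] read as one-based channel numbers, and [5, 9, 25, 21] as the zero-based indexes used when slicing an array. A quick check under that assumption (how remix_metu_rirs.py actually applies the ids is not shown here):

```python
# Channel ids as quoted from remix_metu_rirs.py in this thread,
# assumed here to be one-based channel numbers.
channel_ids = [6, 10, 26, 22]

# Subtract 1 from each to get zero-based array indexes.
zero_based = [c - 1 for c in channel_ids]
print(zero_based)  # [5, 9, 25, 21]
```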