Code repository for the paper Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs [1].
If you're only looking for our PyTorch implementation of the Icosahedral CNNs, you can find it here.
Before running the code, set the paths to your training and test datasets in the path_train and path_test variables, and the path to the LOCATA dataset in the path_locata variable. You can then use the script 1sourceTracking_icoCNN.py to train the model and test it with synthetic and real recordings. You can change the resolution of the input maps by changing the value of r in line 22.
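As a rough illustration, the configuration at the top of 1sourceTracking_icoCNN.py looks like the sketch below; the variable names come from the script, but the paths and the value of r shown here are placeholders, not the repository's actual values:

```python
# Dataset paths used by 1sourceTracking_icoCNN.py (placeholder values,
# replace them with the locations of your own copies of the datasets).
path_train = "/path/to/train/dataset"    # training signals
path_test = "/path/to/test/dataset"      # signals for the synthetic test
path_locata = "/path/to/LOCATA"          # LOCATA dataset for the real-recording test

# Resolution of the input maps (line 22 of the script); higher r means a
# finer icosahedral grid. The value shown here is only an example.
r = 3
```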
The script is organized in cells, so you can skip the training cell and just load the pretrained models.
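If you skip the training cell, loading a checkpoint typically reduces to something like the following sketch. The checkpoint name is hypothetical (check the models folder for the actual files), and model is assumed to have been instantiated by an earlier cell of the script:

```python
import torch

# `model` is assumed to have been built by an earlier cell of the script
# (the architecture is defined in acousticTrackingModels.py).
# The checkpoint name below is hypothetical; look in the models folder
# for the files that ship with the repository.
state_dict = torch.load("models/pretrained_icoCNN.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to evaluation mode before testing
```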
You can find the definition of the model in acousticTrackingModels.py and the implementation of our soft-argmax function in acousticTrackingModules.py. If you are looking for the implementation of the icosahedral convolutions, they have their own repository. The baseline model Cross3D [3] also has its own repository with its code and pretrained models.
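A soft-argmax is a differentiable replacement for a hard argmax, which here turns the network's output map into a continuous DOA estimate. A common formulation of the technique, which may differ from the exact code in acousticTrackingModules.py (the function name, shapes, and the grid_xyz argument are assumptions for illustration), is:

```python
import torch
import torch.nn.functional as F

def soft_argmax_doa(output_map: torch.Tensor, grid_xyz: torch.Tensor) -> torch.Tensor:
    """Differentiable DOA estimate from a likelihood map over grid points.

    output_map: (batch, N) network output over the N (flattened) grid points.
    grid_xyz:   (N, 3) unit vectors with the Cartesian coordinates of the points.
    """
    probs = F.softmax(output_map, dim=-1)  # turn the map into a probability distribution
    doa = probs @ grid_xyz                 # (batch, 3) expectation of the grid coordinates
    return F.normalize(doa, dim=-1)        # project the expectation back onto the unit sphere
```

Unlike a hard argmax, this expectation is differentiable, so the angular error of the estimate can be backpropagated through it during training.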
The pretrained models and the test results can be found in the models and results folders.
acousticTrackingDataset.py, acousticTrackingLearners.py, acousticTrackingModels.py and acousticTrackingModules.py contain several classes and functions employed by the main script. They are updated versions of the ones found in the repository of Cross3D and have been published to facilitate the replicability of the research presented in [1], not as a software library. Therefore, any feature included in them that is not used by the main script may be untested and could contain bugs.
[1] D. Diaz-Guerra, A. Miguel and J.R. Beltran, "Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs," [arXiv preprint].
[2] D. Diaz-Guerra, A. Miguel and J.R. Beltran, "gpuRIR: A python library for Room Impulse Response simulation with GPU acceleration," in Multimedia Tools and Applications, Oct. 2020 [DOI] [SharedIt] [arXiv preprint].
[3] D. Diaz-Guerra, A. Miguel and J.R. Beltran, "Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 300-311, 2021 [DOI] [arXiv preprint].