Feature Alignment for Robust Acoustic Scene Classification across Devices

Background
Install
Visualization
Results
References

Background

We evaluate the proposed method on DCASE 2019 Task 1b and DCASE 2020 Task 1a, both of which are designed to evaluate acoustic scene classification (ASC) algorithms across recording devices. DCASE 2020 contains 15480 samples captured by 9 devices: 14400 samples recorded by 3 real devices (A, B, C) and 1080 samples from 6 simulated devices (S1-S6). Note that samples from devices S4-S6 do not appear in the training set. DCASE 2019 contains 16560 segments: 14400, 1080, and 1080 from devices A, B, and C, respectively.
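
The sample counts quoted above can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the dictionary names are illustrative and are not part of the repository, and the numbers are copied from the paragraph above.

```python
# Sample counts per device group, copied from the paragraph above.
# The variable names are illustrative and do not come from the repository.
DCASE2020_TASK1A = {"real (A, B, C)": 14400, "simulated (S1-S6)": 1080}
DCASE2019_TASK1B = {"A": 14400, "B": 1080, "C": 1080}


def total(counts):
    """Total number of samples across all device groups."""
    return sum(counts.values())


def shares(counts):
    """Fraction of the dataset contributed by each device group."""
    n = total(counts)
    return {name: count / n for name, count in counts.items()}


if __name__ == "__main__":
    print(total(DCASE2020_TASK1A))  # 15480
    print(total(DCASE2019_TASK1B))  # 16560
    for name, share in shares(DCASE2019_TASK1B).items():
        print(f"device {name}: {share:.1%}")
```

In DCASE 2019, device A contributes about 87% of the segments, so the data are heavily skewed toward a single device.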

Install

Download the data and change the dataset path in config.py to your own.

$ cd utilities
$ python features.py

$ cd pytorch
$ python train.py
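
Before running the commands above, it can help to confirm that the dataset path actually exists. The sketch below is an assumption about how such a check might look: DATASET_DIR and check_dataset_dir are hypothetical names, and the real variable in config.py may be called something else.

```python
from pathlib import Path

# Hypothetical setting name and placeholder path; the real variable in
# config.py may be named differently.
DATASET_DIR = Path("/path/to/your/dataset")


def check_dataset_dir(path: Path) -> Path:
    """Return the path if it is an existing directory, otherwise raise."""
    if not path.is_dir():
        raise FileNotFoundError(
            f"Dataset directory not found: {path}. Update the path in config.py."
        )
    return path


if __name__ == "__main__":
    try:
        check_dataset_dir(DATASET_DIR)
        print(f"Using dataset at {DATASET_DIR}")
    except FileNotFoundError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an error raised deep inside feature extraction.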

Visualization

To visualize feature maps during feature alignment, set 'feature_maps' in 'config' to True.

To visualize a t-SNE plot of the results, set the corresponding visualization options in 'config'.

$ cd pytorch
$ python test.py
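
For readers unfamiliar with t-SNE plots, the following is a minimal, generic sketch of how learned features can be projected to 2-D and colored by recording device. It is not the code in test.py: it assumes scikit-learn (and matplotlib for plotting) is installed, uses synthetic features in place of real network outputs, and the function names embed_2d and plot_embedding are made up for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(features, seed=0):
    """Project (n_samples, n_features) vectors to 2-D with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=seed)
    return tsne.fit_transform(features)


def plot_embedding(points, device_ids, out_path="tsne.png"):
    """Scatter the 2-D points, one color per device, and save to a file."""
    # Imported lazily so the embedding step works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 5))
    for dev in np.unique(device_ids):
        mask = device_ids == dev
        ax.scatter(points[mask, 0], points[mask, 1], s=8, label=f"device {dev}")
    ax.legend()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


# Synthetic stand-in for network features: 60 samples for each of three
# devices, with a device-dependent offset so the clusters are separable.
rng = np.random.default_rng(0)
device_ids = np.repeat(np.array(["A", "B", "C"]), 60)
offsets = np.array([{"A": 0.0, "B": 3.0, "C": 6.0}[d] for d in device_ids])
features = rng.normal(size=(180, 16)) + offsets[:, None]

points = embed_2d(features)

if __name__ == "__main__":
    plot_embedding(points, device_ids)
```

If feature alignment works as intended, samples of the same scene recorded by different devices would be expected to overlap in such a plot rather than form device-specific clusters.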

Because of GitHub's file upload size limit, the pre-trained best model is not included in this repository. Please contact me if you need it.

Results

The network was trained with the AdamW optimizer for 200 epochs on an NVIDIA RTX 3090 card. The initial learning rate was set to 0.001 and the batch size to 64.
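
The training setup described above can be sketched in PyTorch as follows. Only the optimizer, epoch count, learning rate, and batch size come from the text; the tiny model, the random data, and the class count are placeholders assumed for illustration and do not reflect the network in train.py.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 200          # from the text
LEARNING_RATE = 1e-3  # from the text
BATCH_SIZE = 64       # from the text

# Everything below is a hypothetical stand-in for the real data and network.
NUM_CLASSES = 10  # assumption for illustration
torch.manual_seed(0)
features = torch.randn(256, 40)
labels = torch.randint(0, NUM_CLASSES, (256,))
loader = DataLoader(
    TensorDataset(features, labels), batch_size=BATCH_SIZE, shuffle=True
)

model = nn.Sequential(nn.Linear(40, 32), nn.ReLU(), nn.Linear(32, NUM_CLASSES))
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()


def train_one_epoch():
    """Run one pass over the loader and return the mean training loss."""
    model.train()
    running = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running += loss.item() * x.size(0)
    return running / len(loader.dataset)


if __name__ == "__main__":
    for epoch in range(EPOCHS):
        mean_loss = train_one_epoch()
        if (epoch + 1) % 50 == 0:
            print(f"epoch {epoch + 1}: loss {mean_loss:.4f}")
```

The text calls 0.001 the initial learning rate, which suggests a schedule is applied during training; none is shown here because the schedule is not specified.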

References

D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” arXiv preprint arXiv:1904.08779, 2019.

Y. Tang, Y. Wang, Y. Xu, B. Shi, C. Xu, C. Xu, and C. Xu, “Beyond dropout: Feature map distortion to regularize deep neural networks,” arXiv preprint arXiv:2002.11022, 2020.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.

Jingqiao-Zhao / FAASC
