Instead of fixing the augmented examples during dataset creation, the dataset loader now generates unique training examples during each epoch, significantly boosting robustness against noise and time shifts.
The more costly "speed augmentation" remains fixed, carried out once during dataset creation.
For stability of validation results across epochs, the validation examples (original + augmentations) are also fixed; they are constructed during initial dataset creation.
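The per-epoch augmentation described above can be sketched as a dataset whose `__getitem__` draws a fresh random shift and fresh white noise on every access. This is an illustrative sketch only; the class name, parameters (`max_shift`, `snr_db`), and the circular-shift choice are assumptions, not the loader's actual API.

```python
import numpy as np

class DynamicAugmentDataset:
    """Hypothetical sketch: applies a new random time shift and white noise
    on every access, so each epoch sees unique training examples."""

    def __init__(self, samples, labels, max_shift=2048, snr_db=20.0, seed=None):
        self.samples = samples          # sequence of 1-D float arrays
        self.labels = labels
        self.max_shift = max_shift      # maximum shift in samples
        self.snr_db = snr_db            # target signal-to-noise ratio
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx].astype(np.float32)
        # random circular time shift, drawn anew at every access
        shift = int(self.rng.integers(-self.max_shift, self.max_shift + 1))
        x = np.roll(x, shift)
        # additive white noise scaled to the target SNR
        p_signal = np.mean(x ** 2)
        p_noise = p_signal / (10.0 ** (self.snr_db / 10.0))
        noise = self.rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
        return x + noise.astype(np.float32), self.labels[idx]
```

Because the random state advances between accesses, fetching the same index twice yields two different augmented waveforms, which is exactly what makes each epoch's examples unique.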
Changed the dataset filename (dataset2.pt -> dataset3.pt) to avoid potential mix-ups, as this PR introduces a major change.
Added a "shift_limits" property to each sample (for possible future feature compatibility, e.g., with voice activity detection).
The generated dataset contains the following:
The original training samples from Google Speech Commands, and 2 augmented versions of each sample with different speeds.
Additional training samples from Librispeech as additional examples for the "background" class.
The original validation samples from Google Speech Commands, and 2 augmented versions of each sample with different speeds, time shifts, and added white noise.
The original test samples from Google Speech Commands without any augmentation.
Dataset creation is significantly faster (90 min -> 4 min), thanks to more efficient batched operations.
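The speedup comes from replacing per-sample Python loops with vectorized operations over the whole dataset. The sketch below is illustrative (not the repo's actual code) and uses NumPy; the function name and SNR-based noise model are assumptions.

```python
import numpy as np

def add_noise_batched(batch, snr_db, rng):
    """Add white noise at the given SNR to a (num_samples, length) batch
    in a single vectorized pass instead of a per-sample loop."""
    p_signal = np.mean(batch ** 2, axis=1, keepdims=True)   # per-sample power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))          # per-sample noise power
    noise = rng.normal(0.0, 1.0, size=batch.shape) * np.sqrt(p_noise)
    return batch + noise
```

One call like this touches every sample at once in optimized C code, which is where most of the 90-minute-to-4-minute improvement in this style of pipeline comes from.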
The network found via Neural Architecture Search (NAS) is introduced; it significantly improves accuracy over its predecessors (v2 & v3), at the cost of a higher parameter count, a slightly increased MAC count, and higher latency (3.2 ms -> 3.9 ms).
From @EyubogluMerve: Added an automated evaluation notebook for specified noise types and SNR levels.
Added a new dataset (signalmixer.py)
Modified msnoise.py to:
include "Tradeshow" as another type of noise
carry out proper train/test splits
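Mixing a keyword sample with a noise clip at a target SNR, in the spirit of signalmixer.py, reduces to scaling the noise so the power ratio matches the requested level. The function name and arguments below are illustrative assumptions, not the module's actual interface.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals snr_db,
    then return the mixture (trimmed to the signal's length)."""
    noise = noise[: len(signal)]                  # trim noise to signal length
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # solve p_signal / (scale^2 * p_noise) = 10^(snr_db / 10) for scale
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise
```

By construction, the achieved SNR of the mixture equals the requested `snr_db`, which is what makes sweeping over noise types and SNR levels (as in the evaluation notebook) reproducible.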
Summary of Improvements:
Along with the previous PR, we have improved the KWS20 accuracy from ~86.5% to 92.5% on the validation set (which includes augmented samples), and from 87.6% to 93.7% on the clean test set.
The impact of each change on the KWS20 accuracy is as follows:
pytsmod tempo augmentation -> torchaudio speed augmentation: +1%
v3 -> v2 model: +1.5%
v2 -> NAS model: +2.5%
Dynamic noise & shift augmentation: +1%
Total: +6% absolute change in accuracy, from 86.5% -> 92.5%
This corresponds to a 44% relative decrease in error rate, with an even more significant reduction in false alarm rate.
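For reference, the ~44% figure follows directly from the validation accuracies quoted above:

```python
# Validation accuracy improved from 86.5% to 92.5%, i.e. the error rate
# dropped from 13.5% to 7.5%.
old_err = 100.0 - 86.5   # 13.5
new_err = 100.0 - 92.5   # 7.5
reduction = (old_err - new_err) / old_err * 100.0
print(round(reduction, 1))  # 44.4 (% relative decrease in error rate)
```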