Instead of fixing the augmented examples during dataset creation, the dataset loader now generates unique training examples during each epoch, significantly boosting robustness against noise and time shifts.
The more costly "speed augmentation" remains fixed, carried out once during dataset creation.
For stability of validation results across epochs, the validation examples (original + augmentations) are also fixed; they are constructed during initial dataset creation.
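The per-epoch augmentation described above can be sketched as a dataset whose `__getitem__` draws a fresh random shift and fresh white noise on every access. This is an illustrative sketch only; the class name, parameters (`max_shift`, `snr_db`), and the circular-shift choice are assumptions, not the loader's actual API.

```python
import numpy as np

class DynamicAugmentDataset:
    """Hypothetical sketch: applies a new random time shift and white noise
    on every access, so each epoch sees unique training examples."""

    def __init__(self, samples, labels, max_shift=2048, snr_db=20.0, seed=None):
        self.samples = samples          # sequence of 1-D float arrays
        self.labels = labels
        self.max_shift = max_shift      # maximum shift in samples
        self.snr_db = snr_db            # target signal-to-noise ratio
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx].astype(np.float32)
        # random circular time shift, drawn anew at every access
        shift = int(self.rng.integers(-self.max_shift, self.max_shift + 1))
        x = np.roll(x, shift)
        # additive white noise scaled to the target SNR
        p_signal = np.mean(x ** 2)
        p_noise = p_signal / (10.0 ** (self.snr_db / 10.0))
        noise = self.rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
        return x + noise.astype(np.float32), self.labels[idx]
```

Because the random state advances between accesses, fetching the same index twice yields two different augmented waveforms, which is exactly what makes each epoch's examples unique.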
Changed the dataset filename (dataset2.pt -> dataset3.pt) to avoid potential mix-ups, as this PR introduces a major change.
Added a "shift_limits" property to each sample (for possible future feature compatibility, e.g., with voice activity detection).
The generated dataset contains the following:
The original training samples from Google Speech Commands, and 2 augmented versions of each sample with different speeds.
Additional training samples from Librispeech as additional examples for the "background" class.
The original validation samples from Google Speech Commands, and 2 augmented versions of each sample with different speeds, time shifts, and added white noise.
The original test samples from Google Speech Commands without any augmentation.
Dataset creation is significantly faster (90 min -> 4 min), thanks to more efficient batched operations.
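The speedup comes from replacing per-sample Python loops with vectorized operations over the whole dataset. The sketch below is illustrative (not the repo's actual code) and uses NumPy; the function name and SNR-based noise model are assumptions.

```python
import numpy as np

def add_noise_batched(batch, snr_db, rng):
    """Add white noise at the given SNR to a (num_samples, length) batch
    in a single vectorized pass instead of a per-sample loop."""
    p_signal = np.mean(batch ** 2, axis=1, keepdims=True)   # per-sample power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))          # per-sample noise power
    noise = rng.normal(0.0, 1.0, size=batch.shape) * np.sqrt(p_noise)
    return batch + noise
```

One call like this touches every sample at once in optimized C code, which is where most of the 90-minute-to-4-minute improvement in this style of pipeline comes from.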
The network found via Neural Architecture Search (NAS) is introduced; it significantly improves accuracy over its predecessors (v2 & v3), at the cost of a higher parameter count, a slightly increased MAC count, and higher latency (3.2 ms -> 3.9 ms).
From @EyubogluMerve: Added an automated evaluation notebook for specified noise types and SNR levels.
Added a new dataset (signalmixer.py)
Modified msnoise.py to:
include "Tradeshow" as another type of noise
carry out proper train/test splits
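Mixing a keyword sample with a noise clip at a target SNR, in the spirit of signalmixer.py, reduces to scaling the noise so the power ratio matches the requested level. The function name and arguments below are illustrative assumptions, not the module's actual interface.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals snr_db,
    then return the mixture (trimmed to the signal's length)."""
    noise = noise[: len(signal)]                  # trim noise to signal length
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # solve p_signal / (scale^2 * p_noise) = 10^(snr_db / 10) for scale
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise
```

By construction, the achieved SNR of the mixture equals the requested `snr_db`, which is what makes sweeping over noise types and SNR levels (as in the evaluation notebook) reproducible.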
Summary of Improvements:
Along with the previous PR, we have improved the KWS20 accuracy from ~86.5% to 92.5% on the validation set (which includes augmented samples), and from 87.6% to 93.7% on the clean test set.
The impact of each change on the KWS20 accuracy is as follows:
pytsmod tempo augmentation -> torchaudio speed augmentation: +1%
v3 -> v2 model: +1.5%
v2 -> NAS model: +2.5%
Dynamic noise & shift augmentation: +1%
Total: +6% absolute change in accuracy, from 86.5% -> 92.5%
This corresponds to a 44% relative decrease in error rate, with an even more significant reduction in false alarm rate.
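For reference, the ~44% figure follows directly from the validation accuracies quoted above:

```python
# Validation accuracy improved from 86.5% to 92.5%, i.e. the error rate
# dropped from 13.5% to 7.5%.
old_err = 100.0 - 86.5   # 13.5
new_err = 100.0 - 92.5   # 7.5
reduction = (old_err - new_err) / old_err * 100.0
print(round(reduction, 1))  # 44.4 (% relative decrease in error rate)
```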