Speech command classification on the Speech-Command v0.02 dataset using PyTorch and torchaudio. In this example, three models are trained on raw signal waveforms, MFCC features, and MelSpectrogram features, respectively.
The labels_dict includes entries that are not command classes. Because it is built from everything in the Speech_Command_V0.02 folder, it also picks up entries such as:
_backgroundnoise
testing_list.txt
validation_list.txt
...
As a result, the label indices do not run from 0 to 34 but from 0 to 40.
Yes. You are correct. This code does have that issue. When I wrote it I actually removed those files manually at the beginning but never mentioned it in the repository.
From:
labels_dict=os.listdir(train_audio_path)
output: labels_dict = ['tree', 'cat', 'go', 'left', 'yes', '.DS_Store', 'sheila', 'learn', 'stop', 'backward', 'seven', 'follow', 'zero', 'three', 'down', 'no', 'up', 'six', 'four', 'nine', 'LICENSE', 'happy', 'validation_list.txt', '_backgroundnoise', 'wow', 'visual', 'house', 'README.md', 'off', 'five', 'dog', 'one', 'eight', 'testing_list.txt', 'on', 'two', 'marvin', 'bird', 'forward', 'right', 'bed']
To:
labels_dict=list(set(labels))
output: labels_dict = ['tree', 'cat', 'go', 'left', 'yes', 'sheila', 'learn', 'stop', 'backward', 'seven', 'follow', 'zero', 'three', 'down', 'no', 'up', 'six', 'four', 'nine', 'happy', 'wow', 'visual', 'house', 'off', 'five', 'dog', 'one', 'eight', 'on', 'two', 'marvin', 'bird', 'forward', 'right', 'bed']
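An alternative to removing the extra files by hand is to filter the directory listing itself. The following is only a minimal sketch, not the repository's code: list_command_labels and build_label_index are hypothetical helper names, and it assumes that every command class is a sub-directory whose name does not start with "_" or ".". Sorting the result keeps the label-to-index mapping identical across runs, which list(set(labels)) does not guarantee.

```python
import os


def list_command_labels(train_audio_path):
    """Return the sorted names of sub-directories that hold command recordings.

    Hypothetical helper: assumes one sub-directory per command class.
    """
    labels = []
    for entry in os.listdir(train_audio_path):
        full_path = os.path.join(train_audio_path, entry)
        # Skip plain files such as LICENSE, README.md, testing_list.txt, .DS_Store.
        if not os.path.isdir(full_path):
            continue
        # Skip helper folders such as the background-noise folder
        # (assumption: their names start with "_" or ".").
        if entry.startswith(("_", ".")):
            continue
        labels.append(entry)
    # Sort so that each class gets the same index on every run.
    return sorted(labels)


def build_label_index(labels):
    """Map each label name to a contiguous integer index starting at 0."""
    return {label: index for index, label in enumerate(labels)}
```

With the folder listing shown above, labels_dict = list_command_labels(train_audio_path) should yield the 35 command names only, so the indices from build_label_index(labels_dict) stay within 0-34.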