# leaf-pytorch

The leaf-pytorch implementation is now officially a part of SpeechBrain, with a sample recipe on SpeechCommands-v2. I would recommend that anyone looking to work with LEAF use the SpeechBrain implementation instead, because of the overall ecosystem as well as better documentation. Thanks for your interest!
This work would not be possible without cloud resources provided by Google's TPU Research Cloud (TRC) program. I also thank the TRC support team for quickly resolving whatever issues I had: you're awesome!
This is a PyTorch implementation of the LEAF audio frontend [1], made using the official TensorFlow implementation as a direct reference.
This implementation supports training on TPUs using `torch-xla`. `torch-xla` has some issues with certain `complex64` operations (`torch.view_as_real(comp)`, `comp.real`, `comp.imag`), as highlighted in issue #3070. These operations are used primarily for generating Gabor impulse responses; to bypass this shortcoming, an alternate implementation using manual complex-number operations is provided. A rough sketch of the idea follows.
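As a rough illustration (the function name and normalization are illustrative, not the repo's exact code), a Gabor impulse response can be computed with separate real and imaginary tensors instead of `complex64`:

```python
import torch

def gabor_impulse_response_manual(t: torch.Tensor,
                                  center: torch.Tensor,
                                  fwhm: torch.Tensor):
    """Gabor impulse response kept as a (real, imag) pair of float tensors.

    Avoids the complex64 ops (torch.view_as_real, .real, .imag) that are
    problematic under torch-xla.
    """
    envelope = torch.exp(-t.pow(2) / (2.0 * fwhm.pow(2)))
    real = envelope * torch.cos(center * t)  # Re{exp(i * center * t)}
    imag = envelope * torch.sin(center * t)  # Im{exp(i * center * t)}
    return real, imag
```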
Requirements:
```
torch >= 1.9.0
torchaudio >= 0.9.0
torch-audiomentations==0.9.0
SoundFile==0.10.3.post1
msgpack
msgpack-numpy
wandb
transformers
lmdb
[Optional] torch_xla == 1.9
```
Additional dependencies include [WavAugment](https://github.com/facebookresearch/WavAugment) (needed for augmentations).
To train a model on SpeechCommands, run the following:
```
python train.py --cfg_file cfgs/speechcommands/efficientnet-b0-leaf-default.cfg --expdir ./exps/scv2/efficientnet-b0_default_leaf_bs1x256_adam_warmupcosine_wd_1e-4_rs8881 --epochs 100 --num_workers 8 --log_steps 50 --random_seed 8881 --no_wandb
```
To evaluate the trained model, run:
```
python test.py --test_csv_name ./speechcommands_v2_meta/test.csv --exp_dir ./exps/scv2/efficientnet-b0_default_leaf_bs1x256_adam_warmupcosine_wd_1e-4_rs8881 --meta_dir ./speechcommands_v2_meta
```
All experiments on VoxCeleb1 and SpeechCommands were repeated at least 5 times, and 95% confidence intervals are reported.
Model | Dataset | Metric | Features | Official | This repo | Weights |
---|---|---|---|---|---|---|
EfficientNet-b0 | SpeechCommands v2 | Accuracy | LEAF | 93.4±0.3 | 94.5±0.3 | ckpt |
ResNet-18 | SpeechCommands v2 | Accuracy | LEAF | N/A | 94.05±0.3 | ckpt |
EfficientNet-b0 | VoxCeleb1 | Accuracy | LEAF | 33.1±0.7 | 40.9±1.8 | ckpt |
ResNet-18 | VoxCeleb1 | Accuracy | LEAF | N/A | 44.7±2.9 | ckpt |
# complex_conv init

To evaluate how non-Mel initialization schemes for `complex_conv` work, experiments were repeated with the `xavier_normal`, `kaiming_normal`, and `randn` init schemes on the SpeechCommands dataset.
Model | Features | Init | Test Accuracy |
---|---|---|---|
EfficientNet-b0 | LEAF | Default (Mel) | 94.5±0.3 |
EfficientNet-b0 | LEAF | randn | 84.7±1.6 |
EfficientNet-b0 | LEAF | kaiming_normal | 84.7±2.3 |
EfficientNet-b0 | LEAF | xavier_normal | 79.1±0.7 |
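A minimal sketch of how such alternate schemes could be applied (the helper name and the assumption that the filterbank is a standard `nn.Conv1d` are illustrative, not the repo's exact API):

```python
import torch
import torch.nn as nn

def init_filterbank(conv: nn.Conv1d, scheme: str = "kaiming_normal") -> None:
    # Re-initialize filterbank weights with a non-Mel scheme before training.
    if scheme == "xavier_normal":
        nn.init.xavier_normal_(conv.weight)
    elif scheme == "kaiming_normal":
        nn.init.kaiming_normal_(conv.weight)
    elif scheme == "randn":
        with torch.no_grad():
            conv.weight.normal_(mean=0.0, std=1.0)
    else:
        raise ValueError(f"unknown init scheme: {scheme!r}")
```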
To load a trained checkpoint and extract the LEAF frontend (the config/checkpoint filenames and the `Classifier` constructor call below are placeholders; see `test.py` for the exact loading steps):

```python
import os
import torch
import pickle
from models.classifier import Classifier

results_dir = "<path/to/exp_dir>"  # placeholder: a trained experiment directory
with open(os.path.join(results_dir, "config.pkl"), "rb") as f:  # hypothetical filename
    cfg = pickle.load(f)
model = Classifier(cfg)  # hypothetical constructor arguments
ckpt = torch.load(os.path.join(results_dir, "ckpt.pth"), map_location="cpu")  # hypothetical filename
model.load_state_dict(ckpt)
frontend = model.features  # the trained LEAF frontend module
```
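A quick (hypothetical) smoke test of the extracted frontend; the `(batch, channels, samples)` input shape is an assumption, so check the repo's forward pass for the exact expected shape:

```python
wav = torch.randn(1, 1, 16000)  # 1 second of 16 kHz audio (shape is an assumption)
with torch.no_grad():
    feats = frontend(wav)
print(feats.shape)
```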
# References
[1] If you use this repository, kindly cite the LEAF paper:
```
@article{zeghidour2021leaf,
  title={LEAF: A Learnable Frontend for Audio Classification},
  author={Zeghidour, Neil and Teboul, Olivier and de Chaumont Quitry, F{\'e}lix and Tagliasacchi, Marco},
  journal={ICLR},
  year={2021}
}
```
Please also consider citing this implementation using the following BibTeX or via the citation widget on the sidebar.
```
@software{Yadav_leaf-pytorch_2021,
  author = {Yadav, Sarthak},
  month = {12},
  title = {{leaf-pytorch}},
  version = {0.0.1},
  year = {2021}
}
```