# leaf-pytorch

PyTorch implementation of the LEAF audio frontend

## Attention

The leaf-pytorch implementation is now officially part of SpeechBrain, with a sample recipe on SpeechCommands-v2 here. I recommend that anyone working with LEAF use the SpeechBrain implementation instead, both for its broader ecosystem and for its better documentation. Thanks for your interest!

## Sponsors

This work would not have been possible without cloud resources provided by Google's TPU Research Cloud (TRC) program. I also thank the TRC support team for quickly resolving every issue I ran into: you're awesome!

## About

This is a PyTorch implementation of the LEAF audio frontend [1], made using the official TensorFlow implementation as a direct reference.
This implementation supports training on TPUs using torch-xla.
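As a rough illustration (not taken from this repo's training code), moving a model and a batch onto an XLA device with torch-xla looks like the sketch below; the model and tensor shapes are placeholders:

```python
import torch

# Assumption: torch_xla is only available on TPU hosts, so fall back to CPU elsewhere.
try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
except ImportError:
    device = torch.device("cpu")

model = torch.nn.Linear(40, 12).to(device)  # placeholder model, not this repo's Classifier
batch = torch.randn(8, 40, device=device)   # placeholder batch
loss = model(batch).sum()
loss.backward()
# On TPUs, xm.optimizer_step(optimizer) marks the step for the XLA runtime
# in place of a plain optimizer.step().
```

The try/except keeps the same script runnable on CPU machines for debugging.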

## Key Points

## Dependencies

- torch >= 1.9.0
- torchaudio >= 0.9.0
- torch-audiomentations==0.9.0
- SoundFile==0.10.3.post1
- msgpack
- msgpack-numpy
- wandb
- transformers
- lmdb
- [Optional] torch_xla == 1.9

Additional dependencies include:

- [WavAugment](https://github.com/facebookresearch/WavAugment) (needed for augmentations)

## Running experiments

### Setup

### Training

To train a model on SpeechCommands, run the following:

```shell
python train.py --cfg_file cfgs/speechcommands/efficientnet-b0-leaf-default.cfg \
  --expdir ./exps/scv2/efficientnet-b0_default_leaf_bs1x256_adam_warmupcosine_wd_1e-4_rs8881 \
  --epochs 100 --num_workers 8 --log_steps 50 --random_seed 8881 --no_wandb
```

### Testing

To evaluate a trained model, run:

```shell
python test.py --test_csv_name ./speechcommands_v2_meta/test.csv \
  --exp_dir ./exps/scv2/efficientnet-b0_default_leaf_bs1x256_adam_warmupcosine_wd_1e-4_rs8881 \
  --meta_dir ./speechcommands_v2_meta
```

## Results

All experiments on VoxCeleb1 and SpeechCommands were repeated at least 5 times, and 95% confidence intervals are reported.

| Model | Dataset | Metric | Features | Official | This repo | Weights |
|---|---|---|---|---|---|---|
| EfficientNet-b0 | SpeechCommands v2 | Accuracy | LEAF | 93.4±0.3 | 94.5±0.3 | ckpt |
| ResNet-18 | SpeechCommands v2 | Accuracy | LEAF | N/A | 94.05±0.3 | ckpt |
| EfficientNet-b0 | VoxCeleb1 | Accuracy | LEAF | 33.1±0.7 | 40.9±1.8 | ckpt |
| ResNet-18 | VoxCeleb1 | Accuracy | LEAF | N/A | 44.7±2.9 | ckpt |
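For reference, a 95% confidence interval over repeated runs can be computed as in the sketch below; the accuracy values here are made up for illustration, and this uses a simple normal (z = 1.96) approximation rather than whatever exact procedure produced the table:

```python
import math
import statistics

# Hypothetical test accuracies from 5 repeated runs (not from this repo's logs)
accs = [94.2, 94.6, 94.4, 94.7, 94.5]

mean = statistics.mean(accs)
# Half-width of the 95% CI under a normal approximation
half_width = 1.96 * statistics.stdev(accs) / math.sqrt(len(accs))
print(f"{mean:.1f}±{half_width:.1f}")  # prints 94.5±0.2
```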

## Observations

### Evaluating different init schemes for complex_conv

To evaluate how non-Mel initialization schemes for complex_conv perform, experiments were repeated with the xavier_normal, kaiming_normal, and randn init schemes on the SpeechCommands dataset.

| Model | Features | Init | Test Accuracy |
|---|---|---|---|
| EfficientNet-b0 | LEAF | Default (Mel) | 94.5±0.3 |
| EfficientNet-b0 | LEAF | randn | 84.7±1.6 |
| EfficientNet-b0 | LEAF | kaiming_normal | 84.7±2.3 |
| EfficientNet-b0 | LEAF | xavier_normal | 79.1±0.7 |
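The alternative schemes in the table correspond to PyTorch's standard initializers; applied to a 1-D conv layer they look roughly like this (the layer dimensions are illustrative, not the repo's actual complex_conv shape):

```python
import torch
import torch.nn as nn

# Illustrative filterbank-style conv layer; channel/kernel sizes are made up
conv = nn.Conv1d(in_channels=1, out_channels=80, kernel_size=401)

nn.init.xavier_normal_(conv.weight)      # "xavier_normal" row in the table
# nn.init.kaiming_normal_(conv.weight)   # "kaiming_normal"
# with torch.no_grad():                  # "randn"
#     conv.weight.copy_(torch.randn_like(conv.weight))
```

The default LEAF init instead places the filters on the Mel scale, which the table suggests matters substantially for final accuracy.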

## Loading Pretrained Models

```python
import os
import pickle

import torch

results_dir = ""
hparams_path = os.path.join(results_dir, "hparams.pickle")
ckpt_path = os.path.join(results_dir, "ckpts", "")

checkpoint = torch.load(ckpt_path)
with open(hparams_path, "rb") as fp:
    hparams = pickle.load(fp)

# Classifier is this repo's model wrapper, built from the saved hparams config
model = Classifier(hparams.cfg)
print(model.load_state_dict(checkpoint['model_state_dict']))
```

To access just the pretrained LEAF frontend:

```python
frontend = model.features
```


## References

[1] If you use this repository, kindly cite the LEAF paper:

```
@article{zeghidour2021leaf,
  title={LEAF: A Learnable Frontend for Audio Classification},
  author={Zeghidour, Neil and Teboul, Olivier and de Chaumont Quitry, F{\'e}lix and Tagliasacchi, Marco},
  journal={ICLR},
  year={2021}
}
```


Please also consider citing this implementation using the following BibTeX or via the citation widget on the sidebar.

```
@software{Yadav_leaf-pytorch_2021,
  author = {Yadav, Sarthak},
  month = {12},
  title = {{leaf-pytorch}},
  version = {0.0.1},
  year = {2021}
}
```