TUNet - Official Implementation

TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining - ICASSP 2022

License and citation

This code is available for academic research only. If you use our software, please cite as below. For commercial applications, please contact nxt.sales@fsoft.com.vn.

Copyright © 2021 FPT Software, Inc. All rights reserved.

@inproceedings{Nguyen_2022,
    doi = {10.1109/icassp43922.2022.9747699},
    url = {https://doi.org/10.1109%2Ficassp43922.2022.9747699},
    year = 2022,
    month = {may},
    publisher = {{IEEE}},
    author = {Viet-Anh Nguyen and Anh H. T. Nguyen and Andy W. H. Khong},
    title = {{TUNet}: A Block-Online Bandwidth Extension Model Based On Transformers And Self-Supervised Pretraining},
    booktitle = {{ICASSP} 2022 - 2022 {IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP})}
}

1. Results

Our model achieved a significant gain over the baselines. Here, we report the mean opinion score (MOS) predicted by Microsoft's DNSMOS Azure service. Please refer to our paper for more benchmarks.

Model        DNSMOS
Input        3.0951
TFiLM-UNet   3.1026
WSRGlow      3.2053
NU-Wave      3.2760
TUNet        3.3896
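
To make the size of the gain concrete, the short Python sketch below computes each model's absolute DNSMOS improvement over the unprocessed input. The scores are copied from the table above; the helper name `gains_over_input` is only illustrative and is not part of this repository.

```python
# DNSMOS scores copied from the results table above.
dnsmos = {
    "Input": 3.0951,
    "TFiLM-UNet": 3.1026,
    "WSRGlow": 3.2053,
    "NU-Wave": 3.2760,
    "TUNet": 3.3896,
}


def gains_over_input(scores, reference="Input"):
    """Return each model's absolute DNSMOS gain over the reference row."""
    base = scores[reference]
    return {
        name: round(value - base, 4)
        for name, value in scores.items()
        if name != reference
    }


gains = gains_over_input(dnsmos)
for name, gain in gains.items():
    print(f"{name:<12} +{gain:.4f}")
```

By this measure TUNet improves on the input by about 0.29 MOS points and on the strongest baseline, NU-Wave, by about 0.11.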

We also provide several audio samples in audio_samples for comparison. The spectrogram visualizations show that the high frequencies generated by our model are more accurate than those generated by the baselines.

2. Installation

Setup

Clone the repo

$ git clone https://github.com/NXTProduct/TUNet.git
$ cd TUNet

Install dependencies

Note: the argument -f https://download.pytorch.org/whl/cu113/torch_stable.html is provided so that torch==1.10.0+cu113 (PyTorch 1.10, CUDA 11.3), which is pinned in requirements.txt, can be installed. Choose a CUDA version appropriate for your GPU and change or remove the argument according to the PyTorch documentation.
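
After installing, it can be useful to confirm which PyTorch build was actually picked up. The snippet below is a minimal sketch and not part of this repository: the function name `describe_torch_install` is made up for illustration, and it relies only on standard PyTorch attributes.

```python
import importlib.util


def describe_torch_install():
    """Report the installed PyTorch build, or None if torch is not importable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        # Version string of the wheel, e.g. "1.10.0+cu113" for the pinned build.
        "torch": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # Whether a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_install()
print(info if info is not None else "PyTorch is not installed")
```

If `cuda_available` is False on a machine with a GPU, the CUDA version of the wheel probably does not match the installed driver, and the -f argument should be adjusted as described in the note above.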

3. Data preparation

In our paper, we conduct experiments on the VCTK and VIVOS datasets. You may use either one or both.

4. Run the code

Configuration

config.py is the most important file. Here, you can find all the configurations related to experiment setups, datasets, models, training, testing, etc. Although the config file has been explained thoroughly, we recommend reading our paper to fully understand each parameter.

Training

Evaluation

Configure a new dataset

Our implementation currently works with the VCTK and VIVOS datasets but can easily be extended to a new one.
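
A typical first step when adding a dataset is to collect its audio files and split them into training and test lists. The sketch below illustrates that step only. Everything in it is an assumption: the function name `split_wav_files`, the 10% test ratio, and the flat directory of .wav files are hypothetical, and the list format this repository actually expects should be checked in config.py.

```python
import random
import tempfile
from pathlib import Path


def split_wav_files(root, test_ratio=0.1, seed=0):
    """Collect .wav paths under root (relative to it) and split them into train/test lists.

    Hypothetical helper for illustration; not part of the TUNet code base.
    """
    root = Path(root)
    files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.wav"))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_ratio)
    return files[n_test:], files[:n_test]


# Demo on a temporary directory holding ten empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(10):
        (Path(tmp) / f"utt_{i:02d}.wav").touch()
    train_files, test_files = split_wav_files(tmp)
    print(len(train_files), "train files,", len(test_files), "test files")
```

The two returned lists can then be written to text files or passed to a data loader, depending on what the dataset settings in config.py require.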

5. Audio generation

6. Acknowledgement

We thank FPT Software for funding and providing GPU infrastructure. We also thank Microsoft for giving access to the DNSMOS Azure service.