This repository provides an unofficial implementation of the speech restoration model Miipher, originally proposed by Koizumi et al. (arXiv). Please note that the model provided in this repository does not represent the performance of the original model proposed by Koizumi et al., as this implementation differs from the paper in many ways.
Install with pip. Installation has been confirmed on Python 3.10.11.
```
pip install git+https://github.com/Wataru-Nakata/miipher
```
The pretrained models are trained on the LibriTTS-R and JVS corpora and are provided under the CC-BY-NC-2.0 license.
The models are hosted on Hugging Face. To use a pretrained model, please refer to examples/demo.py.
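As a rough orientation only, the restoration flow implemented in examples/demo.py looks like the sketch below. The names `load_miipher`, `extract_features`, `feature_cleaner`, and `vocoder` are hypothetical stand-ins, not this repo's actual API; follow examples/demo.py for the real calls.

```python
# Hypothetical sketch of the Miipher inference flow (see examples/demo.py for
# the actual API). `load_miipher` and the model attributes below are made-up
# names used only to illustrate the pipeline.
import torch
import torchaudio

def restore(model, wav: torch.Tensor, sr: int, transcript: str) -> torch.Tensor:
    """Conceptual restoration: extract features, clean them, then vocode."""
    with torch.inference_mode():
        # 1) Frontend: WavLM speech features, XPhoneBERT phoneme features, x-vector
        feats = model.extract_features(wav, sr, transcript)   # assumed method
        # 2) Conformer feature cleaner predicts clean speech features
        cleaned = model.feature_cleaner(feats)                # assumed method
        # 3) HiFi-GAN vocoder maps the cleaned features back to a waveform
        return model.vocoder(cleaned)                         # assumed method

wav, sr = torchaudio.load("noisy.wav")
model = load_miipher("path/to/miipher_checkpoint.ckpt")       # assumed loader
restored = restore(model, wav, sr, "transcript of the utterance")
torchaudio.save("restored.wav", restored.cpu().reshape(1, -1), sample_rate=sr)
```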
| Component | Original paper | This repo |
|---|---|---|
| Clean speech dataset | Proprietary | LibriTTS-R and JVS corpus |
| Noise dataset | TAU Urban Audio-Visual Scenes 2021 dataset | TAU Urban Audio-Visual Scenes 2021 dataset and Slakh2100 |
| Speech SSL model | W2v-BERT XL | WavLM-large |
| Language SSL model | PnG BERT | XPhoneBERT |
| Feature cleaner building block | DF-Conformer | Conformer |
| Vocoder | WaveFit | HiFi-GAN |
| X-Vector model | Streaming Conformer-based speaker encoding model | speechbrain/spkrec-xvect-voxceleb |
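For reference, the replacement feature extractors in the right-hand column are all publicly available. Below is a minimal sketch of loading them, assuming the public checkpoints microsoft/wavlm-large, vinai/xphonebert-base, and speechbrain/spkrec-xvect-voxceleb; the phoneme sequence XPhoneBERT expects (normally produced with a phonemizer such as text2phonemesequence) is replaced by a placeholder string here.

```python
# Minimal sketch (not this repo's training code) of loading the three
# conditioning-feature extractors listed in the right-hand column.
# Checkpoint IDs are the public Hugging Face / SpeechBrain ones; the
# phonemization step required by XPhoneBERT is replaced by a placeholder.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModel, AutoTokenizer, WavLMModel
from speechbrain.pretrained import EncoderClassifier

wav, sr = torchaudio.load("sample.wav")
wav = torchaudio.functional.resample(wav, sr, 16_000).mean(0, keepdim=True)  # 16 kHz mono

# Speech SSL features: WavLM-large
wavlm_fe = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-large")
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-large")
inputs = wavlm_fe(wav[0].numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.inference_mode():
    speech_feats = wavlm(**inputs).last_hidden_state           # (1, frames, 1024)

# Language SSL features: XPhoneBERT (expects a whitespace-separated phoneme string)
xphone_tok = AutoTokenizer.from_pretrained("vinai/xphonebert-base")
xphonebert = AutoModel.from_pretrained("vinai/xphonebert-base")
phonemes = "ð ɪ s ▁ ɪ z ▁ ɐ ▁ t ɛ s t"                         # placeholder phoneme sequence
with torch.inference_mode():
    text_feats = xphonebert(**xphone_tok(phonemes, return_tensors="pt")).last_hidden_state

# Speaker embedding: SpeechBrain x-vector
spk_encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")
with torch.inference_mode():
    xvector = spk_encoder.encode_batch(wav)                     # (1, 1, 512)
```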
- Code in this repo: MIT License
- Weights on Hugging Face: CC-BY-NC-2.0 license