Wataru-Nakata / miipher

Unofficial implementation of miipher
MIT License

miipher

This repository provides an unofficial implementation of the speech restoration model Miipher. Miipher was originally proposed by Koizumi et al. (arXiv). Please note that the model provided in this repository does not represent the performance of the original model proposed by Koizumi et al., as this implementation differs from the paper in many ways.

Installation

Install with pip. Installation has been confirmed on Python 3.10.11.

pip install git+https://github.com/Wataru-Nakata/miipher
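After installing, a quick sanity check can confirm which Python version is running and whether the package is visible to the interpreter. The snippet below is a minimal sketch using only the standard library. The import name `miipher` is an assumption (presumed to match the repository name), and `check_environment` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import sys

# Python version on which the README reports a confirmed installation.
TESTED_PYTHON = (3, 10, 11)


def check_environment(package_name: str = "miipher") -> dict:
    """Report the running Python version and whether a package can be found.

    `package_name` defaults to "miipher", which is an assumption: the import
    name is presumed to match the repository name.
    """
    current = tuple(sys.version_info[:3])
    return {
        "python_version": ".".join(map(str, current)),
        # True only when the interpreter is exactly the tested version.
        "matches_tested_version": current == TESTED_PYTHON,
        # find_spec returns None when a top-level package is not installed.
        "package_found": importlib.util.find_spec(package_name) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A different Python version is not necessarily a problem; the check only shows whether the environment matches the one the installation was confirmed on.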

Pretrained model

The pretrained model was trained on the LibriTTS-R and JVS corpora and is provided under the CC-BY-NC-2.0 license.

The models are hosted on Hugging Face.

To use the pretrained model, please refer to examples/demo.py.

Differences from the original paper

| Component | Original paper | This repo |
| --- | --- | --- |
| Clean speech dataset | proprietary | LibriTTS-R and JVS corpus |
| Noise dataset | TAU Urban Audio-Visual Scenes 2021 dataset | TAU Urban Audio-Visual Scenes 2021 dataset and Slakh2100 |
| Speech SSL model | W2v-BERT XL | WavLM-large |
| Language SSL model | PnG BERT | XPhoneBERT |
| Feature cleaner building block | DF-Conformer | Conformer |
| Vocoder | WaveFit | HiFi-GAN |
| X-Vector model | Streaming Conformer-based speaker encoding model | speechbrain/spkrec-xvect-voxceleb |

LICENSE

Code in this repo: MIT License

Weights on Hugging Face: CC-BY-NC-2.0 license