miipher

This repository provides an unofficial implementation of the speech restoration model Miipher, originally proposed by Koizumi et al. (arXiv). Please note that the model provided in this repository does not represent the performance of the original model proposed by Koizumi et al., as this implementation differs from the paper in many ways.

Installation

Install with pip. Installation has been confirmed on Python 3.10.11.

pip install git+https://github.com/Wataru-Nakata/miipher
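After installing, a quick sanity check can confirm that the interpreter is close to the tested version and that the package can be found. The snippet below is a minimal sketch using only the standard library; the import name `miipher` is an assumption based on the repository name, and `check_environment` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import sys

# Python minor version the README reports as tested (3.10.11).
TESTED_VERSION = (3, 10)


def check_environment(module_name: str = "miipher") -> dict:
    """Report the interpreter version and whether a top-level module is importable.

    The default module name "miipher" is an assumption based on the repo name.
    """
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "matches_tested_minor": sys.version_info[:2] == TESTED_VERSION,
        # find_spec returns None when a top-level module cannot be located.
        "installed": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

A result with `installed` set to `False` suggests that pip installed into a different environment than the one running the script.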

Pretrained model

The pretrained model is trained on the LibriTTS-R and JVS corpora and is provided under the CC-BY-NC-2.0 license.

The models are hosted on Hugging Face.

To use the pretrained model, please refer to examples/demo.py.

Differences from the original paper

Component	Original paper	This repo
Clean speech dataset	proprietary	LibriTTS-R and JVS corpus
Noise dataset	TAU Urban Audio-Visual Scenes 2021 dataset	TAU Urban Audio-Visual Scenes 2021 dataset and Slakh2100
Speech SSL model	W2v-BERT XL	WavLM-large
Language SSL model	PnG BERT	XPhoneBERT
Feature cleaner building block	DF-Conformer	Conformer
Vocoder	WaveFit	HiFi-GAN
X-Vector model	Streaming Conformer-based speaker encoding model	speechbrain/spkrec-xvect-voxceleb

LICENSE

Code in this repo: MIT License

Weights on Hugging Face: CC-BY-NC-2.0 license
