anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
https://anonymous-pits.github.io/pits/
MIT License
275 stars 34 forks source link

PITS

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

Abstract: Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shifted synthesis to achieve pitch-controllability. Experiments demonstrate that PITS generates high-quality speech that is indistinguishable from ground truth speech and has high pitch-controllability without quality degradation. Code and audio samples will be available at https://github.com/anonymous-pits/pits.

Training code is uploaded.

Demo and Checkpoint are uploaded at Hugging Face Space🤗

Audio samples are uploaded at github.io.

For the pitch-shifted Inference, we unify to use the notation in scope-shift, s, instead of pitch-shift.

Voice conversion samples are uploaded.

Accepted to ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling

overall

Requiremetns

For VCTK

For custom dataset

Training

Demo

Demo and Checkpoint are uploaded at Hugging Face Space🤗

We are currently working in progress to make dockerfile for local demo. Please wait for it.

References