Mandarin/Chinese Text to Speech based on statistical parametric speech synthesis using merlin toolkit
This is only a demo of a Mandarin speech synthesis frontend. It does not provide text normalization or prosody prediction, and the phone set and question set it uses have not been fully tested yet. Text-to-pinyin conversion uses pypinyin and word segmentation uses jieba; neither is accurate enough for commercial use.

For other speech synthesis projects: end-to-end synthesis is a promising direction, and its naturalness is better than Merlin's.
A rough draft of the documentation is available (written in Mandarin).

There is no open-source Mandarin speech synthesis dataset on the internet, so this project uses the THCHS-30 dataset to demonstrate speech synthesis.
UPDATE
Open-source Mandarin speech synthesis data is now available from Data Baker (标贝). Thanks to Data Baker for releasing it.

* Data download: https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar
* Data description: http://www.data-baker.com/open_source.html
Listen to the demo at https://jackiexiao.github.io/MTTS/

* Python: 3.6
* System: Linux (tested on Ubuntu 16.04)
```bash
pip install jieba pypinyin
sudo apt-get install libatlas3-base
```

Then run `bash tools/install_mtts.sh`, or download the files yourself.
Run Demo
```bash
bash run_demo.sh
```
```bash
python src/mtts.py txtfile wav_directory_path output_directory_path
```

Paths can be absolute or relative. This generates HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add `-a your_acoustic_model.zip`; otherwise, this project uses the `thchs30.zip` acoustic model by default.

txtfile example:
```
A_01 这是一段文本
A_02 这是第二段文本
```
wav_directory example (the sampling rate should be at least 16 kHz):

```
A_01.wav
A_02.wav
```
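Before running `mtts.py`, it can help to check that every utterance ID in the txtfile has a matching wav file with a high enough sampling rate. The sketch below is not part of MTTS; it assumes the txtfile format shown above (an ID, a space, then the text), wav files named `<ID>.wav`, and uncompressed PCM WAV that Python's standard `wave` module can read. `read_txtfile` and `check_corpus` are hypothetical helper names.

```python
import wave
from pathlib import Path

# Assumption: 16 kHz (as in THCHS-30) is the lowest acceptable rate.
MIN_SAMPLE_RATE = 16000


def read_txtfile(txt_path):
    """Parse lines like 'A_01 这是一段文本' into {utterance_id: text}."""
    utterances = {}
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            utt_id, text = line.split(maxsplit=1)
            utterances[utt_id] = text
    return utterances


def check_corpus(txt_path, wav_dir):
    """Return problems found: missing wav files or too-low sampling rates."""
    problems = []
    for utt_id in read_txtfile(txt_path):
        wav_path = Path(wav_dir) / f"{utt_id}.wav"
        if not wav_path.exists():
            problems.append(f"{utt_id}: missing {wav_path.name}")
            continue
        with wave.open(str(wav_path), "rb") as w:
            rate = w.getframerate()
        if rate < MIN_SAMPLE_RATE:
            problems.append(
                f"{utt_id}: sampling rate {rate} Hz < {MIN_SAMPLE_RATE} Hz")
    return problems
```

An empty list from `check_corpus("txtfile", "wav_directory_path")` means every listed utterance has a readable wav file at 16 kHz or above.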
To generate HTS labels from text only (without wav files):

```bash
python src/mandarin_frontend.py txtfile output_directory_path
```
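Or use it as a Python module:

```python
from mandarin_frontend import txt2label
result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞')
for line in result:  # print each HTS label line
    print(line)
# With a forced-alignment (sfs) file:
result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞', sfsfile='example_file/example.sfs')
```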
See the [source code](https://github.com/Jackiexiao/MTTS/blob/master/src/mandarin_frontend.py) for more information. Note that each line of the alignment file (sfs file) has the format `endtime phone_type`, not `start_time, phone_type` as in Speech Ocean's data.
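To illustrate the `endtime phone_type` layout, the hypothetical helper below turns each line into a `(start, end, phone)` segment by taking the previous line's end time as the next start time. It assumes the first segment starts at time 0 and that each line holds exactly two whitespace-separated fields; times are read as floats in whatever unit the file uses. It is a sketch, not the parser MTTS itself uses.

```python
def parse_sfs(lines):
    """Convert 'endtime phone_type' lines into (start, end, phone) tuples.

    Assumption: the first segment starts at 0, and every later segment
    starts exactly where the previous one ended.
    """
    segments = []
    start = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        end_str, phone = line.split()
        end = float(end_str)  # unit is left unchanged from the file
        segments.append((start, end, phone))
        start = end  # next segment begins at this end time
    return segments
```

For instance, `parse_sfs(["0.35 sil", "0.52 x"])` yields `[(0.0, 0.35, 'sil'), (0.35, 0.52, 'x')]`; the phone names here are placeholders.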
### 3. Forced-alignment
This project uses [Montreal-Forced-Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) for forced alignment. For better alignment, train an alignment model on your own data; see [MFA: align using only the data set](https://montreal-forced-aligner.readthedocs.io/en/latest/aligning.html#align-using-only-the-data-set).
1. We trained the acoustic model on the THCHS-30 dataset (see `misc/thchs30.zip`) using the dictionary [mandarin_mtts.lexicon](https://github.com/Jackiexiao/MTTS/blob/master/misc/mandarin_mtts.lexicon). Training on a dataset larger than THCHS-30 may give better alignment.
2. To use MFA's (Montreal-Forced-Aligner's) pre-trained Mandarin model, you need this dictionary: [mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon](https://github.com/Jackiexiao/MTTS/blob/master/misc/mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon)
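MFA pronunciation dictionaries list one word per line followed by its whitespace-separated phones. Assuming `mandarin_mtts.lexicon` follows that plain layout with tone-numbered pinyin syllables as the words, the sketch below loads such a lexicon and expands a transcript into phones, collecting words the dictionary does not cover so they can be fixed before alignment (MFA cannot align words it has no pronunciation for). The helper names and the sample entries are hypothetical and are not taken from the actual lexicon file.

```python
def load_lexicon(lines):
    """Parse 'word phone1 phone2 ...' lines into {word: [phones]}."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], parts[1:]
        lexicon.setdefault(word, phones)  # keep the first pronunciation
    return lexicon


def transcript_to_phones(transcript, lexicon):
    """Expand a space-separated transcript; return (phones, oov_words)."""
    phones, oov = [], []
    for word in transcript.split():
        if word in lexicon:
            phones.extend(lexicon[word])
        else:
            oov.append(word)  # not in the dictionary
    return phones, oov
```

With the hypothetical entries `ni3 n i3` and `hao3 h ao3`, the transcript `ni3 hao3 ma5` expands to `['n', 'i3', 'h', 'ao3']` and reports `ma5` as missing.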
## Prosody Mark
You can generate HTS labels without prosody marks. We assume that a word segment is smaller than a prosodic word (this is adjusted in the code).
"#0", "#1", "#2", "#3" and "#4" are the prosody labeling symbols.
* #0 stands for word segment
* #1 stands for prosodic word
* #2 stands for stressed word (in this project it is actually treated as #1)
* #3 stands for prosodic phrase
* #4 stands for intonational phrase
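To make the marking scheme concrete, here is a minimal sketch that splits prosody-annotated text into `(unit, level)` pairs. It assumes each mark directly follows the unit it closes, treats trailing unmarked text as level 0, and by default maps `#2` to `#1` as described above. `split_prosody` is a hypothetical helper, not a function from MTTS.

```python
import re

# Matches a prosody mark "#0" through "#4" and captures the level digit.
PROSODY_MARK = re.compile(r"#([0-4])")


def split_prosody(text, merge_stress=True):
    """Split e.g. '今天#1天气#2很好#4' into [(unit, level), ...]."""
    units = []
    pos = 0
    for m in PROSODY_MARK.finditer(text):
        segment = text[pos:m.start()]
        if segment:
            level = int(m.group(1))
            if merge_stress and level == 2:
                level = 1  # stressed word (#2) is treated as #1
            units.append((segment, level))
        pos = m.end()
    if pos < len(text):
        units.append((text[pos:], 0))  # assumption: unmarked tail is #0
    return units
```

For example, `split_prosody("今天#1天气#2很好#4")` returns `[('今天', 1), ('天气', 1), ('很好', 4)]`; passing `merge_stress=False` keeps the `#2` level on `天气`.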
## Future improvements
* Text normalization
* Better Chinese word segmentation
* G2P: the polyphone problem
* Better label format and question set
* Improved prosody analysis
* Better alignment
## Contributor
* Jackiexiao
* willian56