axinc-ai / ailia-models

The collection of pre-trained, state-of-the-art AI models for ailia SDK

ADD yukarin #544

Open ¡ kyakuno opened this issue 3 years ago

kyakuno commented 3 years ago

https://github.com/Hiroshiba/realtime-yukarin MIT-licensed voice conversion

kyakuno commented 3 years ago

https://www.dtmstation.com/archives/41014.html

kyakuno commented 3 years ago

https://voicevox.hiroshiba.jp/

mucunwuxian commented 2 years ago

Memo 📝

The various yukarin repositories

1st stage https://github.com/Hiroshiba/yukarin

2nd stage https://github.com/Hiroshiba/become-yukarin

3rd stage https://github.com/Hiroshiba/realtime-yukarin


Blog posts by the author of the yukarin repositories

Becoming Yuzuki Yukari's voice with the power of deep learning (2018-02-13) https://blog.hiroshiba.jp/became-yuduki-yukari-with-deep-learning-power/

I want to do voice conversion with deep learning too! (2017-12-10) https://blog.hiroshiba.jp/voice-conversion-deep-leanring-and-other-delusions/

Tried non-parallel Yuzuki Yukari voice conversion with CycleGAN (2018-04-22) https://blog.hiroshiba.jp/became-yuduki-yukari-with-cycle-gan-power/


Related articles

å­Ļįŋ’ãĢåŋ…čĻãĒ、パナãƒŦãƒĢデãƒŧã‚ŋãĢついãĻ https://medium.com/@crosssceneofwindff/%E7%BE%8E%E5%B0%91%E5%A5%B3%E5%A3%B0%E3%81%B8%E3%81%AE%E5%A4%89%E6%8F%9B%E3%81%A8%E5%90%88%E6%88%90-fe251a8e6933 https://aidiary.hatenablog.com/entry/20150310/1425983455 https://www.jstage.jst.go.jp/article/jasj/72/6/72_324/_pdf



čŠąã—ãĻいる内厚が同一であるノãƒŗパナãƒŦãƒĢデãƒŧã‚ŋぎ、時間čģ¸ã‚’揃えãĻパナãƒŦãƒĢデãƒŧã‚ŋ化するå‡Ļį† īŧˆį­†č€…は上手くできãĻãĒいとäģ°ãŖãĻいるが、凄く上手くできãĻいるようãĢ思えぞすīŧ‰ https://blog.hiroshiba.jp/sandbox-alignment-voice-actress-data/


The commercially usable text-to-speech software that Mr. Kyakuno introduced (it is billed as "free, mid-quality text-to-speech software", but personally I find the quality high) (having it read text aloud and using that audio as the target seems like a good approach) (made by hiroshiba) https://voicevox.hiroshiba.jp/ â–ŧ Its GitHub repository (it appears that, given JSON as input, text-to-speech can be generated from the command line) https://github.com/Hiroshiba/voicevox


Qiita article: "Five datasets usable for free for speech recognition" https://qiita.com/yarimoto/items/98711f23f90ea068730b â–ŧ The speech dataset published by Mozilla (it contains Japanese text together with recordings of it being read aloud) https://commonvoice.mozilla.org/ja/datasets


MacãĢnode.jsをイãƒŗ゚トãƒŧãƒĢīŧˆvoicevox指厚ぎverは14.17.4īŧ‰ https://qiita.com/kyosuke5_20/items/c5f68fc9d89b84c0df09 â–ŧ おうも上手く動かずâ€Ļ、かつ、GPUがį„Ąã„と推čĢ–が遅いようであるį‚ē、MACでぎ動äŊœã¯æ–­åŋĩしぞした。


voice_datasets, a compilation of speech datasets https://github.com/jim-schwoebel/voice_datasets

Similar technique (1): AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss https://github.com/auspicious3000/autovc https://auspicious3000.github.io/autovc-demo/

Similar technique (2): Assem-VC — Official PyTorch Implementation https://github.com/mindslab-ai/assem-vc https://mindslab-ai.github.io/assem-vc/

Similar technique (3): VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion (Interspeech 2021) https://github.com/Wendison/VQMIVC https://wendison.github.io/VQMIVC-demo/

Similar technique (4): MediumVC https://github.com/BrightGu/MediumVC https://brightgu.github.io/MediumVC/ https://arxiv.org/pdf/2110.02500.pdf

Similar technique (5): SingleVC https://github.com/BrightGu/SingleVC https://brightgu.github.io/SingleVC/

Similar technique (6): StarGANv2-VC https://github.com/yl4579/StarGANv2-VC https://starganv2-vc.github.io/


An article that explains voice-conversion terminology in detail https://blog.nefrock.com/entry/2020/03/17/171730

An article documenting know-how for improving accuracy with yukarin https://qiita.com/atticatticattic/items/848869a32413a378ee6d

yukarinãĢãĻäŊŋį”¨ã•ã‚ŒãĻいるpyworldãĢついãĻぎč§ŖčĒŦ記äē‹ https://qiita.com/ohtaman/items/84426cee09c2ba4abc22

Paper survey: Singing Voice Conversion https://aria3366.hatenablog.com/


mucunwuxian commented 2 years ago

Investigation notes (1) 📝

Here I record what I actually did when training with yukarin.


Installing the libraries

Following the yukarin repository's README (Japanese version), the first instruction is to install the required libraries. This is done with pip install -r requirements.txt, whose contents are as follows. https://github.com/Hiroshiba/yukarin/blob/master/requirements.txt

numpy
cupy<6.0.0
chainer<6.0.0
librosa<0.7.0
pysptk
pyworld
matplotlib
tensorflow
tqdm
git+https://github.com/neka-nat/tensorboard-chainer
git+https://github.com/Hiroshiba/become-yukarin


What stands out here is that the pinned versions of cupy, chainer and librosa are old. Checking the release history on PyPI, I concretely tried the installs below. For reference, my environment is Ubuntu 18.04.6 LTS with an RTX 3090 GPU. Because of the RTX 3090, CUDA must be version 11 or later, which in turn means cuDNN must be version 8 or later.

!pip install cupy==5.4.0  # <6.0.0
!pip install chainer==5.4.0  # <6.0.0
!pip install librosa==0.6.3  # <0.7.0


įĩæžœã€chainerとliblosaã¯ã€čŠ˛åŊ“バãƒŧジョãƒŗãĢãĻイãƒŗ゚トãƒŧãƒĢができぞしたが、cupyはイãƒŗ゚トãƒŧãƒĢができぞせんでした。 į†į”ąã¯æã‚‰ãäģĨ下です。

  **************************************************
  *** WARNING: Unsupported cuDNN version: 8005
  *** WARNING: cuDNN v5000= and <=v7999 is required
  **************************************************


That said, searching the repository for where cupy is actually used suggests it is not involved in any critical part of the processing, so for now I decided to leave it as is.


Data preparation (1. Preparing the audio data)

The yukarin repository performs voice conversion using parallel data, so that data has to be prepared. Note that the conversion maps one specific person's voice to another desired person's voice, i.e. a one-to-one setting.

å‚č€ƒīŧšhttps://blog.nefrock.com/entry/2020/03/17/171730


First, for the conversion-target data, I plan to use VOICEVOX, which Mr. Kyakuno shared. As of 2021/11/28, the credit terms state that VOICEVOX voices may be used both commercially and non-commercially.

Terms of use for the character "Kasukabe Tsumugi" https://tsukushinyoki10.wixsite.com/ktsumugiofficial/%E5%88%A9%E7%94%A8%E8%A6%8F%E7%B4%84


As for using VOICEVOX, following the voicevox_engine repository lets you obtain audio data efficiently. For example, even content that takes 10 seconds to read aloud is synthesized in well under 1 second. (That impression comes from running the synthesis on an RTX 3090 GPU.)

上記ãĢé–ĸしãĻ、į§ãŒåŽŸčˇĩした斚æŗ•ã¨ã—ãĻは、VOICEVOXぎイãƒŗ゚トãƒŧãƒĢを、ペãƒŧã‚¸č¨˜čŧ‰ãŽæ‰‹é †ãĢãĻ、UbuntuãĢ寞しãĻčĄŒã„ãžã—ãŸã€‚ そしãĻ、VOICEVOXをčĩˇå‹•ã—ãĻいるįŠļ態ãĢãĻ、voicevox_engineãƒĒポジトãƒĒぎAPIドキãƒĨãƒĄãƒŗトãĢ記čŧ‰ã•ã‚ŒãĻいる斚æŗ•ã‚’å‚č€ƒãĢ、shellをįĩ„ãŋ、原čˇĩしたåŊĸとãĒりぞす。 尚、curlãĢついãĻは、ã‚ĸクã‚ģ゚するã‚ĸドãƒŦ゚をダブãƒĢクã‚Ēãƒŧテãƒŧã‚ˇãƒ§ãƒŗでæ‹ŦãŖた斚が、動äŊœãŒåŽ‰åŽšã™ã‚‹ã¨ãŽã“とで、äģĨ下ぎようãĢ原čˇĩすることをã‚Ēã‚šã‚šãƒĄã•ã›ãĻ頂くæŦĄįŦŦです。 īŧˆâ€ģå‚č€ƒīŧšzsh: no matches found:とãĒãŖた時ぎ寞åŋœæ–šæŗ•īŧ‰

echo -n "こんãĢãĄã¯ã€éŸŗåŖ°åˆæˆãŽä¸–į•Œã¸ã‚ˆã†ã“そ" >text.txt

curl -s \
    -X POST \
    "localhost:50021/audio_query?speaker=1" \
    --get --data-urlencode text@text.txt \
    > query.json

curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "localhost:50021/synthesis?speaker=1" \
    > audio.wav


Since it looked like the target side of the parallel data could be prepared as above, the next things needed are the texts for VoiceVox to read aloud, together with audio of other people reading the same texts. The yukarin repository itself does one-to-one voice conversion, but as a feature shipped in ailia-models, which is used by many unspecified people, it would presumably need to be many-to-one. Judging from the following yukarin issue, many-to-one has apparently been confirmed to work. https://github.com/Hiroshiba/yukarin/issues/49

A dataset that meets these needs is Common Voice, the speech dataset published by Mozilla. It pairs Japanese text with audio files (.mp3). There are currently a little under 24,000 audio files, and the number of speakers also seems large (each speaker appears to have contributed roughly 10–20 recordings).
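For reference, the per-clip transcript files under ./text/ (used by the shell script below) can presumably be generated from the Common Voice metadata TSV. A minimal sketch, assuming the standard Common Voice layout with a validated.tsv containing 'path' and 'sentence' columns (the ./cv-corpus-ja/ path is a placeholder for wherever the dataset was extracted):

# Sketch: write one transcript .txt per Common Voice clip, named after the clip,
# so that ./text/<clip>.txt pairs 1:1 with <clip>.mp3.
import csv
import os

os.makedirs('./text', exist_ok=True)
with open('./cv-corpus-ja/validated.tsv', encoding='utf-8', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        stem = os.path.splitext(row['path'])[0]  # e.g. common_voice_ja_19482480
        with open(f'./text/{stem}.txt', 'w', encoding='utf-8') as out:
            out.write(row['sentence'])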


そぎå‡Ļį†ã‚’čĄŒã†shellãƒ•ã‚Ąã‚¤ãƒĢ内厚が、äģĨ下とãĒりぞす。 尚、./text/*配下ãĢは、éŸŗåŖ°ãƒ•ã‚Ąã‚¤ãƒĢ毎ãĢ寞åŋœã™ã‚‹ã€æœ—čĒ­å†…åŽšãŽãƒ†ã‚­ã‚šãƒˆãƒ•ã‚Ąã‚¤ãƒĢが、そぎ個数が1寞1とãĒるようãĒåŊĸで、24,000åŧąå€‹æ ŧį´ã•ã‚ŒãĻいるåŊĸです。

#!/bin/zsh

for input in ./text/*
do
  echo "input = $input"
  curl -s \
       -X POST \
       "localhost:50021/audio_query?speaker=8"\
       --get --data-urlencode text@$input \
       > query_.json

  output=`echo ${input/text/audio}`
  output=`echo ${output/.txt/.wav}`
  echo "output = $output"

  curl -s \
       -H "Content-Type: application/json" \
       -X POST \
       -d @query_.json \
       "localhost:50021/synthesis?speaker=8" \
       > $output
done


With the above, the training data for the yukarin repository has presumably been generated.

æŦĄãŽæ‰‹é †ãĢ合わせãĻ、äģĨ下ぎようãĒフりãƒĢダ構成としぞす。

$ tree *_wav/
input_wav/
├── common_voice_ja_19482480.mp3
├── common_voice_ja_19482491.mp3
├── common_voice_ja_19482498.mp3
├── â€Ļ
├── common_voice_ja_27446518.mp3
├── common_voice_ja_27446519.mp3
└── common_voice_ja_27446520.mp3
target_wav/
├── common_voice_ja_19482480.wav
├── common_voice_ja_19482491.wav
├── common_voice_ja_19482498.wav
├── â€Ļ
├── common_voice_ja_27446518.wav
├── common_voice_ja_27446519.wav
└── common_voice_ja_27446520.wav

0 directories, 46796 files

å°šã€åŒã˜ãƒ•ã‚Ąã‚¤ãƒĢ名į§°īŧˆæ‹Ąåŧĩ子を除くīŧ‰ãŒåŒã˜ãƒ•ã‚Ąã‚¤ãƒĢは、į•°ãĒã‚‹čŠąč€…ãŒåŒã˜å†…åŽšã‚’čŠąã—ãĻいるåŊĸとãĒãŖãĻおり、target_wavフりãƒĢダ内ぎéŸŗåŖ°ãƒ•ã‚Ąã‚¤ãƒĢは全ãĻ、VOICEVOXぎ「æ˜Ĩæ—Ĩ部つむぎ」というキãƒŖナクã‚ŋãƒŧぎéŸŗåŖ°ã¨ãĒãŖãĻいぞす。


Data preparation (2. Extracting acoustic features)

According to yukarin's README, the next step is to extract acoustic features from the data. The commands to do this are as follows.

python scripts/extract_acoustic_feature.py \
    -i './input_wav/*' \
    -o './input_feature/'

python scripts/extract_acoustic_feature.py \
    -i './target_wav/*' \
    -o './target_feature/'


Looking into scripts/extract_acoustic_feature.py, the acoustic features are extracted based on the following parameter settings.

class AcousticParam(object):
    def __init__(
            self,
            sampling_rate: int = 24000,
            pad_second: float = 0,
            threshold_db: float = None,
            frame_period: int = 5,
            order: int = 8,
            alpha: float = 0.466,
            f0_floor: float = 71,
            f0_ceil: float = 800,
            fft_length: int = 1024,
            dtype: str = 'float32',
    ) -> None:
        self.sampling_rate = sampling_rate
        self.pad_second = pad_second
        self.threshold_db = threshold_db
        self.frame_period = frame_period
        self.order = order
        self.alpha = alpha
        self.f0_floor = f0_floor
        self.f0_ceil = f0_ceil
        self.fft_length = fft_length
        self.dtype = dtype

    def _asdict(self):
        return self.__dict__

The point to note is sampling_rate: it is set to 24,000 Hz.

The default sample rate of the character voices generated by VOICEVOX is also 24,000 Hz. A yukarin issue likewise states that the Yuzuki Yukari voice conversion assumes 24,000 Hz audio.
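As a quick way to confirm this (a small sketch of my own; the file name is taken from the tree listing above), the sample rate of a generated wav can be checked with soundfile:

# Sketch: check the sample rate of a VOICEVOX-generated wav (expecting 24000).
import soundfile as sf

print(sf.info('./target_wav/common_voice_ja_19482480.wav').samplerate)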


ãžãŸã€ã“ãŽč¨­åŽšå€¤ã‚’į”¨ã„ãĻ、äģĨ下ã‚ŗãƒŧドãĢãĻ、į‰šåž´æŠŊå‡ēã‚’čĄŒãŖãĻいぞす。

        f0, t = cls.extract_f0(x=x, fs=fs, frame_period=frame_period, f0_floor=f0_floor, f0_ceil=f0_ceil)
        sp = pyworld.cheaptrick(x, f0, t, fs, fft_size=fft_length)
        ap = pyworld.d4c(x, f0, t, fs, fft_size=fft_length)


This code follows the same feature-extraction approach described in the article below; it uses the WORLD library and its Python wrapper. https://r9y9.github.io/nnmnkwii/v0.0.1/nnmnkwii_gallery/notebooks/00-Quick%20start%20guide.html#Acoustic-features
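For reference, here is a minimal standalone sketch of the WORLD analysis/resynthesis flow with pyworld, matching the cheaptrick/d4c calls above. The dio + stonemask F0 estimation and the file path are my own choices for illustration; yukarin wraps its own extract_f0, so this is not necessarily identical to its processing.

# Sketch: WORLD analysis (F0 / spectral envelope / aperiodicity) and resynthesis with pyworld.
import librosa
import numpy as np
import pyworld
import soundfile as sf

fs = 24000
x, _ = librosa.load('./target_wav/common_voice_ja_19482480.wav', sr=fs, mono=True)
x = x.astype(np.float64)  # pyworld expects float64

f0, t = pyworld.dio(x, fs, f0_floor=71.0, f0_ceil=800.0, frame_period=5.0)
f0 = pyworld.stonemask(x, f0, t, fs)                   # refine F0
sp = pyworld.cheaptrick(x, f0, t, fs, fft_size=1024)   # spectral envelope
ap = pyworld.d4c(x, f0, t, fs, fft_size=1024)          # aperiodicity

y = pyworld.synthesize(f0, sp, ap, fs, frame_period=5.0)
sf.write('resynth.wav', y, fs)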

The extracted features are explained in the following PDF. http://www.isc.meiji.ac.jp/~mmorise/lab/publication/paper/SP2017-128.pdf

WORLD ãĢよるéŸŗåŖ°åˆ†æžåˆæˆãŽæĻ‚čĻã‚’å›ŗ 1 ãĢį¤ēすīŧŽ
WORLDではīŧŒéŸŗåŖ°ã‚’フãƒŦãƒŧãƒ ã‚ˇãƒ•ãƒˆåš…æ¯ŽãŽæ™‚é–“ã§åˆ†æžã—īŧŒãƒ•ãƒŦãƒŧム毎ãĢ 3 ã¤ãŽãƒ‘ãƒŠãƒĄãƒŧã‚ŋを取垗するīŧŽ
ãƒ‘ãƒŠãƒĄãƒŧã‚ŋはīŧŒåŸēæœŦ周æŗĸ数 (Fundamental frequency: F0)īŧŒã‚šãƒšã‚¯ãƒˆãƒĢ包įĩĄ (Spectral envelope: SP)īŧŒéžå‘¨æœŸæ€§æŒ‡æ¨™ (Aperiodicity: AP) ぎ 3 į¨ŽéĄžã§ã‚ã‚‹īŧŽ
ã“ã‚Œã‚‰ãŽãƒ‘ãƒŠãƒĄãƒŧã‚ŋはīŧŒãã‚Œãžã‚ŒéŸŗåŖ°ãŽéĢ˜ã•īŧŒéŸŗåŖ°ãŽéŸŗ色īŧŒéŸŗåŖ°ãŽã‹ã™ã‚ŒãŽį¨‹åēĻãĢ寞åŋœã—ãĻいるīŧŽ

image


The following Qiita article also introduces them. https://qiita.com/ohtaman/items/84426cee09c2ba4abc22

1. Fundamental frequency: represents the base pitch of the voice
2. Spectral envelope: a smoothed version of the spectrum, representing the timbre
3. Aperiodicity: represents the effects of fluctuations in vocal-fold vibration and of noise contamination


Also, the audio files (.mp3) in Mozilla's Common Voice dataset have a sample rate of 48,000 Hz. I was curious how this sample-rate mismatch would affect yukarin's processing, so to be safe I also resampled the Common Voice audio from 48,000 Hz to 24,000 Hz at this point.

å°šã€ä¸Šč¨˜å‡Ļį†ãĢはilbrosaナイブナãƒĒをį”¨ã„ぞすが、librosaは.mp3がã‚ĩポãƒŧトされãĻいãĒいとぎことで、.mp3æ‹Ąåŧĩå­ãŽãƒ•ã‚Ąã‚¤ãƒĢを.wavãĢ変換するåŋ…čĻã‚‚ありぞす。

These conversions are done with the following code, based on the articles below. https://algorithm.joho.info/programming/python/pydub-mp3-wav/ https://note.com/npaka/n/n6f421b546024

import glob
import pydub
import librosa
import soundfile as sf

# List of source .mp3 files (the "_48000Hz" folder name below is from my environment; adjust as needed)
filename = sorted(glob.glob('./input_wav_48000Hz/*.mp3'))

for filename_tmp in filename:
    # Build the output file name from the input file name (adjust to your own environment)
    filename_out = filename_tmp.replace('_48000Hz', '')  # change the folder name (in my case)
    filename_out = filename_out.replace('.mp3', '.wav')  # change the extension
    # Read the .mp3
    sound = pydub.AudioSegment.from_mp3(filename_tmp)
    # Write it out as .wav
    sound.export(filename_out, format="wav")
    # Read the .wav back, resampling to 24,000 Hz mono
    y, sr = librosa.core.load(filename_out, sr=24000, mono=True)
    # Write the .wav again as 16-bit PCM
    sf.write(filename_out, y, sr, subtype="PCM_16")

ã“ãĄã‚‰ã¯ã€CPUでぎå‡Ļį†ãĢãĒã‚Šãžã™ãŽã§ã€ãƒ•ã‚Ąã‚¤ãƒĢ数が多いと、それãĒりぎ時間をčĻã—ぞす。 尚、変換前垌を、į§ãŽč€ŗで比čŧƒã—た限りは、掆おéŸŗぎåŠŖ化はį„Ąã„ようãĢ思えぞした。


With the audio sample rates now matched, I run the acoustic-feature extraction. Once more, the commands are as follows.

python scripts/extract_acoustic_feature.py \
    -i './input_wav/*' \
    -o './input_feature/'

python scripts/extract_acoustic_feature.py \
    -i './target_wav/*' \
    -o './target_feature/'


Running the commands produced the following error.

Traceback (most recent call last):
  File "scripts/extract_acoustic_feature.py", line 13, in <module>
    from yukarin.acoustic_feature import AcousticFeature
  File "/hoge/yukarin/yukarin/__init__.py", line 1, in <module>
    from .acoustic_converter import AcousticConverter
  File "/hoge/yukarin/yukarin/acoustic_converter.py", line 7, in <module>
    import librosa
  File "/opt/conda/lib/python3.7/site-packages/librosa/__init__.py", line 12, in <module>
    from . import core
  File "/opt/conda/lib/python3.7/site-packages/librosa/core/__init__.py", line 109, in <module>
    from .time_frequency import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.7/site-packages/librosa/core/time_frequency.py", line 10, in <module>
    from ..util.exceptions import ParameterError
  File "/opt/conda/lib/python3.7/site-packages/librosa/util/__init__.py", line 71, in <module>
    from . import decorators
  File "/opt/conda/lib/python3.7/site-packages/librosa/util/decorators.py", line 9, in <module>
    from numba.decorators import jit as optional_jit
ModuleNotFoundError: No module named 'numba.decorators'

This error was resolved by running pip install numba==0.48. It apparently occurs with newer versions of numba. (Reference: https://github.com/librosa/librosa/issues/1160 )


After that fix, the following log was printed and the processing finished normally.

{'alpha': 0.466,
 'dtype': 'float32',
 'enable_overwrite': False,
 'f0_ceil': 800,
 'f0_floor': 71,
 'fft_length': 1024,
 'frame_period': 5,
 'ignore_feature': ['sp', 'ap'],
 'input_glob': './input_wav/*',
 'order': 8,
 'output': PosixPath('input_feature'),
 'pad_second': 0,
 'sampling_rate': 24000,
 'sampling_rate_for_thresholding': None,
 'threshold_db': None}
100%|███████████████████████████████████████████████████████| 23398/23398 [34:56<00:00, 11.16it/s]

{'alpha': 0.466,
 'dtype': 'float32',
 'enable_overwrite': False,
 'f0_ceil': 800,
 'f0_floor': 71,
 'fft_length': 1024,
 'frame_period': 5,
 'ignore_feature': ['sp', 'ap'],
 'input_glob': './target_wav/*',
 'order': 8,
 'output': PosixPath('target_feature'),
 'pad_second': 0,
 'sampling_rate': 24000,
 'sampling_rate_for_thresholding': None,
 'threshold_db': None}
100%|███████████████████████████████████████████████████████| 23398/23398 [28:54<00:00, 13.49it/s]


こぎå‡Ļį†ã‚’、input_wavフりãƒĢダと、target_wavフりãƒĢダとãĢ寞しãĻ原æ–Ŋしぞす。 すると、input_featureフりãƒĢダと、target_featureフりãƒĢダとぎ配下ãĢ、éŸŗåŖ°ãƒ•ã‚Ąã‚¤ãƒĢを同じ数だけぎ.npyがį”Ÿæˆã•ã‚Œãžã™ã€‚

# tree ./*_feature/
./input_feature/
├── arguments.json
├── common_voice_ja_19482480.npy
├── common_voice_ja_19482491.npy
├── common_voice_ja_19482498.npy
├── â€Ļ
├── common_voice_ja_27446518.npy
├── common_voice_ja_27446519.npy
└── common_voice_ja_27446520.npy
./target_feature/
├── arguments.json
├── common_voice_ja_19482480.npy
├── common_voice_ja_19482491.npy
├── common_voice_ja_19482498.npy
├── â€Ļ
├── common_voice_ja_27446518.npy
├── common_voice_ja_27446519.npy
└── common_voice_ja_27446520.npy

0 directories, 46798 files


Visualizing the contents stored in these npy files gives the following.

å‚č€ƒãžã§ãĢ、先ず、元デãƒŧã‚ŋ .wav を、librosaでloadīŧˆyukarin内ぎå‡Ļį†ã¨åŒæ§˜īŧ‰ã‚’したæŗĸåŊĸデãƒŧã‚ŋを可čĻ–化しぞす。

image

æŦĄãĢ、å‡ē力į‰šåž´ãŽå†…ぎ、åŸēæœŦ周æŗĸ数īŧˆéŸŗåŖ°ãŽå‘¨æœŸæ€§ã‚’襨įžã—、éŸŗéĢ˜ã‚’司るéŸŗéŸŋį‰šåž´é‡īŧ‰ãŽå¯čĻ–化です。

image

æŦĄãĢ、äģĨ下ã‚ŗãƒŧドãĢãĻæŠŊå‡ēされたであろうį‰šåž´ã§ã™ãŒã€ã“ãĄã‚‰ã¯ nan とãĒãŖãĻいぞした。

        sp = pyworld.cheaptrick(x, f0, t, fs, fft_size=fft_length)
        ap = pyworld.d4c(x, f0, t, fs, fft_size=fft_length)
feature1.sp = nan
feature2.sp = nan

feature1.ap = nan
feature2.ap = nan


æŦĄãĢ、å‡ē力į‰šåž´ãŽå†…ぎ、ã‚ŗãƒŧドぎ非周期性というもぎぎ可čĻ–化です。 意å‘ŗ合いやäŊŋい斚ãĢついãĻã¯ã€ãŠã„ãŠã„åˆ†æžã‚’é€˛ã‚ã‚‹éŽį¨‹ãĢãĻ、åŋ…čĻãĢåŋœã˜ãĻ掘り下げようと思いぞす。

image


æŦĄãĢ、å‡ē力į‰šåž´ãŽå†…ãŽã€ãƒĄãƒĢã‚ąãƒ—ã‚šãƒˆãƒŠãƒ ãŽå¯čĻ–化です。

image


最垌ãĢ、å‡ē力į‰šåž´ãŽå†…ぎ、į™ēåŖ°ã‚ŋイミãƒŗグぎ可čĻ–化です。

image


Data preparation (3. Aligning the data)

æŦĄãŽæ‰‹é †ã¯ã€ãƒ‡ãƒŧã‚ŋを揃えるとぎことです。 これは、ãƒĒポジトãƒĒįŽĄį†č€…HiroshibaさんぎäģĨ下記äē‹ãĢ記čŧ‰ã•ã‚Œã‚‹å†…厚ãĢé–ĸé€Ŗするところかと思われぞす。 https://blog.hiroshiba.jp/sandbox-alignment-voice-actress-data/

å‡Ļį†ã‚ŗマãƒŗドは、äģĨ下とぎことです。

python scripts/extract_align_indexes.py \
    -i1 './input_feature/*.npy' \
    -i2 './target_feature/*.npy' \
    -o './aligned_indexes/'

Running it printed the following log and finished normally.

# python scripts/extract_align_indexes.py \
>     -i1 './input_feature/*.npy' \
>     -i2 './target_feature/*.npy' \
>     -o './aligned_indexes/'
{'dtype': 'int32',
 'enable_overwrite': False,
 'ignore_feature': ('feature1', 'feature2'),
 'input_glob1': './input_feature/*.npy',
 'input_glob2': './target_feature/*.npy',
 'output': PosixPath('aligned_indexes')}
100%|██████████████████████████████████████████████████████| 23398/23398 [01:25<00:00, 273.90it/s]

å‡Ļį†įĩ‚äē†åžŒãĢ、å‡ē力フりãƒĢダをįĸēčĒã™ã‚‹ã¨ã€ã“ãĄã‚‰ãĢも.npyãƒ•ã‚Ąã‚¤ãƒĢがæ ŧį´ã•ã‚Œãžã—た。

# tree aligned_indexes/
aligned_indexes/
├── arguments.json
├── common_voice_ja_19482480.npy
├── common_voice_ja_19482491.npy
├── common_voice_ja_19482498.npy
├── â€Ļ
├── common_voice_ja_27446518.npy
├── common_voice_ja_27446519.npy
└── common_voice_ja_27446520.npy

0 directories, 23399 files

According to the blog post by the repository maintainer Hiroshiba, what is stored here is the result of time-aligning input_wav and target_wav. Checking the code of scripts/extract_align_indexes.py, it stores index information named align_indexes.

As for how it is implemented, Hiroshiba appears to use an implementation adapted from the nnmnkwii repository. At its core, it uses fastdtw to measure distances between time series. The following article was a useful reference on fastdtw. https://irukanobox.blogspot.com/2020/07/dtw.html
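For reference, a minimal sketch of what fastdtw itself does (toy data of my own; this is not yukarin's implementation): it returns a DTW distance and a list of (i, j) frame-index pairs mapping one sequence onto the other, which appears to be the kind of information stored in aligned_indexes/*.npy.

# Sketch: dynamic time warping with fastdtw on two toy feature sequences.
import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

a = np.array([[0.0], [1.0], [2.0], [3.0], [3.0], [2.0]])  # e.g. frames from speaker A
b = np.array([[0.0], [0.0], [1.0], [2.0], [3.0], [2.0]])  # same utterance from speaker B

distance, path = fastdtw(a, b, dist=euclidean)
print('DTW distance:', distance)
print('index pairs :', path)  # list of (i, j) pairs aligning frames of a to frames of b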


そうしãĻæŠŊå‡ēされたもぎを可čĻ–化しãĻãŋぞすと、äģĨ下ぎようãĒイãƒŗãƒ‡ãƒƒã‚¯ã‚šãŽæƒ…å ąã¨ãĒãŖãĻいぞした。 これを元ãĢ、input_wav と target_wav ぎį™ēåŖ°ã‚ŋイミãƒŗグやį™ēåŖ°åŒē間を合わせčžŧむもぎと思われぞす。 嚞つかぎéŸŗåŖ°ãƒ‡ãƒŧã‚ŋãĢついãĻ、å‡ē力されたįĩæžœã‚’č˛ŧãŖãĻいきぞす。

image

image

image

image

image

image

image

image

image

image

image

image


Apparently, the audio is aligned by delaying the audio of either input or target. (The blue line is input, the orange line is target.)


Data preparation (4. Computing F0 statistics)

This is the last step of data preparation. Running the following commands computes the frequency (F0) statistics.

python scripts/extract_f0_statistics.py \
    -i './input_feature/*.npy' \
    -o './input_statistics.npy'

python scripts/extract_f0_statistics.py \
    -i './target_feature/*.npy' \
    -o './target_statistics.npy'


Running the commands finished normally, as shown below.

# python scripts/extract_f0_statistics.py \
>     -i './input_feature/*.npy' \
>     -o './input_statistics.npy'
{'input_glob': './input_feature/*.npy',
 'output': PosixPath('input_statistics.npy')}
100%|████████████████████████████████████████████████████| 23398/23398 [00:02<00:00, 10279.13it/s]
# python scripts/extract_f0_statistics.py \
>     -i './target_feature/*.npy' \
>     -o './target_statistics.npy'
{'input_glob': './target_feature/*.npy',
 'output': PosixPath('target_statistics.npy')}
100%|████████████████████████████████████████████████████| 23398/23398 [00:01<00:00, 12385.32it/s]


The output files are as follows.

# ls -l *_statistics.npy
-rw-r--r-- 1 root root 416 Dec 12 13:04 input_statistics.npy
-rw-r--r-- 1 root root 416 Dec 12 13:04 target_statistics.npy
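The files are tiny, so presumably they hold only summary statistics of F0. A common convention in voice conversion (my assumption about how such statistics get used, not a confirmed description of yukarin's internals) is to store the mean and variance of log F0 per speaker and convert F0 with a log-linear transform, roughly as sketched below.

# Sketch: standard log-linear F0 conversion using per-speaker log-F0 mean/variance.
import numpy as np

def convert_f0(f0, mean_in, var_in, mean_tgt, var_tgt):
    # Map voiced F0 values from the input speaker's log-F0 distribution to the target's.
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    out[voiced] = np.exp(
        (np.log(f0[voiced]) - mean_in) / np.sqrt(var_in) * np.sqrt(var_tgt) + mean_tgt
    )
    return out

# toy example: shift a low-pitched contour toward a higher-pitched target
print(convert_f0([120.0, 0.0, 140.0], np.log(120), 0.04, np.log(260), 0.03))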


å­Ļįŋ’ãĢついãĻīŧˆ1. å­Ļįŋ’į”¨ãŽč¨­åŽšãƒ•ã‚Ąã‚¤ãƒĢ config.json をäŊœã‚‹īŧ‰

Since the training data should now be ready from the previous steps, I move on to the next step, creating the training config file.

å­Ļįŋ’ãŽč¨­åŽšã¯ã€ãƒ•ã‚Ąã‚¤ãƒĢ sample_config.json ãĢãĻ襨įžã‚’するとぎことです。 とりあえずということであれば、input_glob、target_glob、indexes_glob を変更すれば動くとぎことです。

The contents of sample_config.json are as follows.

{
  "dataset": {
    "acoustic_param": {
      "alpha": 0.410,
      "dtype": "float32",
      "f0_ceil": 800,
      "f0_floor": 71,
      "fft_length": 1024,
      "frame_period": 5,
      "order": 8,
      "pad_second": 0,
      "sampling_rate": 24000,
      "threshold_db": 25
    },
    "input_glob": "./input_feature/*.npy",
    "target_glob": "./target_feature/*.npy",
    "indexes_glob": "./aligned_indexes/*.npy",
    "in_features": [
      "mc"
    ],
    "out_features": [
      "mc"
    ],
    "train_crop_size": 512,
    "input_global_noise": 0.01,
    "input_local_noise": 0.01,
    "target_global_noise": 0.01,
    "target_local_noise": 0.01,
    "seed": 0,
    "num_test": 5
  },
  "model": {
    "in_channels": 9,
    "out_channels": 9,
    "generator_base_channels": 8,
    "generator_extensive_layers": 8,
    "discriminator_base_channels": 1,
    "discriminator_extensive_layers": 5,
    "weak_discriminator": true
  },
  "loss": {
    "adversarial": 0,
    "mse": 100
  },
  "project": {
    "name": "",
    "tags": []
  },
  "train": {
    "batchsize": 8,
    "gpu": 0,
    "log_iteration": 250,
    "snapshot_iteration": 10000,
    "stop_iteration": null,
    "optimizer": {
      "alpha": 0.0002,
      "beta1": 0.5,
      "beta2": 0.999,
      "name": "Adam"
    }
  }
}


Since I followed the steps as written, input_glob, target_glob and indexes_glob do not appear to need changing either, so I keep them as they are.


å­Ļįŋ’ãĢついãĻīŧˆ2. å­Ļįŋ’å‡Ļį†ã™ã‚‹īŧ‰

Finally, the next step is to run training. The command is as follows.

python train.py \
    sample_config.json \
    ./model_stage1/


Here, an error occurred because of cupy, whose installation had failed earlier.

# python train.py \
>     sample_config.json \
>     ./model_stage1/
Not found cupy.
Traceback (most recent call last):
  File "train.py", line 35, in <module>
    cuda.get_device_from_id(config.train.gpu).use()
  File "/opt/conda/lib/python3.7/site-packages/chainer/backends/cuda.py", line 163, in get_device_from_id
    check_cuda_available()
  File "/opt/conda/lib/python3.7/site-packages/chainer/backends/cuda.py", line 93, in check_cuda_available
    raise RuntimeError(msg)
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/chainer/chainer#installation).No module named 'cupy'


From the link in the error log I reached the page below, and installed cupy with the command described there. https://docs.cupy.dev/en/stable/install.html

pip install cupy-cuda112

After that, running import cupy produced the following error.

ImportError: libnvrtc.so.11.2: cannot open shared object file: No such file or directory

ã“ãĄã‚‰ãĢついãĻ、nvidia-smi ã‚’åŽŸčĄŒã—ãŸéš›ãŽCUDAぎversionが 11.2 とãĒãŖãĻいたį‚ēãĢ、pip install cupy-cuda112 としたæŦĄįŦŦでしたが、äģĨ下記äē‹ãĢよれば、そこがčĒč­˜é•ã„ぎようでした。 https://blog.mktia.com/get-cuda-and-cudnn-version/

nvidia-smi does display something that looks like a CUDA version, but it merely shows the CUDA version the driver supports. Instead, in my environment the command /usr/local/cuda/bin/nvcc --version shows the actual CUDA version, which turned out to be 11.1.

# /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

So I reinstalled as follows.

pip uninstall cupy-cuda112
pip install cupy-cuda111

With that, the processing started running without errors. Note that cupy does need to be installed in order to use chainer, and training the yukarin repository without it appears to be difficult. (My earlier judgment was mistaken.) https://github.com/chainer/chainer#installation


Also, regarding cupy and cupy-cudaXXX: the configuration appears to be overwritten at install time, and if several variants are installed, the one installed last is what import cupy picks up. In other words they cannot coexist, so you need to uninstall every cupy completely and, with none installed, install only the desired cupy-cudaXXX.


Once training starts, model files and logs are written under the folder passed as an argument to train.py.

(training command shown again)
python train.py \
    sample_config.json \
    ./model_stage1/

 â–ŧ

# ls model_stage1
cg.dot                                       predictor_10000.npz  predictor_50000.npz
config.json                                  predictor_20000.npz  predictor_60000.npz
events.out.tfevents.1639306034.7217833e7f1b  predictor_30000.npz  predictor_70000.npz
log                                          predictor_40000.npz


Note that I could not find anywhere to specify a final epoch count: unless the process is stopped with Ctrl + C or similar, training does not seem to end (the config's stop_iteration is null, which presumably means it runs indefinitely). With 23,398 data items and a batch size of 128, I let training run for roughly 10 to 20 hours.


ããŽåžŒã€ãĄã‚‡ã†ãŠ20時間į¨‹å­Ļįŋ’を回し、å­Ļįŋ’ãƒĸデãƒĢが一厚ぎiterationé–“éš”ã§č¤‡æ•°äŋå­˜ã•ã‚Œãžã—た。

yukarin# tree model_stage1
model_stage1
├── cg.dot
├── config.json
├── events.out.tfevents.1639306034.7217833e7f1b
├── log
├── predictor_10000.npz
├── predictor_100000.npz
├── predictor_110000.npz
├── predictor_120000.npz
├── predictor_130000.npz
├── predictor_140000.npz
├── predictor_150000.npz
├── predictor_160000.npz
├── predictor_170000.npz
├── predictor_180000.npz
├── predictor_190000.npz
├── predictor_20000.npz
├── predictor_200000.npz
├── predictor_210000.npz
├── predictor_220000.npz
├── predictor_230000.npz
├── predictor_240000.npz
├── predictor_250000.npz
├── predictor_260000.npz
├── predictor_270000.npz
├── predictor_280000.npz
├── predictor_290000.npz
├── predictor_30000.npz
├── predictor_300000.npz
├── predictor_310000.npz
├── predictor_320000.npz
├── predictor_330000.npz
├── predictor_340000.npz
├── predictor_350000.npz
├── predictor_360000.npz
├── predictor_370000.npz
├── predictor_380000.npz
├── predictor_390000.npz
├── predictor_40000.npz
├── predictor_400000.npz
├── predictor_410000.npz
├── predictor_420000.npz
├── predictor_430000.npz
├── predictor_440000.npz
├── predictor_450000.npz
├── predictor_460000.npz
├── predictor_470000.npz
├── predictor_480000.npz
├── predictor_490000.npz
├── predictor_50000.npz
├── predictor_500000.npz
├── predictor_510000.npz
├── predictor_520000.npz
├── predictor_530000.npz
├── predictor_540000.npz
├── predictor_550000.npz
├── predictor_560000.npz
├── predictor_570000.npz
├── predictor_580000.npz
├── predictor_590000.npz
├── predictor_60000.npz
├── predictor_600000.npz
├── predictor_610000.npz
├── predictor_620000.npz
├── predictor_630000.npz
├── predictor_640000.npz
├── predictor_650000.npz
├── predictor_660000.npz
├── predictor_670000.npz
├── predictor_680000.npz
├── predictor_690000.npz
├── predictor_70000.npz
├── predictor_700000.npz
├── predictor_710000.npz
├── predictor_720000.npz
├── predictor_730000.npz
├── predictor_740000.npz
├── predictor_750000.npz
├── predictor_760000.npz
├── predictor_770000.npz
├── predictor_780000.npz
├── predictor_790000.npz
├── predictor_80000.npz
├── predictor_800000.npz
├── predictor_810000.npz
├── predictor_820000.npz
├── predictor_830000.npz
├── predictor_840000.npz
└── predictor_90000.npz

0 directories, 88 files

logãĢ、å­Ļįŋ’ぎlossãŒč¨˜éŒ˛ã•ã‚Œã‚‹ãŽã§ã™ãŒã€ããŽįĩŒéŽã¯äģĨ下ぎようでした。

    {
        "predictor/mse": 0.35239657759666443,
        "predictor/adversarial": 1.009699821472168,
        "predictor/loss": 35.23966979980469,
        "discriminator/real": 0.21418224275112152,
        "discriminator/fake": 0.49350425601005554,
        "discriminator/loss": 0.7076864838600159,
        "discriminator/accuracy": 0.95191650390625,
        "discriminator/precision": 0.9904609629602085,
        "discriminator/recall": 0.9126904296875,
        "test/predictor/mse": 0.4434267282485962,
        "test/predictor/adversarial": 0.6913116574287415,
        "test/predictor/loss": 44.342674255371094,
        "test/discriminator/real": 0.7373173236846924,
        "test/discriminator/fake": 0.7232625484466553,
        "test/discriminator/loss": 1.4605798721313477,
        "test/discriminator/accuracy": 0.44375,
        "test/discriminator/precision": 0.125,
        "test/discriminator/recall": 0.01875,
        "train/predictor/mse": 0.2641863226890564,
        "train/predictor/adversarial": 0.8317578434944153,
        "train/predictor/loss": 26.41863250732422,
        "train/discriminator/real": 0.7838372588157654,
        "train/discriminator/fake": 0.6074777841567993,
        "train/discriminator/loss": 1.39131498336792,
        "train/discriminator/accuracy": 0.509375,
        "train/discriminator/precision": 1.0,
        "train/discriminator/recall": 0.01875,
        "epoch": 5,
        "iteration": 1000,
        "elapsed_time": 81.01380289904773
    },

 â–ŧ

    {
        "predictor/mse": 0.3256767988204956,
        "predictor/adversarial": 5.032227993011475,
        "predictor/loss": 32.56768035888672,
        "discriminator/real": 0.05071548372507095,
        "discriminator/fake": 0.038136936724185944,
        "discriminator/loss": 0.0888524278998375,
        "discriminator/accuracy": 0.98839599609375,
        "discriminator/precision": 0.9971560586514624,
        "discriminator/recall": 0.9796044921875,
        "test/predictor/mse": 0.35689324140548706,
        "test/predictor/adversarial": 3.7905514240264893,
        "test/predictor/loss": 35.68932342529297,
        "test/discriminator/real": 2.556612491607666,
        "test/discriminator/fake": 0.02566264010965824,
        "test/discriminator/loss": 2.582275152206421,
        "test/discriminator/accuracy": 0.509375,
        "test/discriminator/precision": 1.0,
        "test/discriminator/recall": 0.01875,
        "train/predictor/mse": 0.2284717708826065,
        "train/predictor/adversarial": 3.8663880825042725,
        "train/predictor/loss": 22.847177505493164,
        "train/discriminator/real": 2.8593335151672363,
        "train/discriminator/fake": 0.02544046752154827,
        "train/discriminator/loss": 2.8847739696502686,
        "train/discriminator/accuracy": 0.503125,
        "train/discriminator/precision": 1.0,
        "train/discriminator/recall": 0.00625,
        "epoch": 54,
        "iteration": 10000,
        "elapsed_time": 798.8362969011068
    },

 â–ŧ

    {
        "predictor/mse": 0.3052104711532593,
        "predictor/adversarial": 6.364946365356445,
        "predictor/loss": 30.521047592163086,
        "discriminator/real": 0.003857325529679656,
        "discriminator/fake": 0.00210120202973485,
        "discriminator/loss": 0.005958528723567724,
        "discriminator/accuracy": 0.99937744140625,
        "discriminator/precision": 0.9999804520455513,
        "discriminator/recall": 0.9987744140625,
        "test/predictor/mse": 0.3660505414009094,
        "test/predictor/adversarial": 1.631148006708827e-05,
        "test/predictor/loss": 36.60505294799805,
        "test/discriminator/real": 9.940130257746205e-05,
        "test/discriminator/fake": 13.819330215454102,
        "test/discriminator/loss": 13.819429397583008,
        "test/discriminator/accuracy": 0.5,
        "test/discriminator/precision": 0.5,
        "test/discriminator/recall": 1.0,
        "train/predictor/mse": 0.22484809160232544,
        "train/predictor/adversarial": 1.64138382388046e-05,
        "train/predictor/loss": 22.48480987548828,
        "train/discriminator/real": 5.373924068408087e-05,
        "train/discriminator/fake": 13.840730667114258,
        "train/discriminator/loss": 13.840784072875977,
        "train/discriminator/accuracy": 0.5,
        "train/discriminator/precision": 0.5,
        "train/discriminator/recall": 1.0,
        "epoch": 547,
        "iteration": 100000,
        "elapsed_time": 8113.639713731012
    },

 â–ŧ

    {
        "predictor/mse": 0.30366745591163635,
        "predictor/adversarial": 7.030780792236328,
        "predictor/loss": 30.366737365722656,
        "discriminator/real": 0.010039892978966236,
        "discriminator/fake": 0.0017708293162286282,
        "discriminator/loss": 0.011810722760856152,
        "discriminator/accuracy": 0.9988623046875,
        "discriminator/precision": 0.999941303506524,
        "discriminator/recall": 0.997783203125,
        "test/predictor/mse": 0.35088080167770386,
        "test/predictor/adversarial": 6.8735448621737305e-06,
        "test/predictor/loss": 35.08808135986328,
        "test/discriminator/real": 0.0005439310916699469,
        "test/discriminator/fake": 16.6959228515625,
        "test/discriminator/loss": 16.69646644592285,
        "test/discriminator/accuracy": 0.5,
        "test/discriminator/precision": 0.5,
        "test/discriminator/recall": 1.0,
        "train/predictor/mse": 0.24123618006706238,
        "train/predictor/adversarial": 6.821536317147547e-06,
        "train/predictor/loss": 24.12361717224121,
        "train/discriminator/real": 0.0016821377212181687,
        "train/discriminator/fake": 16.701169967651367,
        "train/discriminator/loss": 16.702852249145508,
        "train/discriminator/accuracy": 0.5,
        "train/discriminator/precision": 0.5,
        "train/discriminator/recall": 1.0,
        "epoch": 1094,
        "iteration": 200000,
        "elapsed_time": 16456.76111229905
    },

 â–ŧ

    {
        "predictor/mse": 0.2946970760822296,
        "predictor/adversarial": 8.057758331298828,
        "predictor/loss": 29.469711303710938,
        "discriminator/real": 0.002485891105607152,
        "discriminator/fake": 0.0005037991795688868,
        "discriminator/loss": 0.0029896902851760387,
        "discriminator/accuracy": 0.99978515625,
        "discriminator/precision": 0.9999804735172078,
        "discriminator/recall": 0.99958984375,
        "test/predictor/mse": 0.3218367099761963,
        "test/predictor/adversarial": 2.8206122806295753e-06,
        "test/predictor/loss": 32.18367004394531,
        "test/discriminator/real": 1.4969022004152066e-06,
        "test/discriminator/fake": 18.49074363708496,
        "test/discriminator/loss": 18.490745544433594,
        "test/discriminator/accuracy": 0.5,
        "test/discriminator/precision": 0.5,
        "test/discriminator/recall": 1.0,
        "train/predictor/mse": 0.22985798120498657,
        "train/predictor/adversarial": 2.979112196044298e-06,
        "train/predictor/loss": 22.985797882080078,
        "train/discriminator/real": 6.583236972801387e-05,
        "train/discriminator/fake": 18.465129852294922,
        "train/discriminator/loss": 18.46519660949707,
        "train/discriminator/accuracy": 0.5,
        "train/discriminator/precision": 0.5,
        "train/discriminator/recall": 1.0,
        "epoch": 2188,
        "iteration": 400000,
        "elapsed_time": 33794.79855498602
    },

 â–ŧ

    {
        "predictor/mse": 0.2946617007255554,
        "predictor/adversarial": 8.973788261413574,
        "predictor/loss": 29.466167449951172,
        "discriminator/real": 0.0021689562126994133,
        "discriminator/fake": 0.00024469412164762616,
        "discriminator/loss": 0.002413650043308735,
        "discriminator/accuracy": 0.999833984375,
        "discriminator/precision": 0.9999902200488998,
        "discriminator/recall": 0.999677734375,
        "test/predictor/mse": 0.33563244342803955,
        "test/predictor/adversarial": 9.00166441386574e-10,
        "test/predictor/loss": 33.5632438659668,
        "test/discriminator/real": 2.0852203519439172e-08,
        "test/discriminator/fake": 30.933984756469727,
        "test/discriminator/loss": 30.933984756469727,
        "test/discriminator/accuracy": 0.5,
        "test/discriminator/precision": 0.5,
        "test/discriminator/recall": 1.0,
        "train/predictor/mse": 0.26031461358070374,
        "train/predictor/adversarial": 7.850932681741085e-10,
        "train/predictor/loss": 26.031461715698242,
        "train/discriminator/real": 1.5840148748225147e-08,
        "train/discriminator/fake": 31.085153579711914,
        "train/discriminator/loss": 31.085153579711914,
        "train/discriminator/accuracy": 0.5,
        "train/discriminator/precision": 0.5,
        "train/discriminator/recall": 1.0,
        "epoch": 4596,
        "iteration": 840000,
        "elapsed_time": 74940.43543752504
    },
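Since the log file appears to print as a JSON array of entries like the excerpts above, the loss curve can presumably be plotted with something like the following sketch (assuming that format; the key names are taken from the log itself):

# Sketch: plot the predictor MSE over iterations from model_stage1/log.
import json
import matplotlib.pyplot as plt

with open('./model_stage1/log', encoding='utf-8') as f:
    entries = json.load(f)

iters = [e['iteration'] for e in entries]
plt.plot(iters, [e['train/predictor/mse'] for e in entries], label='train/predictor/mse')
plt.plot(iters, [e['test/predictor/mse'] for e in entries], label='test/predictor/mse')
plt.xlabel('iteration')
plt.legend()
plt.tight_layout()
plt.savefig('loss_curve.png')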


å­Ļįŋ’ãĢついãĻīŧˆ3. テ゚トīŧ‰

æŦĄãĢã€ãƒ†ã‚šãƒˆã‚’čĄŒãŖãĻãŋぞす。 先ずは、å­Ļįŋ’ãĢį”¨ã„たデãƒŧã‚ŋがおれį¨‹ä¸Šæ‰‹ãå¤‰æ›ã§ãã‚‹ã‹ã‚’įĸēčĒã—ぞす。 尚、å­Ļįŋ’デãƒŧã‚ŋは、寞とãĒるã‚ģットぎéŸŗåŖ°ãƒ‡ãƒŧã‚ŋが23,398×2個と、į›¸åŊ“数存在しぞす。


The command for testing on the training data is as follows.

python scripts/voice_change.py \
    --model_dir './model_stage1' \
    --config_path './model_stage1/config.json' \
    --input_statistics 'input_statistics.npy' \
    --target_statistics 'target_statistics.npy' \
    --output_sampling_rate 24000 \
    --disable_dataset_test \
    --test_wave_dir './input_wav/' \
    --output_dir './output/'


Running it produced the following error.

Traceback (most recent call last):
  File "scripts/voice_change.py", line 11, in <module>
    from yukarin import AcousticConverter
  File "/docker/ax/20211128_yukarin/yukarin/__init__.py", line 1, in <module>
    from .acoustic_converter import AcousticConverter
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 14, in <module>
    from yukarin.dataset import decode_feature
  File "/docker/ax/20211128_yukarin/yukarin/dataset.py", line 10, in <module>
    from yukarin.align_indexes import AlignIndexes
  File "/docker/ax/20211128_yukarin/yukarin/align_indexes.py", line 5, in <module>
    from become_yukarin.dataset.utility import MelCepstrumAligner
ModuleNotFoundError: No module named 'become_yukarin'


ã“ãĄã‚‰ã¯ã€requirements.txt ãĢ記čŧ‰ã•ã‚ŒãĻいたもぎぎ、イãƒŗ゚トãƒŧãƒĢがæŧã‚ŒãĻいたもぎでした。 äģĨ下ã‚ŗマãƒŗドãĢãĻ、イãƒŗ゚トãƒŧãƒĢを原æ–Ŋしぞす。

pip install git+https://github.com/Hiroshiba/become-yukarin


I then run the earlier test command again. This time, the following error log was printed.

Loaded acoustic converter model "model_stage1/predictor_840000.npz"
Traceback (most recent call last):
  File "scripts/voice_change.py", line 67, in process
    p_in = Path(glob.glob(str(dataset_wave_dir / p_in.stem) + '.*')[0])
TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
Traceback (most recent call last):
  File "scripts/voice_change.py", line 67, in process
    p_in = Path(glob.glob(str(dataset_wave_dir / p_in.stem) + '.*')[0])
TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
Traceback (most recent call last):
  File "scripts/voice_change.py", line 67, in process
    p_in = Path(glob.glob(str(dataset_wave_dir / p_in.stem) + '.*')[0])
TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
Traceback (most recent call last):
  File "scripts/voice_change.py", line 67, in process
    p_in = Path(glob.glob(str(dataset_wave_dir / p_in.stem) + '.*')[0])
TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
Traceback (most recent call last):
  File "scripts/voice_change.py", line 67, in process
    p_in = Path(glob.glob(str(dataset_wave_dir / p_in.stem) + '.*')[0])
TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
Traceback (most recent call last):
  File "scripts/voice_change.py", line 75, in process
    f_in_effective, effective = acoustic_converter.separate_effective(wave=w_in, feature=f_in, threshold=threshold)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 92, in separate_effective
    feature = feature.indexing(effective)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 86, in indexing
    f0=self.f0[index] if _is_target(self.f0) else numpy.nan,
IndexError: boolean index did not match indexed array along dimension 0; dimension is 922 but corresponding boolean dimension is 921
Traceback (most recent call last):
  File "scripts/voice_change.py", line 75, in process
    f_in_effective, effective = acoustic_converter.separate_effective(wave=w_in, feature=f_in, threshold=threshold)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 92, in separate_effective
    feature = feature.indexing(effective)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 86, in indexing
    f0=self.f0[index] if _is_target(self.f0) else numpy.nan,
IndexError: boolean index did not match indexed array along dimension 0; dimension is 692 but corresponding boolean dimension is 691
Traceback (most recent call last):
  File "scripts/voice_change.py", line 75, in process
    f_in_effective, effective = acoustic_converter.separate_effective(wave=w_in, feature=f_in, threshold=threshold)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 92, in separate_effective
    feature = feature.indexing(effective)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 86, in indexing
    f0=self.f0[index] if _is_target(self.f0) else numpy.nan,
IndexError: boolean index did not match indexed array along dimension 0; dimension is 1076 but corresponding boolean dimension is 1075
Traceback (most recent call last):
  File "scripts/voice_change.py", line 75, in process
    f_in_effective, effective = acoustic_converter.separate_effective(wave=w_in, feature=f_in, threshold=threshold)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 92, in separate_effective
    feature = feature.indexing(effective)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 86, in indexing
    f0=self.f0[index] if _is_target(self.f0) else numpy.nan,
IndexError: boolean index did not match indexed array along dimension 0; dimension is 692 but corresponding boolean dimension is 691
Traceback (most recent call last):
  File "scripts/voice_change.py", line 75, in process
    f_in_effective, effective = acoustic_converter.separate_effective(wave=w_in, feature=f_in, threshold=threshold)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 92, in separate_effective
    feature = feature.indexing(effective)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 86, in indexing
    f0=self.f0[index] if _is_target(self.f0) else numpy.nan,
IndexError: boolean index did not match indexed array along dimension 0; dimension is 692 but corresponding boolean dimension is 691
Traceback (most recent call last):
  File "scripts/voice_change.py", line 75, in process
    f_in_effective, effective = acoustic_converter.separate_effective(wave=w_in, feature=f_in, threshold=threshold)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 92, in separate_effective
    feature = feature.indexing(effective)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 86, in indexing
    f0=self.f0[index] if _is_target(self.f0) else numpy.nan,
IndexError: boolean index did not match indexed array along dimension 0; dimension is 692 but corresponding boolean dimension is 691
^CTraceback (most recent call last):
  File "scripts/voice_change.py", line 133, in <module>
    main()
  File "scripts/voice_change.py", line 127, in main
Traceback (most recent call last):
    list(multiprocessing.Pool().map(process_partial, paths_test))
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 268, in map
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 169, in decode_spectrogram
    alpha=pysptk.util.mcepalpha(self.out_sampling_rate),
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 179, in mcepalpha
    alpha in alpha_candidates]
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 179, in <listcomp>
    alpha in alpha_candidates]
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 192, in _warping_vector
    omega = step * np.arange(0, length)
KeyboardInterrupt
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 651, in get
Traceback (most recent call last):
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 170, in decode_spectrogram
    fftlen=fftlen,
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 94, in automatic_type_conversion
    return func(*args, **kwargs).astype(dtypein)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/conversion.py", line 149, in mc2sp
    symc[i] = c[i]
KeyboardInterrupt
    self.wait(timeout)
  File "/opt/conda/lib/python3.7/multiprocessing/pool.py", line 648, in wait
    self._event.wait(timeout)
  File "/opt/conda/lib/python3.7/threading.py", line 552, in wait
    signaled = self._cond.wait(timeout)
  File "/opt/conda/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()
KeyboardInterrupt
Traceback (most recent call last):
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 169, in decode_spectrogram
    alpha=pysptk.util.mcepalpha(self.out_sampling_rate),
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 179, in mcepalpha
    alpha in alpha_candidates]
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 179, in <listcomp>
    alpha in alpha_candidates]
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 195, in _warping_vector
    warpfreq = np.arctan(num / den)
KeyboardInterrupt
Traceback (most recent call last):
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 170, in decode_spectrogram
    fftlen=fftlen,
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 94, in automatic_type_conversion
    return func(*args, **kwargs).astype(dtypein)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/conversion.py", line 154, in mc2sp
    return np.exp(np.fft.rfft(symc).real)
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 170, in decode_spectrogram
    fftlen=fftlen,
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 94, in automatic_type_conversion
    return func(*args, **kwargs).astype(dtypein)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/conversion.py", line 149, in mc2sp
    symc[i] = c[i]
KeyboardInterrupt
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 170, in decode_spectrogram
    fftlen=fftlen,
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 94, in automatic_type_conversion
    return func(*args, **kwargs).astype(dtypein)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/conversion.py", line 143, in mc2sp
    c = freqt(mc, int(fftlen // 2), -alpha)
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 70, in apply_along_last_axis
    ret = func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 80, in automatic_type_conversion
    @decorator
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 170, in decode_spectrogram
    fftlen=fftlen,
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 94, in automatic_type_conversion
    return func(*args, **kwargs).astype(dtypein)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/conversion.py", line 154, in mc2sp
    return np.exp(np.fft.rfft(symc).real)
  File "<__array_function__ internals>", line 6, in rfft
  File "/opt/conda/lib/python3.7/site-packages/numpy/fft/_pocketfft.py", line 409, in rfft
    output = _raw_fft(a, n, axis, True, True, inv_norm)
  File "scripts/voice_change.py", line 78, in process
    f_out = acoustic_converter.convert_loop(f_in_effective)
  File "/opt/conda/lib/python3.7/site-packages/numpy/fft/_pocketfft.py", line 70, in _raw_fft
    r = pfi.execute(a, is_real, is_forward, fct)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 142, in convert_loop
    o_warp = self.convert(f)
KeyboardInterrupt
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 109, in convert
    out = self.model(inputs).data[0]
  File "/docker/ax/20211128_yukarin/yukarin/model.py", line 148, in __call__
    return self.decoder(self.encoder(x))
  File "/docker/ax/20211128_yukarin/yukarin/model.py", line 134, in __call__
    h = self['c%d' % i](h)
  File "/docker/ax/20211128_yukarin/yukarin/model.py", line 70, in __call__
    h = self.c(x)
  File "/opt/conda/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
    out = forward(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/chainer/links/connection/deconvolution_nd.py", line 150, in forward
    outsize=self.outsize, dilate=self.dilate, groups=self.groups)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 377, in deconvolution_nd
    y, = func.apply(args)
  File "/opt/conda/lib/python3.7/site-packages/chainer/function_node.py", line 263, in apply
    outputs = self.forward(in_data)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 186, in forward
    return self._forward_xp(x, W, b, numpy)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 83, in _forward_xp
    return self._forward_xp_core(x, W, b, xp)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 128, in _forward_xp_core
    gcol = xp.tensordot(W, x, (0, 1)).astype(x.dtype, copy=False)
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 169, in decode_spectrogram
    alpha=pysptk.util.mcepalpha(self.out_sampling_rate),
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 179, in mcepalpha
    alpha in alpha_candidates]
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 179, in <listcomp>
    alpha in alpha_candidates]
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 193, in _warping_vector
    num = (1 - alpha * alpha) * np.sin(omega)
  File "scripts/voice_change.py", line 80, in process
    f_out = acoustic_converter.decode_spectrogram(f_out)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 170, in decode_spectrogram
    fftlen=fftlen,
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "scripts/voice_change.py", line 78, in process
    f_out = acoustic_converter.convert_loop(f_in_effective)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
Traceback (most recent call last):
KeyboardInterrupt
  File "<__array_function__ internals>", line 6, in tensordot
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 142, in convert_loop
    o_warp = self.convert(f)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 109, in convert
    out = self.model(inputs).data[0]
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 94, in automatic_type_conversion
    return func(*args, **kwargs).astype(dtypein)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/conversion.py", line 150, in mc2sp
    symc[-i] = c[i]
KeyboardInterrupt
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/numeric.py", line 1132, in tensordot
    res = dot(at, bt)
  File "<__array_function__ internals>", line 6, in dot
KeyboardInterrupt
  File "/docker/ax/20211128_yukarin/yukarin/model.py", line 148, in __call__
    return self.decoder(self.encoder(x))
  File "/docker/ax/20211128_yukarin/yukarin/model.py", line 134, in __call__
    h = self['c%d' % i](h)
  File "/docker/ax/20211128_yukarin/yukarin/model.py", line 70, in __call__
    h = self.c(x)
  File "scripts/voice_change.py", line 74, in process
    f_in = acoustic_converter.extract_acoustic_feature(w_in)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 67, in extract_acoustic_feature
    dtype=self._param.dtype,
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 141, in extract
    mc = pysptk.sp2mc(sp, order=order, alpha=alpha)
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 402, in apply_along_axis
    buff[ind] = asanyarray(func1d(inarr_view[ind], *args, **kwargs))
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 231, in fun
    args, kw = fix(args, kw, sig)
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 203, in fix
    ba = sig.bind(*args, **kwargs)
  File "/opt/conda/lib/python3.7/inspect.py", line 3015, in bind
    return args[0]._bind(args[1:], kwargs)
  File "/opt/conda/lib/python3.7/inspect.py", line 2944, in _bind
    if param.kind == _VAR_POSITIONAL:
KeyboardInterrupt
  File "/opt/conda/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
    out = forward(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/chainer/links/connection/deconvolution_nd.py", line 150, in forward
    outsize=self.outsize, dilate=self.dilate, groups=self.groups)
  File "scripts/voice_change.py", line 74, in process
    f_in = acoustic_converter.extract_acoustic_feature(w_in)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_converter.py", line 67, in extract_acoustic_feature
    dtype=self._param.dtype,
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 141, in extract
    mc = pysptk.sp2mc(sp, order=order, alpha=alpha)
  File "/opt/conda/lib/python3.7/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/conda/lib/python3.7/site-packages/pysptk/util.py", line 75, in apply_along_last_axis
    ret = np.apply_along_axis(func, -1, *args, **kwargs)
  File "<__array_function__ internals>", line 6, in apply_along_axis
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 401, in apply_along_axis
    for ind in inds:
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 370, in <genexpr>
    inds = (ind + (Ellipsis,) for ind in inds)
  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/index_tricks.py", line 683, in __next__
    def __next__(self):
KeyboardInterrupt
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 377, in deconvolution_nd
    y, = func.apply(args)
  File "/opt/conda/lib/python3.7/site-packages/chainer/function_node.py", line 263, in apply
    outputs = self.forward(in_data)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 186, in forward
    return self._forward_xp(x, W, b, numpy)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 83, in _forward_xp
    return self._forward_xp_core(x, W, b, xp)
  File "/opt/conda/lib/python3.7/site-packages/chainer/functions/connection/deconvolution_nd.py", line 128, in _forward_xp_core
    gcol = xp.tensordot(W, x, (0, 1)).astype(x.dtype, copy=False)
  File "<__array_function__ internals>", line 6, in tensordot
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/numeric.py", line 1132, in tensordot
    res = dot(at, bt)
  File "<__array_function__ internals>", line 6, in dot
KeyboardInterrupt
Traceback (most recent call last):
  File "scripts/voice_change.py", line 85, in process
    wave = f_out.decode(sampling_rate=sampling_rate, frame_period=frame_period)
  File "/docker/ax/20211128_yukarin/yukarin/acoustic_feature.py", line 193, in decode
    frame_period=frame_period,


Meanwhile, converted audio has been written under the output folder. It seems some of the conversions completed while others did not. Listening to the converted audio (conversion of the training data itself), the results were not very good: for most files the spoken content could not be made out, and even where it could, the delivery had an unnatural, robot-like quality.

尚、å­Ļįŋ’ãĢäŊŋį”¨ã—たéŸŗåŖ°ãƒ‡ãƒŧã‚ŋは、23,398å€‹ãŽãƒ•ã‚Ąã‚¤ãƒĢがあり、ダã‚Ļãƒŗロãƒŧド元ãĢã‚ˆã‚Œã°ã€čŠąč€…æ•°ãŒæã‚‰ã397名かと思われぞす。 そぎãƒŦベãƒĢで変換ができãĒいということは、any-to-oneぎ変換ãĢは不向きということãĒぎかもしれぞせん。

或いは、攚めãĻデãƒŧã‚ŋをįĸēčĒã—ãĻãŋると、æĩˇå¤–ぎ斚がã‚Ģã‚ŋã‚ŗトでæ—ĨæœŦčĒžã‚’čŠąã•ã‚ŒãĻã„ã‚‹ã‚‚ãŽã‚„ã€č¨€ã„é–“é•ãˆãĻいるもぎ、ボã‚Ŋボã‚Ŋã¨čžãå–ã‚ŠãĨらいもぎ、外部ノイã‚ēãŒæˇˇã˜ãŖãĻいるもぎ、į„ĄéŸŗぎもぎį­‰ã€å­Ļįŋ’ã‚’é›ŖしくしãĻいるデãƒŧã‚ŋãŒæˇˇã˜ãŖãĻいる様子でした。

そこで、čŠĻしãĢ、品čŗĒãŽč‰¯ã„éŸŗåŖ°ãƒ‡ãƒŧã‚ŋだけãĢįĩžãŖãĻ、再ãŗå­Ļįŋ’を原æ–ŊしãĻãŋようと思いぞす。

尚、VoiceVoxãĢãĻį”Ÿæˆã—たéŸŗåŖ°ãĢついãĻも、イãƒŗトネãƒŧã‚ˇãƒ§ãƒŗがおかしい部分が、比čŧƒįš„多数存在することãĢ気がäģ˜ããžã—た。 イãƒŗトネãƒŧã‚ˇãƒ§ãƒŗãĢついãĻは、VoiceVoxぎã‚ĸプãƒĒį‰ˆãĢãĻčĒŋ整が可čƒŊãĒぎですが、ã‚ŗマãƒŗドナイãƒŗãĢよるツãƒŧãƒĢ原æ–Ŋであるとé›ŖしいæŦĄįŦŦです。 そぎį‚ē、こぎį‚šã¯į›Žã‚’つむãŖãĻ、å­Ļįŋ’を原æ–Ŋしぞす。


å­Ļįŋ’ãĢついãĻīŧˆX. 再å­Ļįŋ’〜再テ゚トīŧ‰

After selecting the training data, I removed roughly 18,000 of the audio files, narrowing the set down to 5,894. With this data I rerun the preprocessing and then the training.

python scripts/extract_acoustic_feature.py \
    -i './input_wav/*' \
    -o './input_feature/'

python scripts/extract_acoustic_feature.py \
    -i './target_wav/*' \
    -o './target_feature/'

python scripts/extract_align_indexes.py \
    -i1 './input_feature/*.npy' \
    -i2 './target_feature/*.npy' \
    -o './aligned_indexes/'

python scripts/extract_f0_statistics.py \
    -i './input_feature/*.npy' \
    -o './input_statistics.npy'

python scripts/extract_f0_statistics.py \
    -i './target_feature/*.npy' \
    -o './target_statistics.npy'

python train.py \
    sample_config.json \
    ./model_stage1/


é¸åŽšã‚’čĄŒãŖたå­Ļįŋ’デãƒŧã‚ŋãĢãĻ、13時間į¨‹å­Ļįŋ’を回し、ãƒĸデãƒĢãƒ•ã‚Ąã‚¤ãƒĢをį”Ÿæˆã—ぞした。

yukarin# tree model_stage1/
model_stage1/
├── cg.dot
├── config.json
├── events.out.tfevents.1640679958.9e39f0c75923
├── log
├── predictor_10000.npz
├── predictor_100000.npz
├── predictor_110000.npz
├── predictor_120000.npz
├── predictor_130000.npz
├── predictor_140000.npz
├── predictor_150000.npz
├── predictor_160000.npz
├── predictor_170000.npz
├── predictor_180000.npz
├── predictor_190000.npz
├── predictor_20000.npz
├── predictor_200000.npz
├── predictor_210000.npz
├── predictor_220000.npz
├── predictor_230000.npz
├── predictor_240000.npz
├── predictor_250000.npz
├── predictor_260000.npz
├── predictor_270000.npz
├── predictor_280000.npz
├── predictor_290000.npz
├── predictor_30000.npz
├── predictor_300000.npz
├── predictor_310000.npz
├── predictor_320000.npz
├── predictor_330000.npz
├── predictor_340000.npz
├── predictor_350000.npz
├── predictor_360000.npz
├── predictor_370000.npz
├── predictor_380000.npz
├── predictor_390000.npz
├── predictor_40000.npz
├── predictor_400000.npz
├── predictor_410000.npz
├── predictor_420000.npz
├── predictor_430000.npz
├── predictor_440000.npz
├── predictor_450000.npz
├── predictor_460000.npz
├── predictor_470000.npz
├── predictor_480000.npz
├── predictor_490000.npz
├── predictor_50000.npz
├── predictor_500000.npz
├── predictor_510000.npz
├── predictor_520000.npz
├── predictor_530000.npz
├── predictor_540000.npz
├── predictor_550000.npz
├── predictor_560000.npz
├── predictor_570000.npz
├── predictor_580000.npz
├── predictor_590000.npz
├── predictor_60000.npz
├── predictor_600000.npz
├── predictor_610000.npz
├── predictor_620000.npz
├── predictor_630000.npz
├── predictor_640000.npz
├── predictor_70000.npz
├── predictor_80000.npz
└── predictor_90000.npz

0 directories, 68 files


最įĩ‚įš„ãĒlogはäģĨ下とãĒりぞす。 選厚前よりも攚善しãĻいるようãĢ思えぞす。

    {
        "predictor/mse": 0.26048386096954346,
        "predictor/adversarial": 34.359676361083984,
        "predictor/loss": 26.048383712768555,
        "discriminator/real": 0.006752286572009325,
        "discriminator/fake": 0.016222849488258362,
        "discriminator/loss": 0.02297513745725155,
        "discriminator/accuracy": 0.9975341796875,
        "discriminator/precision": 0.9963241332156757,
        "discriminator/recall": 0.9987548828125,
        "test/predictor/mse": 0.3014129102230072,
        "test/predictor/adversarial": 0.0002113436785293743,
        "test/predictor/loss": 30.14129066467285,
        "test/discriminator/real": 9.132438572123647e-05,
        "test/discriminator/fake": 8.595941543579102,
        "test/discriminator/loss": 8.596033096313477,
        "test/discriminator/accuracy": 0.5,
        "test/discriminator/precision": 0.5,
        "test/discriminator/recall": 1.0,
        "train/predictor/mse": 0.20771053433418274,
        "train/predictor/adversarial": 0.00020970181503798813,
        "train/predictor/loss": 20.771053314208984,
        "train/discriminator/real": 0.0001445886300643906,
        "train/discriminator/fake": 8.599419593811035,
        "train/discriminator/loss": 8.599564552307129,
        "train/discriminator/accuracy": 0.5,
        "train/discriminator/precision": 0.5,
        "train/discriminator/recall": 1.0,
        "epoch": 13950,
        "iteration": 641850,
        "elapsed_time": 54962.55319662788
    }
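
As a side note, the log file written during training appears to be a JSON array of entries like the one above (Chainer's standard LogReport output), so the loss curves can be plotted with a short script along these lines (a sketch; the path and key names are taken from the output above):

    import json

    import matplotlib.pyplot as plt

    # Assumes ./model_stage1/log is Chainer's LogReport output: a JSON array of
    # entries shaped like the one quoted above.
    with open('./model_stage1/log') as f:
        entries = json.load(f)

    # Not every entry necessarily reports these keys, so filter first.
    entries = [e for e in entries
               if 'train/predictor/mse' in e and 'test/predictor/mse' in e]
    iterations = [e['iteration'] for e in entries]

    plt.plot(iterations, [e['train/predictor/mse'] for e in entries], label='train/predictor/mse')
    plt.plot(iterations, [e['test/predictor/mse'] for e in entries], label='test/predictor/mse')
    plt.xlabel('iteration')
    plt.ylabel('mse')
    plt.legend()
    plt.savefig('predictor_mse.png')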


新たãĢį”Ÿæˆã—たãƒĸデãƒĢãĢãĻ、å­Ļįŋ’デãƒŧã‚ŋãĢ寞しãĻ、攚めãĻテ゚トを原æ–Ŋしぞす。

python scripts/voice_change.py \
    --model_dir './model_stage1' \
    --config_path './model_stage1/config.json' \
    --input_statistics 'input_statistics.npy' \
    --target_statistics 'target_statistics.npy' \
    --output_sampling_rate 24000 \
    --disable_dataset_test \
    --test_wave_dir './input_wav/' \
    --output_dir './output/'


そぎįĩæžœã€å‰ãŽįĩæžœã‚ˆã‚Šã¯æ”šå–„ã•ã‚ŒãŸå°čąĄã¯ã‚ã‚‹ã‚‚ãŽãŽã€čŠąã—ãĻã„ã‚‹å†…åŽšãŒčžãå–ã‚ŒãĒいもぎが大半でした。 å…ˇäŊ“įš„ãĢは、äģĨ下ブログでぎ「ベãƒŧ゚手æŗ•ã§ãŽå¤‰æ›įĩæžœã€ã¨ã„うもぎãĢ、かãĒりåŠŖã‚‹å°čąĄã§ã—ãŸã€‚ https://blog.hiroshiba.jp/voice-conversion-deep-leanring-and-other-delusions/

That said, it felt fairly close to the results posted in the blog below, though I would still say it is worse. https://blog.hiroshiba.jp/voice-conversion-deep-leanring-and-other-delusions/


At this point, let's take a look at the training configuration, sample_config.json. In fact, GPU memory usage during yukarin training stayed under 1 GB. Possibly the network is deliberately kept light with real-time processing in mind.

攚めãĻ、config内厚をįĸēčĒã—ãĻãŋると、嚞つか気ãĢãĒるį‚šãŒå­˜åœ¨ã—た。 先ず、lossぎ配分としãĻ、mseが100、adversarialが0とãĒãŖãĻいるようでした。 これは、issueãĢそぎ意å›ŗãŒč¨˜čŧ‰ã•ã‚ŒãĻいぞした。 mseäģĨ外ぎlossが、į›Žįš„と反するåŊĸでぎ品čŗĒ向上を招くようです。 ただ、adversarialを1ãĢしãĻã‚‚č‰¯ã„ã¨ãŽč¨˜čŧ‰ã‚‚ありぞす。 これを、čŠĻしãĻãŋようと思いぞす。 https://github.com/Hiroshiba/yukarin/issues/46 https://github.com/Hiroshiba/yukarin/issues/45

æŦĄãĢ、stop_iterationというもぎをįĸēčĒã—ぞした。 ã“ã“ã§ã€č¨­åŽšã‚’ã™ã‚Œã°ã€å­Ļįŋ’ぎįĩ‚äē†ã‚ŋイミãƒŗグを指厚できるもぎと思われぞす。 500,000į­‰ã‚’č¨­åŽšã—ãĻおこうと思いぞす。

optimizerãŽč¨­åŽšãĢAdamが指厚されãĻいるぎをįĸēčĒã—ぞした。 ã“ãĄã‚‰ã¯ã€RAdamをčŠĻしãĻãŋようと思いぞしたが、chainerãĢはRAdamがį„Ąã„ようでしたぎで、AdaBoundをčŠĻしãĻãŋようと思いぞす。 しかし、chainerぎversionが古く、AdaBoundがAttributeError: module 'chainer.optimizers' has no attribute 'AdaBound'とãĒãŖãĻしぞãŖたį‚ēã€ã“ãĄã‚‰ã¯čĻ‹é€ã‚ã†ã¨æ€ã„ぞす。

As for batchsize, I will try setting it to 128.

There is also a crop_size entry; judging from the code below, it specifies the segment length used when splitting the 1-D audio data. Considering that the data is sampled at 24,000 Hz, a somewhat longer crop seems reasonable, so I will change it from the default of 512 to 2048.

    start = random.randint(len_time - crop_size + 1)
    return numpy.split(data, [start, start + crop_size], axis=time_axis)[1]
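
For reference, a self-contained version of the crop logic quoted above might look like this (a sketch: the single-argument randint suggests numpy's random module, and time_axis=1 is an assumption):

    import numpy


    def random_crop(data, crop_size, time_axis=1):
        # Pick a random start frame and return a crop_size-long slice along the time axis.
        len_time = data.shape[time_axis]
        start = numpy.random.randint(len_time - crop_size + 1)  # start in [0, len_time - crop_size]
        return numpy.split(data, [start, start + crop_size], axis=time_axis)[1]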

最垌ãĢ、æœŦéĄŒãŽãƒãƒƒãƒˆãƒ¯ãƒŧク構成ですが、フã‚ŖãƒĢã‚ŋぎ需čĻé‡Žį­‰ã¯ã€å…ˆį¨‹ãŽissueį­‰ã‹ã‚‰ã€ãƒãƒĨãƒŧニãƒŗグがしãŖかりされãĻいるようでしたぎで、変更しãĒいようãĢしたいと思いぞす。 ネットワãƒŧクをãƒĒッチãĢするãĢåŊ“たãŖãĻは、generator_base_channels、discriminator_base_channelsčžēりかと思われぞすぎで、それぞれ16倍ãĢしãĻãŋようと思いぞす。


上記configãŽå¤‰æ›´ã‚’čĄŒãŖた上で、再åēĻå­Ļįŋ’を原æ–Ŋしぞす。

With stop_iteration set, a progress bar showing the training progress is now displayed. It also estimates the time to completion, which is much appreciated.

yukarin# python train.py \
>     sample_config.json \
>     ./model_stage1/
     total [#.................................................]  2.52%
this epoch [###########################################.......] 86.65%
     12600 iter, 273 epoch / 500000 iterations
    1.5014 iters/sec. Estimated time to finish: 3 days, 18:10:36.084940.

GPUãŽãƒĄãƒĸãƒĒäŊŋį”¨é‡ã¨ã—ãĻも、13GBį¨‹ãŒäŊŋį”¨ã•ã‚Œã‚‹ã‚ˆã†ãĢãĒりぞした。 batchsizeãŽå¤§ãã•ã‹ã‚‰č€ƒãˆã‚‹ã¨ã€æœ€čŋ‘ぎãƒĒッチãĒãƒĸデãƒĢと比čŧƒã—ãĻ、æąēしãĻãƒĒッチとはいえãĒいかと思いぞすが、取りæ€Ĩぎは、こぎ構造ãĢãĻ取りįĩ„ぞせãĻ頂こうと思いぞす。


4時間į¨‹ã€å­Ļįŋ’を原æ–Ŋした垌、å­Ļįŋ’デãƒŧã‚ŋãĢ寞しãĻテ゚トを原æ–ŊしãĻãŋぞしたところ、比čŧƒįš„かãĒã‚Ščžãå–ã‚Šã‚„ã™ããĒãŖãĻいぞした。 ぞた、å­Ļįŋ’デãƒŧã‚ŋ選厚ぎ際ãĢ、å­Ļįŋ’デãƒŧã‚ŋから除いたデãƒŧã‚ŋぎ内、品čŗĒとしãĻã¯å•éĄŒãĒいもぎをテ゚トデãƒŧã‚ŋとしãĻã€ãƒ†ã‚šãƒˆã‚’čĄŒãŖãĻãŋãžã—ãŸã¨ã“ã‚ã€ã“ãĄã‚‰ã‚‚æ¯”čŧƒįš„čžãå–ã‚Šã‚„ã™ããĒãŖãĻいぞした。 尚、こぎテ゚トデãƒŧã‚ŋã¯ã€åŒä¸€ãŽčŠąč€…ãŒå­Ļįŋ’デãƒŧã‚ŋãĢåĢぞれãĻいたり、åĢぞれãĻいãĒかãŖたりするもぎとãĒりぞす。


Since making the network richer looks promising for improving quality, I will tune the network parameters a bit more here.

å…ˇäŊ“įš„ãĢは、äģĨ下です。

äŊĩせãĻ、batchsizeですが、半分ãĢ減らしãĻ、64を指厚しãĻãŋぞす。

ぞた、ブログãĢよれば、adversarialぎlosså‰˛åˆã‚’éĢ˜ãã™ã‚‹ã¨ã€čŠąč€…æ€§ãŒå¤ąã‚ã‚Œã‚‹ã¨ãŽã“ã¨ã§ã—ãŸãŒã€ãã†ã§ã‚‚ãĒã„å°čąĄã§ã—ãŸã€‚ かつ、adversarialぎlosså‰˛åˆã‚’éĢ˜ãã™ã‚‹ã“ã¨ã§ã€čŠąã—ãĻいる内厚ぎ品čŗĒãŒä¸ŠãŒã‚‹ã¨ãŽč¨˜čŧ‰ã‚‚、ブログãĢありぞしたぎで、adversarialを2ãĢしãĻãŋようと思いぞす。


ä¸Šč¨˜č¨­åŽšãĢãĻ、一晊、å­Ļįŋ’を回しãĻãŋぞした。 そうしãĻį”Ÿæˆã•ã‚ŒãŸãƒĸデãƒĢãĢãĻ、テ゚トデãƒŧã‚ŋでぎéŸŗåŖ°å¤‰æ›ã‚’原æ–ŊしãĻãŋぞした。 しかし、čŠŗしいįĩæžœãĢはč‡ŗりぞせんでした。 先ぎ比čŧƒįš„čžãå–ã‚Šã‚„ã™ã„įĩæžœã‹ã‚‰ãŽé€˛åą•ãŒį„Ąã„åŊĸでした。


After that, I tried several more rounds of parameter tuning while referring to the issues and the blog, but the results only improved slightly and never got past the "fairly easy to understand" level. The speaker conversion itself seems solid; however, what is being said is hard to make out clearly. Or, strictly speaking, the unintelligible parts still outnumber the intelligible ones.

こぎハッキãƒĒã¨ã¯čžãå–ã‚ŠãĨらいéŸŗåŖ°ãŒã€yukarinぎįŦŦ2æŽĩ階でキãƒŦイãĢãĒる可čƒŊ性はあろうかと思うぎですが、įžæ™‚į‚šã§ã€å†…åŽšãŒčžãå–ã‚ŠãĨらいもぎが一čģĸčžãå–ã‚Šã‚„ã™ããĒるというぎは、個äēēįš„ãĢはé›Ŗ易åēĻがéĢ˜ã„ぎではãĒã„ã‹ã¨č€ƒãˆã‚‹æŦĄįŦŦです。

As for what to do from here, I suspect it will take relatively fundamental measures, such as changing the network architecture or reworking the training procedure. If I come up with approaches that are relatively easy to try, I will try them in turn.


(Further work is on hold for the moment...)


mucunwuxian commented 2 years ago

čĒŋæŸģåˆ†æžãƒĄãƒĸīŧˆīŧ’īŧ‰ 📝

yukarinぎå­Ļįŋ’å‡Ļį†ã‚„デãƒŧã‚ŋčģĸé€ãŽåž…ãĄæ™‚é–“ãĢ、同様į ”įŠļãĢおける最新動向ãĢついãĻもčĒŋずãĻãŋぞした。

最新ぎčĢ–æ–‡ãĒおですと、į‰šãĢ、any-to-anyでぎį ”įŠļįĩæžœãŒå¤šã„ようでした。 尚、äģŠå›žã€åŽŸčˇĩしたいことは、any-to-oneãĢãĒりぞす。 any-to-anyは、any-to-oneぎ一čˆŦ化ãĢãĒりぞすが、į˛žåēĻį™ē揎ぎé›Ŗ易åēĻがåŸēæœŦįš„ãĢはéĢ˜ããĒるもぎと思われぞす。

In earlier comments on this issue I have pasted survey links in a scattered way; among them, the following both match what I want to do this time and show high accuracy in their demos.

(any-to-any: MediumVC)

(any-to-one: SingleVC)


å°šã€ä¸Šč¨˜2つぎãƒĒポジトãƒĒは、äģĨä¸‹ãŽå…ąé€šã™ã‚‹1čĢ–æ–‡ãĢãĻ、č§ŖčĒŦがぞとめられãĻいぞす。 https://arxiv.org/pdf/2110.02500.pdf

čĢ–æ–‡ãĢよれば、SingleVCã‚’čĄŒãŖた垌、MediumVCã‚’čĄŒã†ã¨ãŽã“ã¨ã§ã€ã¤ãžã‚Šã€å…ˆãš any-to-one ぎ変換をかけた垌、one-to-any ぎ変換を原æ–Ŋすることで、įĩæžœįš„ãĢ any-to-any を原įžã™ã‚‹ã¨ãŽã“とです。 čĢ–文中では、中間įš„ãĒ one ぎことを、specificspeaker speeches as the intermedium features(SSIF) ã¨čĄ¨įžã—ãĻいぞす。 image


äģŠå›žã§č¨€ãˆã°ã€å…ˆãšã¯ã€any-to-oneを原įžã—たいæŦĄįŦŦであるį‚ē、SingleVCがį˛žåēĻéĢ˜ãã§ãã‚Œã°ã€į›Žįš„が果たせそうです。 そぎSingleVCãĢついãĻã§ã™ãŒã€ä¸Šč¨˜yukarinぎå­Ļįŋ’ãĢもį”¨ã„させãĻ頂いた、

The speech dataset published by Mozilla (it contains Japanese text and the corresponding read-aloud audio) https://commonvoice.mozilla.org/ja/datasets


īŧˆįļšããžã™â€Ļīŧ‰

http://www.udialogue.org/ja/download-ja/cstr-vctk-corpus.html
pip install pyrubberband
apt-get install libsndfile1
https://akio-blogger.blogspot.com/2018/01/dockerubuntusndfile.html?m=1
path adjustment
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
pip install transformers
apt-get update -y
apt-get install -y rubberband-cli

mucunwuxian commented 2 years ago

īŧˆå‚™åŋ˜īŧ‰ äģĨä¸‹ã€å‚č€ƒãžã§ãŽslackでぎやり取りとãĒりぞす。 https://axincai.slack.com/archives/C019HCVQBCP/p1641213162122700

I am also considering using the JVS corpus dataset that Hiroshiba-san uses as training data. His voice conversion comes out with high accuracy, and the high quality of that data gives grounds for optimism. Listening to the sample audio, the recordings sound considerably cleaner than the Mozilla "commonvoice" dataset. The texts being read also appear to be the same as in "commonvoice", seemingly based on the JSUT corpus. In other words, there is no need to create new target-side data, so the analysis should be efficient.

ぞた、éŸŗåŖ°čĒč­˜ -> éŸŗåŖ°åˆæˆ という斚æŗ•ãĢついãĻも、čĒŋæŸģを原æ–ŊしãĻãŋようと思いぞす。