andabi / deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in Tensorflow
MIT License

Hi @Huishou, TIMIT is a speech dataset aligned with its phonemes. Net1 is a speech recognizer trained on the speech and its phoneme labels; the recognized output from net1 is then passed to net2, and net2 just synthesizes from net1's prediction. That's why it's important to train net1 on a dataset with many speakers, so it gives a good prediction of what the speaker said. #92

Open Sun-Ziyi opened 5 years ago

Sun-Ziyi commented 5 years ago

Hi @Huishou, TIMIT is a speech dataset aligned with its phonemes. Net1 is a speech recognizer trained on the speech and its phoneme labels; the recognized output from net1 is then passed to net2, and net2 just synthesizes from net1's prediction. That's why it's important to train net1 on a dataset with many speakers, so it gives a good prediction of what the speaker said.

Originally posted by @carlfm01 in https://github.com/andabi/deep-voice-conversion/issues/52#issuecomment-459856993
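A minimal conceptual sketch of the two-stage pipeline described in the quoted comment, written with tf.keras. This is not the repository's actual code (the real networks are more elaborate); the feature sizes (`N_MFCC`, `N_MELS`), layer widths, and the use of GRU layers are illustrative assumptions. The point it shows is the division of labor: net1 is trained on the many-speaker TIMIT data to predict phoneme posteriors ("what was said"), and net2 is trained only on the target speaker to turn those posteriors back into that speaker's spectrogram ("who says it").

```python
import numpy as np
import tensorflow as tf

N_MFCC = 40        # assumed number of MFCC coefficients per frame
N_PHONEMES = 61    # TIMIT phoneme classes (often folded to a smaller set in practice)
N_MELS = 80        # assumed number of spectrogram bins produced for the target voice

# net1: speaker-independent phoneme recognizer.
# Trained on TIMIT (many speakers) to map acoustic frames to phoneme
# posteriors, so its output captures the content, not the speaker identity.
net1 = tf.keras.Sequential([
    tf.keras.Input(shape=(None, N_MFCC)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True)),
    tf.keras.layers.Dense(N_PHONEMES, activation="softmax"),
], name="net1_phoneme_recognizer")

# net2: synthesizer for a single target speaker.
# Trained only on the target speaker's audio: it learns to map net1's
# phoneme posteriors to that speaker's spectrogram frames.
net2 = tf.keras.Sequential([
    tf.keras.Input(shape=(None, N_PHONEMES)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True)),
    tf.keras.layers.Dense(N_MELS),
], name="net2_synthesizer")

net1.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
net2.compile(optimizer="adam", loss="mse")

# Conversion at inference time:
# source speech -> phoneme posteriors (net1) -> target speaker's spectrogram (net2).
source_mfcc = np.random.randn(1, 200, N_MFCC).astype("float32")  # placeholder input
ppg = net1.predict(source_mfcc)       # "what was said"
target_spec = net2.predict(ppg)       # rendered in the target speaker's voice
print(ppg.shape, target_spec.shape)
```

Because net2 only ever sees phoneme posteriors, any source speaker's speech can be pushed through net1 and re-synthesized in the target voice, which is why a poor net1 (trained on too few speakers) degrades the whole conversion.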

Huishou commented 5 years ago

Hi @Sun-Ziyi, I can add you on WeChat (Weixin).

Sun-Ziyi commented 5 years ago

@Huishou Could I ask you about some issues I'm having running the program? My WeChat (Weixin) ID: 18683664029