MycroftAI / mimic-recording-studio

Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2.
Apache License 2.0

Pad silence trimming calculation to prevent cutting off end of audio. #35

Status: Closed (el-tocino closed 3 years ago)

el-tocino commented 3 years ago

In certain conditions (possibly English sentences that end in hard consonants), it appears that MRS may be trimming a bit too aggressively. I would suggest adding a 0.05 s or 0.1 s pad to both ends of the trimmed region in order to retain the audio.

Relevant code: https://github.com/MycroftAI/mimic-recording-studio/blob/e3e3c8e15544f7e6c7ee2f1f35ea9055908924ce/backend/app/audio.py#L25
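
A minimal sketch of the padding idea suggested above, not the project's actual code: it works on a plain list of mono float samples rather than the audio objects used in `backend/app/audio.py`. The function name `trim_with_padding`, the amplitude `threshold`, and the `pad_seconds` default are all hypothetical, chosen only for illustration.

```python
def trim_with_padding(samples, sample_rate, threshold=0.02, pad_seconds=0.1):
    """Trim leading/trailing silence but keep a small pad on each side.

    samples: sequence of mono float samples in [-1.0, 1.0].
    threshold: amplitude below which a sample counts as silence
               (hypothetical value, not taken from the project).
    pad_seconds: audio to keep before and after the detected signal
                 (hypothetical default).
    """
    # Indices of every sample loud enough to count as signal.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the clip contains only silence

    pad = int(round(pad_seconds * sample_rate))
    # Widen the detected region by the pad, clamped to the clip boundaries.
    start = max(0, loud[0] - pad)
    end = min(len(samples), loud[-1] + 1 + pad)
    return list(samples[start:end])


# Usage: 0.5 s silence, 0.2 s of signal, 0.5 s silence at 1000 samples/s.
clip = [0.0] * 500 + [0.5] * 200 + [0.0] * 500
trimmed = trim_with_padding(clip, sample_rate=1000, pad_seconds=0.1)
print(len(trimmed))  # 400: 200 samples of signal plus 100 samples of pad per side
```

With `pad_seconds=0.0` the same call keeps only the 200 signal samples, which is the behaviour that risks clipping a quiet final consonant that falls just under the threshold.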

thorstenMueller commented 3 years ago

Thanks @el-tocino for opening this issue, and thanks @krisgesling for the code change. As I encountered the same problem (the last phonemes of some recordings were cut off), I applied your patch manually. I'll give it a try and report back.

Do I need to run a docker build after the code change, or can I just start with docker-compose up?

krisgesling commented 3 years ago

Thanks Thorsten - if you give the thumbs up I'll merge it so that we have at least a simple fix. I know others are working on more detailed changes that we can still consider but mostly I don't want to have people getting dud audio in the short term.

docker-compose up will rebuild the container if it detects a change. You can also use the --force-recreate flag to be safe if it's not picking up the changes for some reason.

thorstenMueller commented 3 years ago

I ran some quick tests, recording 75 phrases spoken at a normal (slightly louder) volume. In this scenario it seems to work (no endings are cut off).

[Image: screenshot from 2021-06-10 21-10-17]

But I'll start recording a "whispering" dataset tomorrow, so I'll check whether the end is cut off in that scenario. If this works well too, I'll give my thumbs up :-).

thorstenMueller commented 3 years ago

I've recorded 300 whispered phrases and listened to them. No phrases were cut off :smile:. Tested with the default value of 0.3s. So here's my thumbs up :+1:

krisgesling commented 3 years ago

Great, thanks for testing that!

Merging now.