MycroftAI / mimic-recording-studio

Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2
Apache License 2.0
496 stars, 114 forks

Changes in trim_silence function regarding the aggressive trimming issue #46

Open amoljagadambe opened 3 years ago

amoljagadambe commented 3 years ago

Description

Update the trimming function to use rolling windows to create a mask around the signal. Fixes #35.

Type of PR

Documentation

The most important tunable parameter is threshold_value. Experiment with its value to get the trimming you want.
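To illustrate how a rolling-window mask and threshold_value interact, here is a minimal sketch. It is not the code from this PR: the function names, the `window` parameter, and the default values are assumptions chosen for illustration. The sketch uses pure Python and only trims leading and trailing silence.

```python
from typing import List, Sequence


def rolling_max_abs(samples: Sequence[float], window: int) -> List[float]:
    """Centered rolling maximum of the absolute sample values."""
    half = window // 2
    n = len(samples)
    envelope = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        envelope.append(max(abs(s) for s in samples[lo:hi]))
    return envelope


def trim_silence_sketch(
    samples: Sequence[float],
    threshold_value: float = 0.02,  # assumed default, tune for your recordings
    window: int = 5,                # assumed window length in samples
) -> List[float]:
    """Drop leading and trailing samples whose rolling envelope stays
    at or below threshold_value."""
    if not samples:
        return []
    envelope = rolling_max_abs(samples, window)
    # The mask marks every position that is close to audible signal.
    mask = [value > threshold_value for value in envelope]
    if not any(mask):
        return []  # the whole clip is below the threshold
    first = mask.index(True)
    last = len(mask) - 1 - mask[::-1].index(True)
    return list(samples[first:last + 1])


if __name__ == "__main__":
    clip = [0.0] * 10 + [0.5, -0.4, 0.3] + [0.0] * 10
    print(trim_silence_sketch(clip))
```

A higher threshold_value trims more aggressively, while a wider window keeps more padding around the detected signal.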

CLA

To protect you, the project, and those who choose to use Mycroft technologies in systems they build, we ask all contributors to sign a Contributor License Agreement.

This agreement clarifies that you are granting a license to the Mycroft Project to freely use your work. Additionally, it establishes that you retain the ownership of your contributed code and intellectual property. As the owner, you are free to use your code in other work, obtain patents, or do anything else you choose with it.

If you haven't already signed the agreement and been added to our public Contributors repo then please head to https://mycroft.ai/cla to initiate the signing process.

amoljagadambe commented 3 years ago

I think your statement above is right: NumPy and pandas are adding stress to the Alpine version of Python. Meanwhile, I am also working on a different approach that is lighter and more computationally efficient.

amoljagadambe commented 3 years ago

The above approach also needs some tweaks in the Dockerfile.

amoljagadambe commented 3 years ago

'ffmpeg -i {} -ab 160k -ac 2 -ar 44100 -vn {}.wav -y'.format( webm_file_name, path )

Why are we using 2 channels when saving the file?
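For context, ffmpeg's `-ac` option sets the number of output audio channels, so `-ac 1` would write a mono file. Below is a minimal sketch of the same command with the channel count made adjustable. The helper name `build_ffmpeg_args` and its `channels` parameter are hypothetical and not part of the project.

```python
from typing import List


def build_ffmpeg_args(webm_file_name: str, path: str, channels: int = 2) -> List[str]:
    """Build the conversion command as an argument list.

    Mirrors the flags in the command quoted above; only the value passed
    to -ac (number of audio channels) is made configurable.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return [
        "ffmpeg",
        "-i", webm_file_name,
        "-ab", "160k",          # audio bitrate
        "-ac", str(channels),   # 1 = mono, 2 = stereo
        "-ar", "44100",         # sample rate in Hz
        "-vn",                  # ignore any video stream
        "{}.wav".format(path),
        "-y",                   # overwrite the output file if it exists
    ]


if __name__ == "__main__":
    # Print the mono variant of the command for inspection.
    print(" ".join(build_ffmpeg_args("sample.webm", "sample", channels=1)))
```

Passing the list to `subprocess.run` instead of formatting a shell string would also avoid quoting problems with unusual file names.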

krisgesling commented 3 years ago

Hey, glad that you're finding the project helpful and able to modify it to fit your use case.

The removal of all silence makes more sense now as it seems you're recording single words rather than whole sentences.

It feels like these changes would be well suited to configuration options set in the docker-compose.yaml, e.g.: