facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

Add energy filter to AudioDataset #314

Closed: piwell closed this pull request 7 months ago

piwell commented 9 months ago

We want to avoid training on segments that contain too much silence. This PR exposes an energy filter parameter: when it is set and a segment's energy falls below the provided threshold, the dataset retries fetching audio.
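As an illustration of what an energy check on a cropped segment could look like, here is a minimal sketch. It assumes mono float samples in [-1, 1], measures energy as RMS expressed in dBFS, and uses hypothetical function names (`segment_rms`, `rms_to_dbfs`, `passes_energy_filter`); the PR's actual parameter name, energy measure, and threshold units are not shown in this thread.

```python
import math
from typing import Sequence


def segment_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of a mono segment whose samples lie in [-1, 1]."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_to_dbfs(rms: float, floor_db: float = -120.0) -> float:
    """Convert linear RMS to dB relative to full scale, clamped at floor_db."""
    if rms <= 0.0:
        return floor_db
    return max(20.0 * math.log10(rms), floor_db)


def passes_energy_filter(samples: Sequence[float], min_energy_db: float) -> bool:
    """Keep the segment only if it reaches the (hypothetical) dBFS threshold."""
    return rms_to_dbfs(segment_rms(samples)) >= min_energy_db


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
    silence = [0.0] * (sr // 10)
    print(passes_energy_filter(tone, min_energy_db=-60.0))     # True (about -9 dBFS)
    print(passes_energy_filter(silence, min_energy_db=-60.0))  # False
```

Working in dB keeps the threshold readable (e.g. -60 dBFS), but a plain linear RMS threshold would work just as well.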

facebook-github-bot commented 9 months ago

Hi @piwell!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with "CLA signed". The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

piwell commented 7 months ago

> the current setup can crash if there are a few consecutive files with a lot of silence. That does not sound like the behavior we want to achieve. In that case the user will need to go back and filter the data manually anyway, right?

That is true, and it is why we reused the existing retry mechanism: the same limitation already applies today. If you are unlucky and cannot read a few files in a row, the loader crashes. Our thinking was that if you have many silent files and this becomes a problem, you can increase max_read_retry. The retry cap exists to avoid accidentally getting stuck in a never-ending loop.
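To make the trade-off concrete, the bounded retry described above can be sketched as follows. `max_read_retry` is the name used in the discussion; `sample_with_retries`, `SilentSegmentError`, and the callback signatures are hypothetical and only illustrate the idea, not AudioDataset's real implementation.

```python
from typing import Callable, Sequence


class SilentSegmentError(RuntimeError):
    """Raised when no attempt produced a segment that passed the energy filter."""


def sample_with_retries(
    read_segment: Callable[[], Sequence[float]],
    is_loud_enough: Callable[[Sequence[float]], bool],
    max_read_retry: int,
) -> Sequence[float]:
    """Draw segments until one passes the energy check.

    The number of attempts is capped at max_read_retry, so a run of silent
    files ends in an error instead of an endless loop.
    """
    if max_read_retry < 1:
        raise ValueError("max_read_retry must be at least 1")
    for _ in range(max_read_retry):
        segment = read_segment()  # e.g. pick a file and take a random crop
        if is_loud_enough(segment):
            return segment
    raise SilentSegmentError(
        f"no segment passed the energy filter after {max_read_retry} attempts"
    )
```

If three silent segments come back in a row and `max_read_retry=3`, the call raises. Raising the cap is the workaround described above, at the cost of more wasted reads on datasets with many silent files.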

> i agree that an energy threshold is useful, but i think it makes more sense as an offline preprocessing step. Let me know.

Valid point. We chose this approach over preprocessing because our files contain both music and silence, so you cannot tell whether a segment will be silent until after the random crop.
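The argument about random crops can be checked with a small simulation. This sketch assumes a hypothetical 2-second file made of 1 s of tone followed by 1 s of digital silence, a hypothetical linear RMS threshold of 0.01, and uniformly placed 0.25 s crops. The whole file comfortably passes a file-level energy check, yet a large share of crops is silent.

```python
import math
import random
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square energy of a segment; 0.0 for an empty one."""
    return math.sqrt(sum(s * s for s in x) / len(x)) if len(x) else 0.0


def random_crop(x: Sequence[float], length: int, rng: random.Random) -> Sequence[float]:
    """Take a uniformly placed crop of `length` samples, as a training loader might."""
    start = rng.randint(0, len(x) - length)
    return x[start:start + length]


def silent_crop_rate(track: Sequence[float], crop_len: int, min_rms: float,
                     n_draws: int = 1000, seed: int = 0) -> float:
    """Fraction of random crops whose RMS falls below min_rms."""
    rng = random.Random(seed)
    silent = sum(rms(random_crop(track, crop_len, rng)) < min_rms for _ in range(n_draws))
    return silent / n_draws


if __name__ == "__main__":
    sr = 8000
    # Hypothetical file: 1 s of a 440 Hz tone followed by 1 s of digital silence.
    track: List[float] = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    track += [0.0] * sr
    min_rms = 0.01  # hypothetical linear threshold
    print(f"whole-file RMS: {rms(track):.3f}")  # 0.250, well above the threshold
    print(f"silent 0.25 s crops: {silent_crop_rate(track, sr // 4, min_rms):.0%}")
```

A file-level filter would keep this file, which is the gap the per-crop check in the PR is meant to close.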

felixkreuk commented 7 months ago

hey @piwell, thanks again for the contribution and the discussion. I agree that silences need to be filtered out, but I see it more as an offline preprocessing step. The reason is that a few silent segments in a row will still crash the code, so this change to get_item does not solve the problem of silences. In that case the user still has to filter out silent files manually. So I propose offloading it to an external script that filters samples out of the data JSON file.
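An offline filter along the lines proposed here might look like the sketch below. It assumes a JSON Lines manifest where each line is an object with a `path` field pointing to a 16-bit PCM WAV file; the field name, the script name `filter_silence.py`, and all thresholds are assumptions. To partly address the mixed music/silence concern, it drops a file when more than a given fraction of its 1 s windows is quiet, rather than looking only at whole-file energy.

```python
import array
import json
import math
import sys
import wave
from typing import List, Tuple


def window_rms(path: str, window_s: float = 1.0) -> List[float]:
    """RMS (full scale = 1.0) of consecutive, non-overlapping windows of a
    16-bit PCM WAV file. Multichannel samples are pooled, not downmixed."""
    values: List[float] = []
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: this sketch only handles 16-bit PCM")
        frames_per_window = max(1, int(wf.getframerate() * window_s))
        while True:
            raw = wf.readframes(frames_per_window)
            if not raw:
                break
            samples = array.array("h", raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            mean_sq = sum(s * s for s in samples) / len(samples)
            values.append(math.sqrt(mean_sq) / 32768.0)
    return values


def silent_fraction(path: str, min_rms: float, window_s: float = 1.0) -> float:
    """Share of windows quieter than min_rms; an empty file counts as fully silent."""
    values = window_rms(path, window_s)
    if not values:
        return 1.0
    return sum(v < min_rms for v in values) / len(values)


def filter_manifest(src: str, dst: str, min_rms: float = 0.01,
                    max_silent_fraction: float = 0.5,
                    window_s: float = 1.0) -> Tuple[int, int]:
    """Copy JSON Lines entries from src to dst, dropping mostly silent files.
    Returns (kept, dropped)."""
    kept = dropped = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue
            entry = json.loads(line)
            if silent_fraction(entry["path"], min_rms, window_s) <= max_silent_fraction:
                fout.write(json.dumps(entry) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python filter_silence.py data.jsonl data.filtered.jsonl
    k, d = filter_manifest(sys.argv[1], sys.argv[2])
    print(f"kept {k} entries, dropped {d}")
```

Running it once before training produces a cleaned manifest, so the training loop never pays for repeated silent reads. A per-crop check like the one in the PR could still be layered on top if some files keep occasional silent stretches.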

piwell commented 7 months ago

Makes sense. Thank you for taking a look at it.