How to create a text and audio dataset

archinetai / audio-data-pytorch

A collection of useful audio datasets and transforms for PyTorch.

MIT License

How to create a text and audio dataset #7

Status: Open. AI-Guru opened 1 year ago

AI-Guru commented 1 year ago

Hi!

First and foremost: congratulations on this fine collection of repositories! I am slowly working my way through them and I am amazed by how easy and effective your work is.

I will soon start working on conditional audio generation. What would be a good starting point for creating something like a WAVDataset that yields both audio and text? Would the best approach be to simply extend WAVDataset?

Best, Tristan

flavioschneider commented 1 year ago

Hi @AI-Guru, thanks a lot!

A subclass of WAVDataset with extra text metadata would be a good starting point. I personally used a WebDataset (via the custom AudioWebDataset), which loads a set of tar files containing numbered wav/json pairs. WebDatasets work well with large amounts of data, but they are a bit more involved to set up.
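
To make the two options concrete, here is a rough sketch rather than the library's actual code. It does not touch WAVDataset's internals (they are not shown in this thread), so `TextAudioDataset`, `load_wav`, and `write_shard` are hypothetical names, and the layout is an assumption: every `clip.wav` has its caption in a sibling `clip.txt`. The JSON key `"text"` is also an assumption, not necessarily what AudioWebDataset expects. Only the standard library is used so the pairing logic is easy to run on its own.

```python
import io
import json
import tarfile
import wave
from pathlib import Path


def load_wav(path):
    """Read a PCM WAV file with the stdlib; returns (raw_frames, sample_rate)."""
    with wave.open(str(path), "rb") as f:
        return f.readframes(f.getnframes()), f.getframerate()


class TextAudioDataset:
    """Map-style dataset yielding (audio, text) pairs.

    Assumed (hypothetical) layout: clip.wav sits next to clip.txt.
    """

    def __init__(self, root, loader=load_wav):
        self.loader = loader
        # Keep only WAV files that have a matching caption file.
        self.wavs = sorted(
            p for p in Path(root).rglob("*.wav") if p.with_suffix(".txt").exists()
        )

    def __len__(self):
        return len(self.wavs)

    def caption(self, idx):
        path = self.wavs[idx].with_suffix(".txt")
        return path.read_text(encoding="utf-8").strip()

    def __getitem__(self, idx):
        return self.loader(self.wavs[idx]), self.caption(idx)


def write_shard(dataset, tar_path):
    """Pack the pairs into one tar of numbered wav/json files."""
    with tarfile.open(str(tar_path), "w") as tar:
        for i, wav_path in enumerate(dataset.wavs):
            key = f"{i:06d}"
            tar.add(str(wav_path), arcname=f"{key}.wav")
            # The "text" key is an assumption made for this sketch.
            meta = json.dumps({"text": dataset.caption(i)}).encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.json")
            info.size = len(meta)
            tar.addfile(info, io.BytesIO(meta))
```

For training you would likely swap `load_wav` for a loader that returns tensors (for example one built on `torchaudio.load`), or move the caption lookup into a real WAVDataset subclass. The shard writer relies on the WebDataset convention that files sharing a basename inside a tar belong to the same sample.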