dawg / models

Machine learning models for use in Vusic

Data Augmentation #27

Closed aodonnell closed 5 years ago

aodonnell commented 5 years ago

Expected Behaviour

We need to make sure that we can load data quickly enough to avoid a bottleneck. This needs to be done ASAP since I would like us to start training next week.

Current Behaviour

With preservation of both phase and magnitude information, our samples are now HUGE. This means that loading and unloading a single sample is an expensive task.

Suggested Fix

Someone could research whether it's common to split an audio file into multiple smaller samples, or whether there are other standard techniques for reducing the amount of frequency information we keep. Another thing we can consider is discarding some of the upper frequency bins. Looking at the logarithmic spectral density below, the majority of the signal is contained within roughly half of the spectrum we compute the STFT for. Dropping the rest will sacrifice some timbre information contained in the upper harmonics, but I don't really think it will affect the overall sound enough to be noticeable.

[Screenshot: logarithmic spectral density plot, 2019-01-31]
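
For the first suggestion (splitting a track into smaller samples), here is a minimal sketch of one way it could be done. It assumes the STFT is stored as a complex NumPy array shaped (freq_bins, time_frames). The function name, the chunk length, and the array sizes are hypothetical and are not taken from the repo.

```python
import numpy as np


def split_into_chunks(spec, chunk_frames, hop_frames=None):
    """Split a (freq_bins, time_frames) array into fixed-length chunks along time.

    Trailing frames that do not fill a whole chunk are dropped.
    """
    if hop_frames is None:
        hop_frames = chunk_frames  # non-overlapping chunks by default
    n_frames = spec.shape[1]
    starts = range(0, n_frames - chunk_frames + 1, hop_frames)
    # Basic slicing returns views, so no spectrogram data is copied here.
    return [spec[:, s:s + chunk_frames] for s in starts]


# Hypothetical complex STFT: 1025 frequency bins, 1000 time frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((1025, 1000)) + 1j * rng.standard_normal((1025, 1000))

chunks = split_into_chunks(spec, chunk_frames=128)
print(len(chunks), chunks[0].shape)  # 7 (1025, 128)
```

Each chunk keeps both phase and magnitude (the array stays complex), so a loader would only need to read one small chunk per training sample instead of a whole track.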

aodonnell commented 5 years ago

Looking into it a bit, it seems we can eliminate more than 3/4 of our frequency bins and still preserve enough meaningful information about the track vocals. This is HUGE since it significantly reduces the input dimensionality of our BiRNN Encoder 😄 -> 🕶 -> 😎 . I've already implemented this in https://github.com/dawg/models/tree/feature/separation-optimizers so we can close this issue once it is merged.
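
For anyone reading later, a rough sketch of what dropping the upper bins can look like. This is not the code from the branch above. It assumes a complex STFT array shaped (freq_bins, time_frames) with the lowest frequencies first, and the helper names and the 0.25 keep fraction are hypothetical.

```python
import numpy as np


def crop_frequency_bins(spec, keep_fraction=0.25):
    """Keep only the lowest `keep_fraction` of frequency bins (axis 0)."""
    n_bins = spec.shape[0]
    n_keep = max(1, int(n_bins * keep_fraction))
    return spec[:n_keep, :]


def pad_frequency_bins(cropped, n_bins):
    """Zero-fill the discarded upper bins so an inverse STFT can be applied."""
    padded = np.zeros((n_bins, cropped.shape[1]), dtype=cropped.dtype)
    padded[:cropped.shape[0], :] = cropped
    return padded


# Hypothetical STFT with 1024 frequency bins and 200 time frames.
spec = np.ones((1024, 200), dtype=np.complex64)

cropped = crop_frequency_bins(spec)                     # model input
restored = pad_frequency_bins(cropped, spec.shape[0])   # before inverse STFT
print(spec.shape, "->", cropped.shape)  # (1024, 200) -> (256, 200)
```

With these assumed sizes the per-frame input to the encoder shrinks from 1024 to 256 complex values. The zero-padding step is only needed when converting the model output back to audio.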