edsp2016 / ITIMA_Zafrin_Harrison

Identifying Trends in Mixed Audio - EDSP16 Project

loudness in lead vocal track #8

Open heatherkoff opened 8 years ago

heatherkoff commented 8 years ago

Why is it too complicated to manually split the two songs at time points that divide verses from bridges and choruses? For speech files we would do this with a simple script in Praat... Perhaps there is a complication that I do not know about.
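To make the idea concrete, here is the sort of thing I have in mind, sketched in Python rather than Praat (the file name and boundary times are made up):

```python
import soundfile as sf

# Hypothetical mix and hand-labelled section boundaries, in seconds
y, sr = sf.read("song.wav")
sections = [("verse1", 0.0, 45.0),
            ("chorus1", 45.0, 75.0),
            ("bridge", 75.0, 100.0)]

# Slice the samples at each boundary and write one file per section
for name, start, end in sections:
    sf.write("song_%s.wav" % name, y[int(start * sr):int(end * sr)], sr)
```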

heatherkoff commented 8 years ago

On another note (and as my comment assignment for this week), I'm struggling to understand the spectral graphics you have displayed in your readme file. Is the 3-second slice you are showing of Ariana Grande singing "one last time," or of some other connected sung speech? If so, why can't I see resonant frequencies varying-- there seem to be light bands flat across the spectrogram, suggesting that this is a sustained sound. I am used to hearing of "spectral centroids" in reference to speech sounds that involve frication (s/z, f/v, sh/zh), as there would be a band of high-frequency noise in such sounds-- but what is the importance of a centroid in connected (sung) speech?

This question may not be important to the analyses that you will do with your data, i.e., I'm not sure whether you have a hypothesis that spectral mean will have an impact on mix quality. I'm just curious about the possible importance of this metric to your future analyses.

~Heather

bombsandbottles commented 8 years ago

Hi Heather,

The graphic in the ReadMe is an STFT (short-time Fourier transform), so the y-axis is frequency and the x-axis is time. A Fourier transform is taken every 2048 samples, so you get a moving picture of frequency across time. The big white vertical spaces are silence, and the darker the shade of pink, the larger the magnitude of that frequency value. So you'll see that there's a darker shade of pink that hovers around 200 Hz across time. This is the fundamental frequency of her voice. So the graph is actually a 3-minute-long picture, not 3 seconds.
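If it helps, here is roughly how that kind of picture gets generated. This is just a minimal sketch assuming Python with librosa and matplotlib; the file name is a placeholder, not one of our actual stems:

```python
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

# Hypothetical file name, not one of the project's stems
y, sr = librosa.load("lead_vocal.wav", sr=None)

# Take an FFT over a 2048-sample window, sliding across the track
D = librosa.stft(y, n_fft=2048)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

# y-axis: frequency, x-axis: time; darker cells mean larger magnitude
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", cmap="pink_r")
plt.colorbar(format="%+2.0f dB")
plt.title("STFT magnitude (dB)")
plt.show()
```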

In my future analyses (probably not within the timeframe of this class), the spectral centroid could be useful for instrument recognition. For example, it's possible that lead instruments all share similar loudness values and spectral centroids. If you could predict that an instrument was a lead given its spectral centroid values, you would know where to put it in the mix and how to adjust that instrument's loudness (just one example off the top of my head). It could also be useful when thinking about instruments in relation to each other. It's possible that the lead instrument's spectral centroid is never "masked" by another instrument's frequency content in a mixdown, giving engineers insight for automatic equalization tools. That way a system would know to always make sure the lead instrument's spectral centroid values were free of any adverse masking effects.
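Since you asked what a centroid means for sung speech: for each STFT frame it is just the magnitude-weighted mean frequency, i.e. where the spectral "mass" of that frame sits. A hedged sketch of that computation in Python with librosa (again, the file name is hypothetical):

```python
import numpy as np
import librosa

# Hypothetical stem, not from the project data
y, sr = librosa.load("lead_vocal.wav", sr=None)

# Magnitude spectrogram and the frequency of each FFT bin
S = np.abs(librosa.stft(y, n_fft=2048))
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)

# Per-frame centroid: magnitude-weighted mean frequency
centroid = (freqs[:, None] * S).sum(axis=0) / (S.sum(axis=0) + 1e-10)

# librosa's built-in feature computes essentially the same thing
builtin = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=2048)[0]

print("mean centroid: %.1f Hz" % centroid.mean())
```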