data-mining-group-project / mood-classifier

Spotify Mood Classifier based on playlist names (Sad or Happy)

Decide which variables (features) to use (correlation analysis) #4

Open Guiwald opened 5 years ago

Guiwald commented 5 years ago

We need to decide which song features to keep and work with (for example energy, liveness, etc.).

AnneliseCanesso commented 5 years ago

This is the list of audio features available from Spotify:

danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo.

Please search for information about each one online or in Spotify.

Lizard0011 commented 5 years ago

A description of each feature is at the link below (scroll down to Audio Features Object): https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/

Guiwald commented 5 years ago

What about also using the duration as a feature?

Guiwald commented 5 years ago

Here are the features I'm getting from #10:

danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms

AnneliseCanesso commented 5 years ago

Personally, I would not add duration. Does it really affect the mood? But this guy is using it: https://www.kaggle.com/abecc1995/eda-dec-tree-rand-forest-classifiers

Guiwald commented 5 years ago

> Personally, I would not add duration. Does it really affect the mood? But this guy is using it: https://www.kaggle.com/abecc1995/eda-dec-tree-rand-forest-classifiers

I would say that is subjective, like the other variables. We need to keep in mind that it is not each individual variable alone that classifies a song, but the mix of them. So we should not ask "Does this one feature alone make the song happy or sad?", but "Does a fast tempo, with low speechiness, a long duration, etc., make the song sad or happy?".

This is the reason why the variables need to be filtered based on their correlation. If two variables are completely correlated, keeping both adds no information to help the classification compared with keeping just one: they "evolve" in exactly the same way depending on the mood. What is interesting is when the variables are independent, because that means they move independently of each other and give new "insight" when taken together, when we add their dimensions.
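To make the redundancy argument concrete, here is a minimal Python sketch (standard library only). The numbers are made up for illustration and are not real Spotify data, and `loudness_like` is a hypothetical column built as an exact linear function of `energy`. It only shows that a perfectly correlated pair has a Pearson coefficient of 1, while an unrelated column stays close to 0.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up values for illustration only (assumption, not project data).
energy = [0.2, 0.4, 0.6, 0.8, 1.0]
loudness_like = [2 * e + 1 for e in energy]  # exact linear function of energy
tempo = [120, 90, 150, 100, 110]             # unrelated to energy in this toy set

r_redundant = pearson(energy, loudness_like)  # essentially 1: the second column adds nothing
r_other = pearson(energy, tempo)              # close to 0: tempo carries separate information

print(round(r_redundant, 3), round(r_other, 3))
```

In this toy set, dropping `loudness_like` loses nothing, whereas `tempo` is worth keeping because it varies independently of `energy`.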

Guiwald commented 5 years ago

The findCorrelation() function from the "caret" package can be used for this. Example of application: https://stackoverflow.com/a/30911235/9808742

The caret manual, specifically the section about the findCorrelation function: http://topepo.github.io/caret/pre-processing.html#identifying-correlated-predictors
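For anyone not working in R, the idea behind this kind of filter can be sketched in plain Python. This is a simplified analogue and not caret's actual code. It assumes the rule described in the caret manual: when a pair exceeds the cutoff, drop the variable with the larger mean absolute correlation. The helper name `find_correlated`, the 0.9 cutoff, and the toy values are all assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def find_correlated(columns, cutoff=0.9):
    """Return the sorted names of columns to drop (hypothetical helper).

    columns: dict mapping feature name -> list of values (at least 2 features).
    """
    names = list(columns)
    # Absolute pairwise correlations between every two distinct features.
    corr = {a: {b: abs(pearson(columns[a], columns[b]))
                for b in names if b != a} for a in names}
    # Mean absolute correlation of each feature with all the others.
    mean_abs = {a: sum(corr[a].values()) / len(corr[a]) for a in names}
    # Visit pairs from the most to the least correlated.
    pairs = sorted(((corr[a][b], a, b)
                    for i, a in enumerate(names) for b in names[i + 1:]),
                   reverse=True)
    dropped = set()
    for r, a, b in pairs:
        if r <= cutoff:
            break  # remaining pairs are below the cutoff
        if a in dropped or b in dropped:
            continue  # this pair is already resolved
        dropped.add(a if mean_abs[a] >= mean_abs[b] else b)
    return sorted(dropped)

# Made-up values for illustration only (assumption, not project data).
features = {
    "energy":   [0.2, 0.4, 0.6, 0.8, 1.0],
    "loudness": [-12.0, -9.5, -8.0, -5.5, -4.0],  # rises almost linearly with energy
    "valence":  [0.9, 0.1, 0.5, 0.3, 0.7],
}
to_drop = find_correlated(features, cutoff=0.9)
print(to_drop)
```

With these toy values, energy and loudness form the only pair above the cutoff, and loudness is dropped because its mean absolute correlation with the other features is slightly higher.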