Following the peer feedback and suggestions, we made the following improvements to the original report:
Reproducibility
The code needed to install the spotifyr package is now provided so that readers can reproduce the report without issues.
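As a minimal sketch of such an installation step, the snippet below checks which required packages are missing before installing anything. The helper name `missing_packages` is hypothetical and not taken from the report, and the install line assumes spotifyr is available from the reader's configured package repository.

```r
# Return the subset of `pkgs` that is not yet installed.
# `missing_packages` is a hypothetical helper, not part of the original report.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("spotifyr")
to_install <- missing_packages(required)

# Uncomment to install anything missing (needs internet access, and assumes
# the package is available from the configured repository):
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first avoids reinstalling the package every time the report is knitted.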
Data description
The previous report did not mention any data limitations, so a discussion of them was added.
The earlier data summary was quite confusing. Interactive tables for the data description and data summary were added to make them easier for readers to understand.
An overview of the data is produced using the visdat package.
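A data overview with visdat might look roughly like the sketch below. The `songs` data frame is a made-up stand-in for the Spotify data, and the `danceability` column is an assumed example of an audio feature, so neither should be read as the report's actual data.

```r
library(visdat)

# Toy stand-in for the Spotify data; all values are made up for illustration.
songs <- data.frame(
  track_name = c("Song A", "Song B", NA, "Song D"),
  track_popularity = c(66L, 72L, 58L, NA),
  danceability = c(0.75, 0.61, 0.69, 0.80)  # assumed audio feature column
)

# One tile per cell, coloured by column type, with missing values highlighted.
overview_plot <- vis_dat(songs)

# Proportion of missing values per column.
missing_plot <- vis_miss(songs)

# Use print(overview_plot) or print(missing_plot) to display the plots.
```

Both functions return ggplot objects, so they can be styled further like any other plot in the report.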
Analysis
To broaden the scope of the analysis, we added two analysis questions that build on the previous analysis:
Explore the music characteristics over time: are the music characteristics changing?
Exploring how the music characteristics change over time extends the primary analysis of audio features. Rather than only comparing the different audio features, we also look at how those features change over time. Music trends evolve over the years, so this question asks whether the music characteristics are changing too.
What exactly makes artists stand out even when other artists make the same kind of music? What is the Unique Selling Point (USP) of a few selected artists?
This question extends the scope of the primary analysis and broadens our understanding of the relationship between popularity and audio features.
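For the first of the two questions above, one way to look at change over time is to average each audio feature per release year. The sketch below assumes hypothetical columns `release_year`, `danceability` and `energy` and uses made-up values, so it illustrates the idea rather than the report's exact code.

```r
library(dplyr)

# Toy data; column names and values are assumptions for illustration only.
songs <- data.frame(
  release_year = c(2000, 2000, 2010, 2010, 2020),
  danceability = c(0.50, 0.60, 0.65, 0.75, 0.80),
  energy       = c(0.80, 0.70, 0.70, 0.60, 0.50)
)

# Average each audio feature within every release year.
yearly_features <- songs %>%
  group_by(release_year) %>%
  summarise(
    mean_danceability = mean(danceability),
    mean_energy = mean(energy),
    .groups = "drop"
  )
```

The resulting table has one row per year, which makes it straightforward to plot each feature against time.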
As mentioned in the previous issues I posted, the method used to find the top artists based on track_popularity did not really make sense, because it counted the number of observations instead of looking at the track_popularity scores. Therefore, the average track_popularity of each artist is now used, which is more reasonable.
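The difference between the two rankings can be seen in a small sketch. The artist column name `track_artist` is an assumption, and the data are made up; only `track_popularity` comes from the report.

```r
library(dplyr)

# Toy data; `track_artist` is an assumed column name and values are made up.
songs <- data.frame(
  track_artist = c("Artist A", "Artist A", "Artist A", "Artist B", "Artist B"),
  track_popularity = c(40, 50, 60, 90, 80)
)

# Previous approach: rank artists by how many rows they have.
by_count <- songs %>%
  count(track_artist, sort = TRUE)

# Revised approach: rank artists by their average popularity score.
by_mean <- songs %>%
  group_by(track_artist) %>%
  summarise(
    mean_popularity = mean(track_popularity),
    n_tracks = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_popularity))
```

In this toy example the count-based ranking puts Artist A first simply because it has more rows, while the mean-based ranking puts Artist B first.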
For the top artists who created the most songs, duplicated track_name values were not accounted for, which inflated the number of songs per artist and gave an inaccurate result. The code was therefore modified to use n_distinct so that only unique values are counted.
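A minimal sketch of the n_distinct change, again with made-up data and an assumed `track_artist` column, shows how duplicated track names inflate a plain row count:

```r
library(dplyr)

# Toy data; "Song 1" appears twice for Artist A (for example, on two playlists).
songs <- data.frame(
  track_artist = c("Artist A", "Artist A", "Artist A", "Artist B", "Artist B"),
  track_name   = c("Song 1", "Song 1", "Song 2", "Song 3", "Song 4")
)

songs_per_artist <- songs %>%
  group_by(track_artist) %>%
  summarise(
    n_rows  = n(),                     # counts duplicates as separate songs
    n_songs = n_distinct(track_name),  # counts each track name once
    .groups = "drop"
  ) %>%
  arrange(desc(n_songs))
```

Here Artist A has three rows but only two distinct songs, so the row count overstates the artist's output.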
Some changes were made to the visualisations.
Following the addition of the smooth curve on the correlation plot, a smooth curve was also added to the analysis of “Music Characteristics Overtime” to provide a clearer overview.
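A smooth curve of this kind can be added with ggplot2's geom_smooth. The sketch below uses made-up yearly averages and assumes a loess smoother; the smoothing method and column names used in the report may differ.

```r
library(ggplot2)

# Toy yearly averages; column names and values are assumptions.
yearly <- data.frame(
  release_year = 2000:2019,
  mean_danceability = 0.55 + 0.01 * (0:19)
)

# Scatter plot of the yearly averages with a smooth trend line on top.
trend_plot <- ggplot(yearly, aes(x = release_year, y = mean_danceability)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)

# Use print(trend_plot) to display the plot.
```

The smooth line makes the overall direction easier to read than the individual points alone.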
Conclusion
Additional findings from the new questions were added, along with the limitations.
We wanted to add interactive plots to the report using the plotly package. Unfortunately, it kept crashing, possibly because of the size of the data. We therefore decided not to include interactive plots in our report.