Open EmilySheehan0012 opened 4 years ago
As Emily said, one of the changes we made was revising the previous group's interpretation of correlation, since we learned in lecture 5 that correlation does not necessarily imply causation.
In addition to several revisions and corrections, we also tried to make a side-by-side plot comparing the bushfire trend with temperature and rainfall, using the scaling tools (i.e. log and square root) that were introduced in the last tutorial.
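As a rough illustration of the log and square-root scaling mentioned above, here is a minimal sketch in base R. The data frame `trend` and its columns are made up for the example and are not the data used in our analysis.

```r
# Hypothetical yearly data: right-skewed bushfire counts and a temperature series.
set.seed(3)
trend <- data.frame(year = 2000:2019)
trend$fires <- round(exp(rnorm(20, mean = 6, sd = 1)))
trend$max_temp <- 25 + 0.05 * (trend$year - 2000) + rnorm(20, sd = 0.4)

# Two common rescalings for skewed counts.
trend$fires_log  <- log10(trend$fires + 1)  # +1 guards against log10(0)
trend$fires_sqrt <- sqrt(trend$fires)

# Side-by-side panels: rescaled bushfire trend next to temperature.
op <- par(mfrow = c(1, 2))
plot(trend$year, trend$fires_log, type = "l",
     xlab = "Year", ylab = "log10(fires + 1)", main = "Bushfires (log scale)")
plot(trend$year, trend$max_temp, type = "l",
     xlab = "Year", ylab = "Max temp (deg C)", main = "Temperature")
par(op)  # restore the previous plotting layout
```

The log scale compresses the few very large fire counts so that the overall trend is easier to compare with the temperature panel; swapping in `fires_sqrt` gives a milder compression.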
First, the original group's analysis is very thorough. It used many excellent visualization tools, which made the report impressive. Our group also spent a long time working out what the original group had not covered, or what was not helpful to the topic, and then we reconstructed and reproduced the exploratory data analysis.
I will focus on two new plots that I created: a heat map displaying the daily average temperature for each city, and a correlation plot that outlines the correlation between climatic conditions and bushfires.
As Emily mentioned, after reconstructing the data analysis, our group needed visualization tools to show the impact of global warming on temperature and radiation. The original group used the dygraph function to display the temperature trend interactively, but the temperature data only covers seven cities, so we think the plot could mislead readers into taking it as the temperature trend for the whole of Australia. Therefore, I used a heat map instead. First, geom_tile() draws one tile centred on each pair of x and y values. Then scale_fill_gradientn() fills each tile with a colour, with colour brightness expressing the temperature. In this way, readers get a simple picture of the overall temperature trend in the seven Australian cities from 1920 to 2020, the average temperature of each city, and how quickly temperature changes in each city compared with the others.
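The heat map approach described above can be sketched roughly as follows. This is not the code from our report: the data frame `temps`, the city labels and the colour choices are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical data: one mean temperature per city and year (three made-up cities).
set.seed(1)
temps <- expand.grid(year = 1920:2020,
                     city = c("City A", "City B", "City C"),
                     stringsAsFactors = TRUE)
temps$avg_temp <- 15 + 0.01 * (temps$year - 1920) +
  as.integer(temps$city) + rnorm(nrow(temps), sd = 0.5)

# geom_tile() draws one tile per (year, city) pair;
# scale_fill_gradientn() maps the temperature onto a colour gradient.
heat <- ggplot(temps, aes(x = year, y = city, fill = avg_temp)) +
  geom_tile() +
  scale_fill_gradientn(colours = c("navy", "white", "firebrick"),
                       name = "Mean temp (deg C)") +
  labs(x = "Year", y = NULL)
print(heat)
```

Each row of tiles is one city, so a shift in colour from left to right shows how that city's temperature changes over time, and comparing rows shows the differences between cities.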
Next, our group changed the correlation analysis in the last part of the original group's report. The use of the GGally package is good, but I think there is no obvious trend in the scatter plots and line plots generated by ggpairs. In other words, they are not informative; only the correlation coefficients shown in the figure can be used for analysis. Therefore, our group decided to replace ggpairs with the corrplot function.
First, corrplot also shows the correlation coefficients, so by removing the uninformative scatter plots and line plots we made the figure cleaner. Secondly, I customised many elements of the corrplot, including the font format, font size, font colour and figure margins. The most prominent change is that I filled the background of each square with one of two contrasting colours: red means positive correlation and blue means negative correlation. When people look at a new figure, the first impression matters most, and colour catches the eye and draws attention more easily than numbers do.
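A minimal sketch of this kind of corrplot call is shown below. The data frame `climate` and its variables are invented for the example, and the specific sizes and margins are assumptions rather than the settings used in our report.

```r
library(corrplot)

# Hypothetical climate and bushfire data (not the data from the report).
set.seed(2)
n <- 240
climate <- data.frame(temperature = rnorm(n, mean = 20, sd = 5),
                      rainfall    = rnorm(n, mean = 60, sd = 20))
climate$radiation <- 0.6 * climate$temperature + rnorm(n, sd = 3)
climate$bushfires <- rpois(n, lambda = 5)

# Pairwise correlation matrix of all variables.
corr_mat <- cor(climate, use = "complete.obs")

# Coloured squares with the coefficient printed in each cell:
# the palette runs from blue (-1) through white (0) to red (+1).
corrplot(corr_mat,
         method      = "color",
         col         = colorRampPalette(c("blue", "white", "red"))(200),
         addCoef.col = "black",   # colour of the printed coefficients
         tl.col      = "black",   # colour of the variable labels
         tl.cex      = 0.9,       # size of the variable labels
         number.cex  = 0.8,       # size of the printed coefficients
         mar         = c(0, 0, 1, 0))
```

Keeping the coefficients inside the coloured squares means the figure still carries the numbers that ggpairs provided, while the colour gives the sign and strength at a glance.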
In the last part of the correlation analysis, we also changed the original group's conclusion. The original group believed that there is a small correlation between temperature and bushfires, which needs more research. However, based on the existing exploratory data analysis, the correlation coefficient between temperature and bushfires is lower than 0.3, which our group takes to mean that there has been almost no correlation between temperature and bushfires over the past 20 years. Therefore, we modified the original group's conclusion.
The two plots above are only part of the modifications and reproductions I made to the original goanna group's exploratory data analysis. Other modifications (such as making the figures more interactive or deleting some code) are not described in detail in this issue for reasons of space.
The new goanna exploratory data analysis, produced through the efforts of our group, is more tightly integrated and has a hierarchical logical structure. Although some of the original fancy visualizations were removed, we have added plenty of figures that make the theme and conclusion clearer. Our group's firm view is that displaying the data accurately is more important than presenting it beautifully.
We have changed the structure of the analysis. The analysis is now structured as follows:
Additionally, we have re-worded the introduction, data analysis, limitations, analysis and conclusion. We have also added some plots and removed the dygraphs to make the visualisations easier to understand.
My teammates will comment on the plots we have added and changed below.