IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Extend and test heatmaps for the type of example shown #8579

Open rdstern opened 1 year ago

rdstern commented 1 year ago

This started with a query about the scatterplot dialog. We have geom_tile in the heatmap dialog. It is working well for simple examples. I wonder if it can cope with this one. Alternatively (or as well) can we use the general graph dialog?

@lilyclements and @MeSophie the scatterplot code now seems to add an odd geom_count to the code.
a) I think it should be deleted. Can you confirm?
b) In the dialog it is called point plot as the title, in the top left. I suggest we change it to Scatter Plot to be consistent with the menu item, and because we are now offering more than a point plot - we also have jitter for example, and see below.
c) I would like to consider adding Count as an additional option. So there are now 3 options, namely Point, Count and Jitter. They could be radio buttons on the left, because they are alternatives. Maybe there could be a group box round them with label Geom:?
d) Are there any others we could add? I am particularly interested in tile for example?
e) I found this graph in the Intro to Data Science book. @lilyclements how could we do it? Could this be an example of a sort of scatterplot with a tile as the geom?

image

I have found the data. It has a numeric variable as the x, a factor as the y, and then a continuous variable, which I think is the fill. If not, then what is it and how do we do it in ggplot2?
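To make the three alternatives proposed in (c) concrete, here is a minimal sketch of how Point, Count and Jitter differ in ggplot2. The data frame `toy` and the object names are hypothetical, made up purely for illustration; this is not the code the R-Instat dialog generates.

```r
library(ggplot2)

# Hypothetical toy data with repeated (x, y) pairs, so overplotting matters
toy <- data.frame(x = rep(1:3, each = 4), y = rep(c(1, 1, 2, 2), times = 3))

base <- ggplot(toy, aes(x = x, y = y))

# Point: coincident observations are drawn on top of each other
p_point <- base + geom_point()

# Count: point size reflects how many observations share each (x, y) position
p_count <- base + geom_count()

# Jitter: each point gets a small random offset so overlaps become visible
p_jitter <- base + geom_jitter(width = 0.1, height = 0.1)
```

Printing any of the three objects draws the plot. Because the three geoms are alternative layers on the same mapping, they fit naturally behind a set of radio buttons.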

MeSophie commented 1 year ago

image @rdstern On my side I can't reproduce the geom_count.

image Please can you explain this further? Should the radio buttons (rdo) you are talking about be laid out like the ones in the Histogram dialog?

lilyclements commented 1 year ago

A quick Google search found the plot, with some code to produce it, here. With that guide, the comments giving a more up-to-date version of the code, and a few tweaks, we can get the data here:

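library(dplyr)
library(ggplot2)

# The file is a CSV, so read it with read.csv() rather than readxl::read_excel()
measles <- read.csv("G:/My Drive/IDEMS/Data/Tycho_Level1.csv")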

unique(measles$loc_type)

measles <- measles %>%
  filter(loc_type == "STATE") %>%
  filter(disease == "MEASLES")

measles <- measles %>%
  mutate(year = substr(epi_week, start = 1, stop = 4)) %>%
  mutate(year = as.numeric(year))

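mdf <- measles %>%
  group_by(state, year) %>%
  # Total cases per state-year; NA when every weekly count is missing
  summarise(cases = if (all(is.na(cases))) NA else sum(cases, na.rm = TRUE),
            .groups = "drop") %>%  # ungroup so that state can be modified below
  mutate(state = as.factor(state))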

cols <- c(colorRampPalette(c("#e7f0fa", "#c9e2f6", "#95cbee", "#0099dc", "#4ab04a", "#ffd73e"))(10),
                              colorRampPalette(c("#eec73a", "#e29421", "#e29421", "#f05336", "#ce472e"), bias=2)(90))

ggplot(mdf, aes(y=reorder(state, cases), x=year, fill=cases)) +
  geom_tile(colour="white", #linewidth=2,
            width=.9, height=.9) + theme_minimal() +
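  # na.rm = TRUE so that state-years with no data do not turn the limits and breaks into NA
  scale_fill_gradientn(colours=cols, limits=c(0, max(mdf$cases, na.rm=TRUE)),
                       breaks=seq(0, max(mdf$cases, na.rm=TRUE), by=floor(max(mdf$cases, na.rm=TRUE) / 4)),
                       na.value=rgb(246, 246, 246, max=255),
                       guide=guide_colourbar(ticks=TRUE, nbin=50, barheight=.5, label=TRUE, barwidth=10)) +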
  scale_x_continuous(expand=c(0,0), breaks=seq(min(mdf$year), max(mdf$year), by=10)) +
  # linewidth replaces size for line geoms from ggplot2 3.4.0 onwards
  geom_segment(x=1963, xend=1963, y=0, yend=length(unique(mdf$state)), linewidth=.9) +
  labs(x="", y="", fill="") +
  ggtitle("Measles") +
  theme(legend.position=c(.5, -.13),
        legend.direction="horizontal",
        legend.text=element_text(colour="grey20"),
        plot.margin=grid::unit(c(.5,.5,1.5,.5), "cm"),
        axis.text.y=element_text(size=6, family="Helvetica", hjust=1),
        axis.text.x=element_text(size=8),
        axis.ticks.y=element_blank(),
        panel.grid=element_blank(),
        title=element_text(hjust=-.07, face="bold", vjust=1, family="Helvetica"),
        text=element_text(family="URWHelvetica")) +
 annotate("text", label="Vaccine introduced", x=1963, y=length(unique(mdf$state)) + 1.5,
           vjust=1, hjust=0, size=I(3), family="Helvetica")

This also gives guidance on getting the code. Is it worth contacting Project Tycho about putting the data in R-Instat? They have quite a lot of interesting data sets here too. They seem keen on open data, but perhaps prefer their data to be downloaded from them directly so they can see their statistics on it.

I will email the data to @rdstern and @MeSophie to replicate this code, alongside the README with citations and LICENSE for the data.

My version - image

So, to get this we use geom_tile. The other code is just colours, labels, etc.
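Stripped of the styling, the core of the plot can be sketched in a few lines. This is only an illustration on a hypothetical toy data set (`toy`, with invented `year`, `region` and `cases` columns), not the measles data: a numeric x, a factor y, and a continuous variable mapped to fill.

```r
library(ggplot2)

# Hypothetical toy data: numeric x (year), factor y (region), continuous fill (cases)
toy <- expand.grid(year = 2000:2004, region = factor(c("A", "B", "C")))
toy$cases <- seq_len(nrow(toy)) * 10

# One tile per (year, region) cell, coloured by the number of cases
p_tile <- ggplot(toy, aes(x = year, y = region, fill = cases)) +
  geom_tile(colour = "white")
```

Printing `p_tile` draws the heatmap. The colour palette, the vaccine line and the theme settings in the full code above are layered on top of this basic structure.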

MeSophie commented 8 months ago

@rdstern for this issue, can I add a check box for geom_tile to the Heat Map dialog?