davidsjoberg / ggstream

A package to make streamplots

y values far too high? #19

Open schignel opened 3 years ago

schignel commented 3 years ago

Thanks for the great package. I am using it to make a stream graph as an alternative to a stacked area chart. Everything looks good, except that for some reason the stream graph y values are far higher than the area chart.

image

image

Notice that the stream graph's y values are about 1000 higher than those of the stacked area chart, even though both plots use the same dataset. Also, is there a way to keep the stream graph from continuing to rise after the dotted marker line?

Here is my code for the stream graph:

library(ggplot2)
library(ggstream)
library(RColorBrewer)  # provides brewer.pal()

# fp2 and colourCount are defined elsewhere in my script
stream <- ggplot(fp2, aes(x = Year, y = Area, fill = Station)) +
  geom_stream(
    type = "ridge",
    color = "white", lwd = 0.05,
    alpha = 1,
    # sorting = "onset",
    bw = 1
  ) +
  scale_fill_manual(values = colorRampPalette(brewer.pal(17, "Accent"))(colourCount)) +
  ylab(label = "Building Footprint (m2)") +
  scale_x_continuous(limits = c(1950, 2020)) +
  geom_vline(xintercept = 1998, linetype = "dotted") +
  theme_minimal()

stream

DagHjermann commented 2 years ago

Also, thanks for the package!

We have the same problem though. We have data for 5 discrete dates. Reproducible example:

library(ggplot2)
library(ggstream)
library(dplyr)

testdata <- structure(
  list(
    SITE_CODE = c("CS", "CS", "CS", "CS", "CS", "HT", 
                  "HT", "HT", "HT", "HT", "JSB", "JSB", "JSB", "JSB", "JSB", "JV1", 
                  "JV1", "JV1", "JV1", "JV1"), 
    SAMPLE_DATE = structure(
      c(18187, 
        18218, 18248, 18279, 18310, 18187, 18218, 18248, 18279, 18310, 
        18187, 18218, 18248, 18279, 18310, 18187, 18218, 18248, 18279, 
        18310), class = "Date"), 
    WA_Avg = c(271, 210.2, 100.9, 1.4, 0, 
               130.7, 112.7, 46.4, 86.8, 0, 97.7, 87.9, 18.8, 74.5, 0, 36.1, 
               16, 8.1, 34.9, 0)), 
  class = c("tbl_df", "tbl", "data.frame"), 
  row.names = c(NA, -20L)
)

gg <- ggplot(testdata, aes(x = SAMPLE_DATE, y = WA_Avg, fill = SITE_CODE))

gg + 
  geom_stream(type = 'ridge') +
  geom_col(color = "black", width = 2) 

We plot both the streamplot as well as bars for the actual data. In the resulting plot, it is clear that the streamplot is way higher than the data: image
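A quick way to put a number on the mismatch is to sum WA_Avg across sites for each date. Those totals are the heights the stacked bars reach, so a stacked smooth of the same data should peak somewhere near the largest one. A minimal base-R sketch, with the data from the reproducible example above rebuilt compactly (`totals` is just an illustrative name):

```r
# Same values as the reproducible example, rebuilt as a plain data frame.
dates <- as.Date(c(18187, 18218, 18248, 18279, 18310), origin = "1970-01-01")
testdata <- data.frame(
  SITE_CODE   = rep(c("CS", "HT", "JSB", "JV1"), each = 5),
  SAMPLE_DATE = rep(dates, times = 4),
  WA_Avg      = c(271, 210.2, 100.9, 1.4, 0,
                  130.7, 112.7, 46.4, 86.8, 0,
                  97.7, 87.9, 18.8, 74.5, 0,
                  36.1, 16, 8.1, 34.9, 0)
)

# Sum over sites per date: the top of the stacked bars at each date.
totals <- aggregate(WA_Avg ~ SAMPLE_DATE, data = testdata, FUN = sum)
print(totals)

# Largest stacked total in the raw data.
max(totals$WA_Avg)
```

For this data the largest total is 535.5 (the first date), which gives a reference height to compare against the y axis of the stream plot.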

Increasing the bw parameter helps a bit but far from enough:

gg + 
  geom_stream(type = 'ridge', bw = 2) +
  geom_col(color = "black", width = 2) 

image

What we had hoped for is something like the plot below (made using some custom code):

image

We used loess() for smoothing instead of smooth.spline, which ggstream appears to use, but smooth.spline is also able to make a smooth from 5 dates.
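The loess idea can be sketched roughly as follows. This is a minimal illustration, not the custom code referred to above: it assumes one loess fit per site on the numeric date, evaluated on a shared fine grid, with negative predictions clamped to zero before stacking. The helper name `smooth_site`, the settings `span = 1` and `degree = 1`, the direct (non-interpolated) surface, and the 100-point grid are all illustrative choices made so that the fit is stable with only 5 points per site.

```r
# Same values as the reproducible example, rebuilt as a plain data frame.
dates <- as.Date(c(18187, 18218, 18248, 18279, 18310), origin = "1970-01-01")
testdata <- data.frame(
  SITE_CODE   = rep(c("CS", "HT", "JSB", "JV1"), each = 5),
  SAMPLE_DATE = rep(dates, times = 4),
  WA_Avg      = c(271, 210.2, 100.9, 1.4, 0,
                  130.7, 112.7, 46.4, 86.8, 0,
                  97.7, 87.9, 18.8, 74.5, 0,
                  36.1, 16, 8.1, 34.9, 0)
)

# Hypothetical helper: smooth one site's series with loess on a fine date grid.
smooth_site <- function(d, n_grid = 100) {
  dd <- data.frame(x = as.numeric(d$SAMPLE_DATE), y = d$WA_Avg)
  # span = 1 uses all points in each local fit; degree = 1 is local linear.
  fit <- loess(y ~ x, data = dd, span = 1, degree = 1,
               control = loess.control(surface = "direct"))
  grid <- seq(min(dd$x), max(dd$x), length.out = n_grid)
  pred <- predict(fit, newdata = data.frame(x = grid))
  data.frame(
    SITE_CODE   = d$SITE_CODE[1],
    SAMPLE_DATE = as.Date(grid, origin = "1970-01-01"),
    WA_Avg      = pmax(pred, 0)  # clamp so stacked areas never go negative
  )
}

# Apply per site and bind the pieces back together.
smoothed <- do.call(rbind, lapply(split(testdata, testdata$SITE_CODE), smooth_site))

# Plot smoothed stacked areas with the raw bars on top (only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(smoothed, aes(x = SAMPLE_DATE, y = WA_Avg, fill = SITE_CODE)) +
    geom_area(color = "white") +
    geom_col(data = testdata, color = "black", width = 2, alpha = 0.4)
  print(p)
}
```

Because every site is evaluated on the same grid, geom_area() can stack the smoothed series directly, and the overlaid bars show how closely the smooth follows the actual values.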

ccshao commented 2 years ago

Same issue here. Any updates? Thanks!

DagHjermann commented 2 years ago

@ccshao If you want to try the approach I used above, check out this gist: https://gist.github.com/DagHjermann/e15423afc2204c8b217935134f237991

Please note that the column names (SITE_CODE, SAMPLE_DATE and WA_Avg) are hard-coded into the functions. You must replace them with your actual variable names, or generalize the functions so you can supply your own variable names.

ccshao commented 2 years ago

@DagHjermann Thanks for sharing the code. My scenario is a little different, as I am working with pseudotime, which is scaled from 0 to 1. I found the code from https://stackoverflow.com/questions/13084998/streamgraphs-in-r quite helpful. Note that there is randomness in the stream.