davidcarslaw / openair

Tools for air quality data analysis
https://davidcarslaw.github.io/openair/
GNU General Public License v2.0

Meaning of hexagon bin boundaries #260

Closed (Pweidemueller closed this issue 1 year ago)

Pweidemueller commented 3 years ago

Hi there,

I'm plotting my data using the scatterPlot function with the hexbin method. It works very well! However, I can't find an explanation of the meaning of the numbers next to the hexagon legend (see, e.g., Fig. 17.2 https://bookdown.org/david_carslaw/openair/sec-scatterPlot.html).

I guess they are the upper and lower bounds of the counts assigned to each colour? Is the lower number exclusive or inclusive? Looking at Fig. 17.2: do the 286 and 417 next to the reddest hexagon mean that this hexagon includes all counts LARGER than 286 up to and including 417?

On a different note: is it possible to set the number of bins? The default seems to produce 16 differently coloured hexagons. Can I set this number?

jack-davison commented 1 year ago

Hello,

scatterPlot() uses {hexbin} to bin data into hexes for plotting. In the legend, the lower number is inclusive and the upper number is exclusive. For example, if a hex colour has 1 below it and 2 above it, that colour is associated with a single count value (exactly 1). This can be demonstrated by examining the data: we'd expect 2 observations where nox > 1000 and no2 > 150, and that's what we see.

The uppermost limit (e.g., 3712) is inclusive, however: it represents the highest count in the data.

library(openair)

scatterPlot(mydata, method = "hexbin")


dplyr::filter(mydata, nox > 1000, no2 > 150)
#> # A tibble: 2 × 10
#>   date                   ws    wd   nox   no2    o3  pm10   so2    co  pm25
#>   <dttm>              <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
#> 1 1999-01-22 11:00:00 NA      220  1144   155     6   139  18.8  7      104
#> 2 1999-01-22 14:00:00  0.96   250  1075   169     5   116  20.8  7.15    87

Created on 2023-03-23 with reprex v2.0.2
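
The interval convention described above (lower bound inclusive, upper bound exclusive, topmost bound inclusive) can be illustrated with base R's cut(). This is only a sketch of the convention, not the code that scatterPlot() or {hexbin} uses to build the legend, and the break values and counts below are made up for illustration.

```r
# Hypothetical legend boundaries and hex counts, chosen only for illustration.
legend_breaks <- c(1, 2, 5, 10)
hex_counts <- c(1, 2, 4, 5, 10)

# right = FALSE makes each class [lower, upper);
# include.lowest = TRUE then closes the final class so the maximum is kept.
colour_class <- cut(hex_counts, breaks = legend_breaks,
                    right = FALSE, include.lowest = TRUE)

data.frame(hex_counts, colour_class)
```

Here a count of 2 is assigned to the class [2,5) rather than [1,2), while the maximum count of 10 is still included in the final class [5,10].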

I believe the number of bins is hard-coded, but nowadays you might benefit from packages like {ggplot2} for that sort of thing, which give you a lot more control over the output:

library(ggplot2)

ggplot(openair::mydata, aes(nox, no2)) +
  geom_hex() +
  theme_bw() +
  theme(aspect.ratio = 1) +
  scale_fill_binned(type = "viridis", show.limits = TRUE,
                    breaks = c(0, 10, 20, 50, 100, 500, 2000, 4000)) +
  labs(y = openair::quickText("no2"),
       x = openair::quickText("nox"))
#> Warning: Removed 2438 rows containing non-finite values (`stat_binhex()`).

Created on 2023-03-23 with reprex v2.0.2
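
If you would rather choose the number of colour bins than type the boundaries by hand, one option is to generate them. The sketch below assumes log-spaced boundaries suit heavily skewed hex counts; count_breaks() is a hypothetical helper written for this example, not a function from openair or ggplot2, and the maximum count of 4000 is just a placeholder.

```r
# Hypothetical helper (not part of openair or ggplot2): returns boundaries for
# `n_bins` colour classes, spaced evenly on a log scale between 1 and `max_count`.
count_breaks <- function(max_count, n_bins) {
  stopifnot(max_count > 1, n_bins >= 1)
  raw <- exp(seq(0, log(max_count), length.out = n_bins + 1))
  unique(round(raw))  # rounding can merge boundaries, so fewer bins may result
}

breaks <- count_breaks(max_count = 4000, n_bins = 8)
breaks
```

The resulting vector could then be passed as the breaks argument of scale_fill_binned() in place of the hand-written values in the ggplot2 example above.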