davidcarslaw / openair

Tools for air quality data analysis
https://davidcarslaw.github.io/openair/
GNU General Public License v2.0

Combine `importLocal()` into the `import_network_worker()`; Refine `ratified` outputs #349

Closed jack-davison closed 1 year ago

jack-davison commented 1 year ago

This PR is a successor to https://github.com/davidcarslaw/openair/pull/347. It folds importLocal() into import_network_worker() and refines the ratified outputs, as described below.

importLocal() changes

Changes to importLocal() aren't user-visible, beyond documentation being combined with the other UKAQ functions.

If users try to use data_type = "daqi" with importLocal(), they'll get an error.

library(openair)

importLocal(data_type = "hourly")
#> Warning: ℹ This data is associated with locally managed air quality network sites in
#>   England.
#> ! These sites are not part of the AURN national network, and therefore may not
#>   have the same level of quality control applied to them.
#> This warning is displayed once every 8 hours.
#> # A tibble: 8,760 × 7
#>    site                   code  date                  nox   no2    no  pm10
#>    <chr>                  <chr> <dttm>              <dbl> <dbl> <dbl> <dbl>
#>  1 Adur - Shoreham-by-Sea AD1   2018-01-01 00:00:00    NA    NA    NA    NA
#>  2 Adur - Shoreham-by-Sea AD1   2018-01-01 01:00:00    NA    NA    NA    NA
#>  3 Adur - Shoreham-by-Sea AD1   2018-01-01 02:00:00    NA    NA    NA    NA
#>  4 Adur - Shoreham-by-Sea AD1   2018-01-01 03:00:00    NA    NA    NA    NA
#>  5 Adur - Shoreham-by-Sea AD1   2018-01-01 04:00:00    NA    NA    NA    NA
#>  6 Adur - Shoreham-by-Sea AD1   2018-01-01 05:00:00    NA    NA    NA    NA
#>  7 Adur - Shoreham-by-Sea AD1   2018-01-01 06:00:00    NA    NA    NA    NA
#>  8 Adur - Shoreham-by-Sea AD1   2018-01-01 07:00:00    NA    NA    NA    NA
#>  9 Adur - Shoreham-by-Sea AD1   2018-01-01 08:00:00    NA    NA    NA    NA
#> 10 Adur - Shoreham-by-Sea AD1   2018-01-01 09:00:00    NA    NA    NA    NA
#> # ℹ 8,750 more rows

importLocal(data_type = "daqi")
#> Error in `import_network_worker()`:
#> ! `data_type` 'DAQI' is not available for locally managed networks

Created on 2023-06-02 with reprex v2.0.2

Ratified

Under the hood, ratified is now implemented with joins rather than loops. This makes the code much less verbose and lets ratified work together with to_narrow; previously that combination gave a warning and stopped the function. When both ratified and to_narrow are TRUE, an additional qc column is appended to the output. ratified also now respects pollutant: with pollutant = "nox", only nox_qc is appended, and no flags are added for other pollutants.

library(openair)

importAQE(ratified = TRUE)
#> # A tibble: 8,646 × 9
#>    site          code  date                  nox   no2    no no_qc no2_qc nox_qc
#>    <chr>         <chr> <dttm>              <dbl> <dbl> <dbl> <lgl> <lgl>  <lgl> 
#>  1 York Heworth… YK13  2018-01-01 00:00:00 16.9  12.6   2.78 TRUE  TRUE   TRUE  
#>  2 York Heworth… YK13  2018-01-01 01:00:00 14.5  10.9   2.33 TRUE  TRUE   TRUE  
#>  3 York Heworth… YK13  2018-01-01 02:00:00 13.7  10.7   1.95 TRUE  TRUE   TRUE  
#>  4 York Heworth… YK13  2018-01-01 03:00:00 15.4  12.3   2.05 TRUE  TRUE   TRUE  
#>  5 York Heworth… YK13  2018-01-01 04:00:00 13.0  10.3   1.78 TRUE  TRUE   TRUE  
#>  6 York Heworth… YK13  2018-01-01 05:00:00  9.60  7.37  1.46 TRUE  TRUE   TRUE  
#>  7 York Heworth… YK13  2018-01-01 06:00:00 11.0   8.96  1.34 TRUE  TRUE   TRUE  
#>  8 York Heworth… YK13  2018-01-01 07:00:00 13.8  11.4   1.55 TRUE  TRUE   TRUE  
#>  9 York Heworth… YK13  2018-01-01 08:00:00 16.7  13.8   1.94 TRUE  TRUE   TRUE  
#> 10 York Heworth… YK13  2018-01-01 09:00:00 24.1  19.5   3.01 TRUE  TRUE   TRUE  
#> # ℹ 8,636 more rows

importAQE(ratified = TRUE, pollutant = "nox")
#> # A tibble: 8,646 × 5
#>    date                  nox site               code  nox_qc
#>    <dttm>              <dbl> <chr>              <chr> <lgl> 
#>  1 2018-01-01 00:00:00 16.9  York Heworth Green YK13  TRUE  
#>  2 2018-01-01 01:00:00 14.5  York Heworth Green YK13  TRUE  
#>  3 2018-01-01 02:00:00 13.7  York Heworth Green YK13  TRUE  
#>  4 2018-01-01 03:00:00 15.4  York Heworth Green YK13  TRUE  
#>  5 2018-01-01 04:00:00 13.0  York Heworth Green YK13  TRUE  
#>  6 2018-01-01 05:00:00  9.60 York Heworth Green YK13  TRUE  
#>  7 2018-01-01 06:00:00 11.0  York Heworth Green YK13  TRUE  
#>  8 2018-01-01 07:00:00 13.8  York Heworth Green YK13  TRUE  
#>  9 2018-01-01 08:00:00 16.7  York Heworth Green YK13  TRUE  
#> 10 2018-01-01 09:00:00 24.1  York Heworth Green YK13  TRUE  
#> # ℹ 8,636 more rows

importAQE(ratified = TRUE, to_narrow = TRUE)
#> # A tibble: 25,938 × 6
#>    date                site               code  pollutant value qc   
#>    <dttm>              <chr>              <chr> <chr>     <dbl> <lgl>
#>  1 2018-01-01 00:00:00 York Heworth Green YK13  no         2.78 TRUE 
#>  2 2018-01-01 01:00:00 York Heworth Green YK13  no         2.33 TRUE 
#>  3 2018-01-01 02:00:00 York Heworth Green YK13  no         1.95 TRUE 
#>  4 2018-01-01 03:00:00 York Heworth Green YK13  no         2.05 TRUE 
#>  5 2018-01-01 04:00:00 York Heworth Green YK13  no         1.78 TRUE 
#>  6 2018-01-01 05:00:00 York Heworth Green YK13  no         1.46 TRUE 
#>  7 2018-01-01 06:00:00 York Heworth Green YK13  no         1.34 TRUE 
#>  8 2018-01-01 07:00:00 York Heworth Green YK13  no         1.55 TRUE 
#>  9 2018-01-01 08:00:00 York Heworth Green YK13  no         1.94 TRUE 
#> 10 2018-01-01 09:00:00 York Heworth Green YK13  no         3.01 TRUE 
#> # ℹ 25,928 more rows

importAQE(ratified = TRUE, to_narrow = TRUE, pollutant = "nox")
#> # A tibble: 8,646 × 6
#>    date                site               code  pollutant value qc   
#>    <dttm>              <chr>              <chr> <chr>     <dbl> <lgl>
#>  1 2018-01-01 00:00:00 York Heworth Green YK13  nox       16.9  TRUE 
#>  2 2018-01-01 01:00:00 York Heworth Green YK13  nox       14.5  TRUE 
#>  3 2018-01-01 02:00:00 York Heworth Green YK13  nox       13.7  TRUE 
#>  4 2018-01-01 03:00:00 York Heworth Green YK13  nox       15.4  TRUE 
#>  5 2018-01-01 04:00:00 York Heworth Green YK13  nox       13.0  TRUE 
#>  6 2018-01-01 05:00:00 York Heworth Green YK13  nox        9.60 TRUE 
#>  7 2018-01-01 06:00:00 York Heworth Green YK13  nox       11.0  TRUE 
#>  8 2018-01-01 07:00:00 York Heworth Green YK13  nox       13.8  TRUE 
#>  9 2018-01-01 08:00:00 York Heworth Green YK13  nox       16.7  TRUE 
#> 10 2018-01-01 09:00:00 York Heworth Green YK13  nox       24.1  TRUE 
#> # ℹ 8,636 more rows

Created on 2023-06-02 with reprex v2.0.2
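
The join-based approach described above can be illustrated with a minimal sketch. This is not the code from the PR: the `ratified_lookup` table, its column names, and the toy values are all assumptions made for illustration, and {dplyr} and {tidyr} are assumed to be installed. It shows how a single join can yield the `qc` column in the narrow shape, and how the same flags can then be spread into `<pollutant>_qc` columns for the wide shape.

```r
library(dplyr)
library(tidyr)

# Toy hourly data in the wide shape (values are illustrative only).
aq <- tibble(
  code = "YK13",
  date = as.POSIXct("2018-01-01 00:00:00", tz = "UTC") + 3600 * 0:3,
  nox  = c(16.9, 14.5, 13.7, 15.4),
  no2  = c(12.6, 10.9, 10.7, 12.3)
)

# Hypothetical lookup: the time up to which each site/pollutant is ratified.
ratified_lookup <- tibble(
  code        = "YK13",
  pollutant   = c("nox", "no2"),
  ratified_to = as.POSIXct(
    c("2018-01-01 01:00:00", "2018-01-01 03:00:00"),
    tz = "UTC"
  )
)

# Narrow shape: pivot longer, join the lookup once, then derive the flag.
narrow <- aq %>%
  pivot_longer(c(nox, no2), names_to = "pollutant", values_to = "value") %>%
  left_join(ratified_lookup, by = c("code", "pollutant")) %>%
  mutate(qc = date <= ratified_to) %>%
  select(-ratified_to)

# Wide shape: spread values and flags, giving nox, no2, nox_qc, no2_qc.
wide <- narrow %>%
  pivot_wider(
    names_from  = pollutant,
    values_from = c(value, qc),
    names_glue  = "{pollutant}_{.value}"
  ) %>%
  rename_with(~ sub("_value$", "", .x))
```

Because the flag is computed per row after a join, no per-pollutant loop is needed, and restricting the lookup to one pollutant would naturally restrict which flags appear.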

Other Refinements

Error and warning messages have been made more informative. These are powered by {cli}, which adds no real dependency cost because {dplyr}, {tidyr}, etc. already import it.
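
As a rough idea of what a {cli}-powered check can look like, here is a small sketch. `check_data_type_sketch()` is a hypothetical helper written for this example and is not the function used in the PR; only `cli::cli_abort()` and its inline markup (`{.arg}`, `{.val}`) are real {cli} features.

```r
library(cli)

# Hypothetical validator: rejects DAQI requests for locally managed networks.
check_data_type_sketch <- function(data_type, source) {
  if (identical(source, "local") && tolower(data_type) == "daqi") {
    cli::cli_abort(c(
      "{.arg data_type} {.val {toupper(data_type)}} is not available for locally managed networks.",
      "i" = "See the function documentation for supported values."
    ))
  }
  # Return the input invisibly so the check can sit inside a pipeline.
  invisible(data_type)
}
```

Calling `check_data_type_sketch("daqi", "local")` raises a formatted error, while `check_data_type_sketch("hourly", "local")` passes silently.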


Future

NB: None of this is in the PR; it is purely forward-looking.

Looking at R/importUKAQ.R, all of the heavy lifting is now done outside of the UK import*() functions. The only real difference between them is they pass different source arguments to the helper function. Therefore, in future, we could supersede the import*() functions and export the helper as a formal importUKAQ() which has its own "source" argument, e.g.,

importUKAQ(site = "my1", year = 2020, source = "aurn")

This framework could then be extended to, for example, import sites from different UKAQ networks:

importUKAQ(
  site = c("my1", "yk13"),
  year = 2020,
  source = c("aurn", "aqe")
)

Combined with importMeta(), you could get everything in the UK incredibly easily:

meta <- importMeta(source = c("aurn", "aqe"))

importUKAQ(
  site = meta$code,
  year = 2020,
  source = meta$source
)
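
One way the dispatch behind such an importUKAQ() could work is sketched below. Everything here is hypothetical: `import_ukaq_sketch()` and the stub importers are invented for illustration and return placeholder rows rather than downloading data. In openair, the role of the stubs would be played by the existing import machinery.

```r
# Stub importer factory: stands in for a real per-network importer.
make_stub <- function(src) {
  force(src)
  function(site, year) {
    data.frame(source = src, code = toupper(site), year = year)
  }
}

stub_importers <- list(aurn = make_stub("aurn"), aqe = make_stub("aqe"))

# Hypothetical dispatcher: pairs each site with its source, then calls
# each source's importer once with all of that source's sites.
import_ukaq_sketch <- function(site, year, source, importers = stub_importers) {
  if (!length(source) %in% c(1L, length(site))) {
    stop("`source` must have length 1 or the same length as `site`.")
  }
  source <- rep_len(tolower(source), length(site))

  unknown <- setdiff(source, names(importers))
  if (length(unknown) > 0) {
    stop("Unknown source(s): ", paste(unknown, collapse = ", "))
  }

  sites_by_source <- split(site, source)
  pieces <- lapply(names(sites_by_source), function(src) {
    importers[[src]](site = sites_by_source[[src]], year = year)
  })
  do.call(rbind, pieces)
}

res <- import_ukaq_sketch(
  site = c("my1", "yk13"),
  year = 2020,
  source = c("aurn", "aqe")
)
```

Grouping sites by source keeps the number of importer calls equal to the number of networks, which matters when `site` and `source` come from a long metadata table.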

davidcarslaw commented 1 year ago

This looks really useful, thanks Jack - especially the use of ratified data information in different data formats / shapes