Closed Robinlovelace closed 3 years ago
Taking an example from waste management, the R code below shows how to get data into R:
# Aim: demonstrate how to get data from UK gov sources
library(tidyverse)
# First stage: search - I searched on Google for:
# https://www.google.com/search?q=recycling+points+leeds+geojson
# That took me here:
# https://datamillnorth.org/publisher/leedscitycouncil?format=geojsonhttps://datamillnorth.org/publisher/leedscitycouncil?format=geojson
# That linked me to:
# https://datamillnorth.org/dataset/bring-sites
u = "https://datamillnorth.org/download/bring-sites/97bd60ae-ced3-4ddc-996e-f1ebe5d21136/tonnage%20to%20Sept%2020.csv"
waste_sites = read_csv(u)
waste_sites
dim(waste_sites)
skimr::skim(waste_sites)
waste_sites_unique = waste_sites %>%
# remove missing values: data cleaning
filter(! is.na(Longitude)) %>%
group_by(`Site Name`, Longitude, Latitude) %>%
summarise(tonnes_glass = sum(`Apr 20 Glass Tonnage Kg`, `May 20 Glass Tonnage Kg`))
waste_sites_sf = sf::st_as_sf(waste_sites_unique, coords = c("Longitude", "Latitude"), crs = 4326)
mapview::mapview(waste_sites_sf)
# remove sites that are miles away
pct_regions = pct::pct_regions
waste_sites_clean = waste_sites_sf[pct_regions, ]
library(tmap)
tm_shape(waste_sites_clean) +
tm_dots(size = "tonnes_glass")
See result of running that code below.
Name | waste_sites |
Number of rows | 403 |
Number of columns | 59 |
_______________________ | |
Column type frequency: | |
character | 14 |
logical | 33 |
numeric | 12 |
________________________ | |
Group variables | None |
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Site Name | 0 | 1.00 | 5 | 56 | 0 | 332 | 0 |
Address | 0 | 1.00 | 6 | 56 | 0 | 380 | 0 |
Post Code | 1 | 1.00 | 7 | 8 | 0 | 380 | 0 |
Location Type | 2 | 1.00 | 3 | 20 | 0 | 24 | 0 |
Ward | 3 | 0.99 | 6 | 29 | 0 | 41 | 0 |
Sticker | 153 | 0.62 | 1 | 1 | 0 | 2 | 0 |
Postcode area | 10 | 0.98 | 3 | 4 | 0 | 30 | 0 |
Site Number | 39 | 0.90 | 1 | 7 | 0 | 352 | 0 |
UPRN | 17 | 0.96 | 3 | 8 | 0 | 378 | 0 |
Recyclables-Mixed Glass | 24 | 0.94 | 1 | 1 | 0 | 2 | 0 |
Recyclables-Paper | 350 | 0.13 | 1 | 1 | 0 | 1 | 0 |
Recyclables-Textiles | 349 | 0.13 | 1 | 1 | 0 | 1 | 0 |
Recyclables Small Electrical | 383 | 0.05 | 1 | 1 | 0 | 1 | 0 |
Type | 8 | 0.98 | 5 | 64 | 0 | 23 | 0 |
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
Non LCC Banks | 403 | 0 | NaN | : |
Cleaning | 403 | 0 | NaN | : |
X29 | 403 | 0 | NaN | : |
X30 | 403 | 0 | NaN | : |
X31 | 403 | 0 | NaN | : |
X32 | 403 | 0 | NaN | : |
X33 | 403 | 0 | NaN | : |
X34 | 403 | 0 | NaN | : |
X35 | 403 | 0 | NaN | : |
X36 | 403 | 0 | NaN | : |
X37 | 403 | 0 | NaN | : |
X38 | 403 | 0 | NaN | : |
X39 | 403 | 0 | NaN | : |
X40 | 403 | 0 | NaN | : |
X41 | 403 | 0 | NaN | : |
X42 | 403 | 0 | NaN | : |
X43 | 403 | 0 | NaN | : |
X44 | 403 | 0 | NaN | : |
X45 | 403 | 0 | NaN | : |
X46 | 403 | 0 | NaN | : |
X47 | 403 | 0 | NaN | : |
X48 | 403 | 0 | NaN | : |
X49 | 403 | 0 | NaN | : |
X50 | 403 | 0 | NaN | : |
X51 | 403 | 0 | NaN | : |
X52 | 403 | 0 | NaN | : |
X53 | 403 | 0 | NaN | : |
X54 | 403 | 0 | NaN | : |
X55 | 403 | 0 | NaN | : |
X56 | 403 | 0 | NaN | : |
X57 | 403 | 0 | NaN | : |
X58 | 403 | 0 | NaN | : |
X59 | 403 | 0 | NaN | : |
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Latitude | 4 | 0.99 | 53.81 | 0.05 | 53.72 | 53.78 | 53.81 | 53.83 | 53.94 | ▃▇▇▂▂ |
Longitude | 4 | 0.99 | -1.52 | 0.29 | -1.73 | -1.61 | -1.55 | -1.48 | 1.59 | ▇▁▁▁▁ |
y | 4 | 0.99 | 435026.93 | 5276.03 | 425053.00 | 431561.50 | 434782.00 | 437156.50 | 449157.00 | ▅▇▇▂▂ |
x | 4 | 0.99 | 430223.87 | 6230.77 | 417833.00 | 426014.50 | 429440.00 | 434559.50 | 445532.00 | ▃▇▇▃▃ |
Tally of Sites | 33 | 0.92 | 1.01 | 0.16 | 1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▇▁▁▁▁ |
No of Glass Banks | 42 | 0.90 | 1.89 | 1.66 | 0.00 | 1.00 | 1.00 | 3.00 | 12.00 | ▇▂▁▁▁ |
Apr 20 Glass Tonnage Kg | 145 | 0.64 | 3886.02 | 6212.30 | 0.00 | 528.75 | 1562.00 | 3902.25 | 40130.00 | ▇▁▁▁▁ |
May 20 Glass Tonnage Kg | 145 | 0.64 | 4918.67 | 8277.63 | 0.00 | 619.50 | 1714.50 | 4443.25 | 55487.00 | ▇▁▁▁▁ |
Jun 20 Glass Tonnage Kg | 133 | 0.67 | 5172.92 | 8200.52 | 0.00 | 654.75 | 1898.50 | 5113.75 | 51871.00 | ▇▁▁▁▁ |
Jul 20 Glass Tonnage Kg | 133 | 0.67 | 4307.68 | 7254.31 | 0.00 | 628.64 | 1512.59 | 4572.34 | 50283.99 | ▇▁▁▁▁ |
Aug 20 Glass Tonnage Kg | 151 | 0.63 | 4014.84 | 6553.86 | 62.64 | 640.01 | 1444.75 | 3772.38 | 45626.10 | ▇▁▁▁▁ |
Sep 20 Glass Tonnage Kg | 97 | 0.76 | 3172.18 | 5501.54 | 0.00 | 506.72 | 1076.65 | 2512.98 | 37572.20 | ▇▁▁▁▁ |
Many sites provide government data, primarily https://data.gov.uk/
Transport data can be found here: https://data.gov.uk/search?filters%5Btopic%5D=Transport
Geographic data can be found here: https://data.gov.uk/search?filters%5Btopic%5D=Mapping
and here: https://geoportal.statistics.gov.uk/
See here for more info on places to get transport data: https://github.com/ITSLeeds/TDS/blob/master/coursework-template.md#datasets