ZIYAOWANG123 / my_project

ETC5543 Project
MIT License
0 stars 0 forks source link

Hexagon tiling algorithm #4

Open mitchelloharawild opened 1 year ago

mitchelloharawild commented 1 year ago

INPUT: SF geometry, OUTPUT: SF hexagon tiled geometry

Here's some code I've just written to experiment with tiling polygons with a specific number of hexagons. I think this will help you with writing some code/algorithm to produce better hexagon maps with a specific number of hexagons.

Unlike geogrid::calculate_grid(), this method deterministically chooses hexagons from a tiling based on their overlapping area. This should give a better and more consistent tiling instead of randomly sampling them.

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(ozmaps)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Map of Australian states

ozmap_states
#> Simple feature collection with 9 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 105.5507 ymin: -43.63203 xmax: 167.9969 ymax: -9.229287
#> Geodetic CRS:  GDA94
#> # A tibble: 9 × 2
#>   NAME                                                                  geometry
#> * <chr>                                                       <MULTIPOLYGON [°]>
#> 1 New South Wales              (((150.7016 -35.12286, 150.6611 -35.11782, 150.6…
#> 2 Victoria                     (((146.6196 -38.70196, 146.6721 -38.70259, 146.6…
#> 3 Queensland                   (((148.8473 -20.3457, 148.8722 -20.37575, 148.85…
#> 4 South Australia              (((137.3481 -34.48242, 137.3749 -34.46885, 137.3…
#> 5 Western Australia            (((126.3868 -14.01168, 126.3625 -13.98264, 126.3…
#> 6 Tasmania                     (((147.8397 -40.29844, 147.8902 -40.30258, 147.8…
#> 7 Northern Territory           (((136.3669 -13.84237, 136.3339 -13.83922, 136.3…
#> 8 Australian Capital Territory (((149.2317 -35.222, 149.2346 -35.24047, 149.271…
#> 9 Other Territories            (((167.9333 -29.05421, 167.9188 -29.0344, 167.93…
plot(ozmap_states)

Tile the map with hexagons

plot(st_make_grid(ozmap_states, square = FALSE), add = TRUE)
#> Error in polypath(p_bind(x[[i]]), border = border[i], lty = lty[i], lwd = lwd[i], : plot.new has not been called yet

Not enough hexagons, how do we set the appropriate grid dimensions?

Less hexagons?

plot(st_intersection(ozmap_states, st_make_grid(ozmap_states,  cellsize = 10, square = FALSE)))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

More hexagons?

plot(st_intersection(ozmap_states, st_make_grid(ozmap_states,  cellsize = 1, square = FALSE)))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

Find the number of hexagons that overlap Australia, and set it to optimise it to the desired number of hexagons.

hex_grid <- st_make_grid(ozmap_states,  cellsize = 3, square = FALSE)
hex_intersect <- lengths(st_intersects(hex_grid, st_geometry(ozmap_states))) > 0
hex_grid <- hex_grid[hex_intersect]
plot(st_union(ozmap_states, hex_grid), pal = sf.colors(alpha = 0))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

126 hexagons cover australia with cellsize = 3

sum(hex_intersect)
#> [1] 126

We need more, so decrease cellsize slightly

hex_grid <- st_make_grid(ozmap_states,  cellsize = 2.7, square = FALSE)
hex_intersect <- lengths(st_intersects(hex_grid, st_geometry(ozmap_states))) > 0
hex_grid <- hex_grid[hex_intersect]
plot(st_union(ozmap_states, hex_grid), pal = sf.colors(alpha = 0))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

153 hexagons cover australia with cellsize = 2.7

sum(hex_intersect)
#> [1] 153

Now pick the best 151 hexagons based on maximising their overlapping area with Australia

# Compute area overlapping hexagon with geometry
hex_grid <- st_sf(hex_grid, hex_id = seq_along(hex_grid))
hex_best <- st_intersection(hex_grid, ozmap_states) %>% 
  group_by(hex_id) %>% 
  summarise(area = sum(st_area(hex_grid))) %>% 
  top_n(151, wt = area) %>% 
  pull(hex_id)
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
hex_grid <- filter(hex_grid, hex_id %in% !!hex_best)
plot(st_union(ozmap_states, hex_grid$hex_grid), pal = sf.colors(alpha = 0))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

plot(hex_grid)

A better result is probably achieved with a smaller grid:

hex_grid <- st_make_grid(ozmap_states,  cellsize = 2.3, square = FALSE)
hex_intersect <- lengths(st_intersects(hex_grid, st_geometry(ozmap_states))) > 0
hex_grid <- hex_grid[hex_intersect]
plot(st_union(ozmap_states, hex_grid), pal = sf.colors(alpha = 0))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

# 200 hexagons cover australia with cellsize = 2.3
sum(hex_intersect)
#> [1] 200

# Compute area overlapping hexagon with geometry
hex_grid <- st_sf(hex_grid, hex_id = seq_along(hex_grid))
hex_best <- st_intersection(hex_grid, ozmap_states) %>% 
  group_by(hex_id) %>% 
  summarise(area = sum(st_area(hex_grid))) %>% 
  top_n(151, wt = area) %>% 
  pull(hex_id)
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
hex_grid <- filter(hex_grid, hex_id %in% !!hex_best)
plot(st_union(ozmap_states, hex_grid$hex_grid), pal = sf.colors(alpha = 0))
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

Next steps:

  1. Adapt above code to automatically select the best number of hexagons in the grid (maximise the overlapping area between the 151 hexagons and the australian geometry)
  2. Try this method on a cartogram map distorted to make each electorate have equal size (or each state weighted by its number of electorates)
  3. Try allocating the hexagons to their electorate / state

Created on 2022-09-08 by the reprex package (v2.0.1)

ZIYAOWANG123 commented 1 year ago

This is a very nice example, thanks for sharing. I will apply this workflow when modifying the current geogrid::calculate_grid function and will inform you of any updates.

ZIYAOWANG123 commented 1 year ago

The draft function has been pushed to GitHub as calculate_grid.R.

The function is not complete, and debugging is required. Regarding the issue of obtaining the best number of hexagons automatically, I think maybe letting users define an id variable will be one solution since the number of hexagons will be altered by the user's will.

Meanwhile, I will attempt to achieve your suggestions: determining the number of hexagons by maximizing the overlapping area between regions.

mitchelloharawild commented 1 year ago

Well, id used here is simply the row number from the <sf> dataset. This is because each row corresponds to a geometry (say a statistical area or electorate). So I don't think the user needs to define the id variable, we can generate it and only need it as an intermediate linking variable for computational purposes.

It should however be possible to provide a column of weights, indicating that each region may warrant more than one hexagon (say to weight by population).

ZIYAOWANG123 commented 1 year ago

Sure, I will take the id off and replace it with weight to amplify the importance of some regions if needed. Yet, I wonder how will the weight emphasize the effect of a particular area.

  1. Should I add it when selecting the area with maximized intersection area (via top_n)?

  2. Or, firstly calculate the best_hex (the best number of hexagons used for the task), then add this weight to update the calculation of the number of hexagons (i.e. update best_hex).

  3. Does the weight need to be scaled before inputting in the process? What if sometimes the weight is a categorical variable (specifically an ordinal one such as "good > moderate > bad"), do we convert it for the users or provide an "error/warning" to them?

mitchelloharawild commented 1 year ago

Weight should just change the number of hexagons (rather than 1 per row, use the sum of weight). This is because the same weight variable will be used when distorting the map with the cartogram, so the exact association between each hexagon and their geometry doesn't matter until later (the allocation step).

  1. Distort the map with a cartogram (distortion is possibly weighted, for electorates weights=1)
  2. Tile the distorted map with hexagons (number of hexagons is equal to the sum of the weights, so for electorates it is the sum of many 1s giving the total number of electorates)
  3. Allocate hexagons to regions (combine 1 or more hexagons into single geometries and join that with the original data, effectively replacing the original geometry with hexagon(s)).
mitchelloharawild commented 1 year ago

Answering your questions:

  1. The selection of maximal intersecting area is independent of weight. Since the distortion and number of hexagons are controlled by weight, the optimisation of hexagons can be done as usual (just on different hexagons due to the effect of weight).
  2. No, not a weighted calculation.
  3. Yes. Weights need to be numeric, you can raise an error if they are not. Note an error is appropriate here, since we can't do anything appropriate with non-numeric weights.
ZIYAOWANG123 commented 1 year ago

The cellsize function is to calculate the proper cell size used to create the hexagon grid, it contains two parts:

Please correct or add to my comment!

mitchelloharawild commented 1 year ago

Hi @ZIYAOWANG123, how are you going with this?

ZIYAOWANG123 commented 1 year ago

Hi Mitchell, I am still working on the cellsize function. I think I may provide some updates on the function this Wednesday or Thursday, and maybe arrange a time to discuss it with you. Meanwhile, I will put the work-in-process onto the package GitHub as well.

emitanaka commented 1 year ago

Just a note that we have a meeting scheduled on this Thu 4pm and @ZIYAOWANG123 , you have a presentation to give on Tue Oct 4 Brown Bag Session.

ZIYAOWANG123 commented 1 year ago

The cellsize function is to calculate the proper cell size used to create the hexagon grid, it contains two parts:

  • Obtaining the cell size:
  1. Calculate the total AU area (perhaps) by st_boundary;
  2. Calculate the total AU land area by sum(st_area(data));
  3. Obtain the ratio = Total AU land area / Total AU area;
  4. Use the knowledge of knowing the number of hexagons we need for land area, to obtain the number of hexagons for the total AU area;
  5. Based on the number of hexagons of the total AU area, determine the proper cell size used to generate the hexagon grid.
  • Using the cell size for a good hexagon grid result:
  1. A cell size will determine the number of hexagons for a given area (i.e. Total AU area)
  2. Need to know how cell size affects the number of hexagons for the area to control the number of hexagons to match the ratio from the last part step 3.

Please correct or add to my comment!

While I was setting up the cellsize function, the st_boundary output the boundaries of the sf object yet not the area (we wanted).

Here are two main questions:

  1. How to obtain the whole area of AU?

  2. After obtaining the number of hexagons for the whole area (i.e. using land/no. of hex_land = tot_area/no. of hex_tot), how to use this number of hexagons to alter the cell size in st_make_grid? (We want the cell size which can produce the right number of hexagons.)

Additionally, I have checked the source codes of function st_make_grid to see how actually cellsize works in it. It shows a "strange" way of using it (c(diff(st_bbox(x)[c(1, 3)]), diff(st_bbox(x)[c(2, 4)]))/n, where x = data set; n is an indicator with a default of n = c(10, 10)). But, I didn't find out how to use the known/wanted number of hexagons (i.e. the output of st_make_grid) to adjust/alter the cellsize parameter.

Any suggestions and help will be appreciated! @mitchelloharawild @emitanaka