TimTaylor closed this issue 4 years ago
Thanks for this @tjtnew,
Did you have any luck determining whether this was a feature of the data or caused by our processing?
I haven't looked at this since raising the issue.
It looks like a processing issue from my inspection, inside get_authority_lookup_table. I think, e.g., the upper_tier_auth and ni_auth have the same region2 information in them, and when the rows are combined the region2 information then appears twice.
library(dplyr)
authority_lookup_table <- get_authority_lookup_table()
# Count rows per level 2 region name and keep the names that appear more than once
authority_lookup_table %>% group_by(region_level_2) %>% tally() %>% filter(n > 1)
# A tibble: 15 x 2
   region_level_2                           n
   <chr>                                <int>
 1 Antrim and Newtownabbey                  2
 2 Ards and North Down                      2
 3 Armagh City, Banbridge and Craigavon     2
 4 Belfast                                  2
 5 Causeway Coast and Glens                 2
 6 Derry City and Strabane                  2
 7 Dumfries and Galloway                    2
 8 Fermanagh and Omagh                      2
 9 Fife                                     2
10 Highland                                 2
11 Lisburn and Castlereagh                  2
12 Mid Ulster                               2
13 Mid and East Antrim                      2
14 Newry, Mourne and Down                   2
15 Powys                                    2
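For illustration, here is a minimal sketch of how this kind of duplication could arise when two source tables carrying the same level 2 region are row-bound. The table names (`upper_tier_toy`, `ni_toy`), the region codes, and the use of `bind_rows()` are all assumptions made for the example; the actual internals of get_authority_lookup_table may differ.

```r
library(dplyr)

# Hypothetical stand-ins for two source tables (made-up codes, not real data).
upper_tier_toy <- tibble(
  level_2_region_code = c("X01", "X02"),
  region_level_2      = c("Belfast", "Powys"),
  level_1_region_code = c("R1", "R2")
)
ni_toy <- tibble(
  level_2_region_code = "X01",
  region_level_2      = "Belfast",
  level_1_region_code = NA_character_
)

# Stacking the tables keeps both copies of the shared region.
combined_toy <- bind_rows(upper_tier_toy, ni_toy)

# Same check as above: region names that appear more than once.
duplicated_regions <- combined_toy %>%
  count(region_level_2) %>%
  filter(n > 1)
```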
This duplication can be removed with:
authority_lookup_table <- authority_lookup_table %>% dplyr::arrange(level_1_region_code) %>% dplyr::distinct(level_2_region_code, region_level_2, .keep_all = TRUE)
Note that the arrange is required: it sorts rows with an NA level_1_region_code to the bottom, so the distinct step keeps the first (non-NA) row for each region and drops the NA duplicates.
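To make the ordering point concrete, here is a small sketch on a toy table. The tibble `toy_lookup` and its codes are made up for illustration; only the column names and the arrange/distinct steps come from the snippet above.

```r
library(dplyr)

# Toy stand-in for the lookup table (made-up codes, not real data).
# "Belfast" and "Fife" each appear twice, once with an NA level 1 code.
toy_lookup <- tibble(
  level_2_region_code = c("X01", "X01", "X02", "X03", "X03"),
  region_level_2      = c("Belfast", "Belfast", "Powys", "Fife", "Fife"),
  level_1_region_code = c(NA, "R1", "R2", "R3", NA)
)

# arrange() places NA values last, so the non-NA row of each pair comes first;
# distinct() then keeps the first row it sees for each code/name combination.
deduplicated <- toy_lookup %>%
  arrange(level_1_region_code) %>%
  distinct(level_2_region_code, region_level_2, .keep_all = TRUE)
```

Without the arrange step, distinct would keep whichever duplicate happens to come first in the table, which for "Belfast" in this toy data is the row with the NA code.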
Happy to submit this as a pull request if you like; please let me know.
Thanks very much for looking at this and identifying exactly where the error is, @rboyes. I hadn't checked back on this code in a while, so it's great to have someone else look at it. This should now be fixed in master, using your code (plus a very slightly cleaner lookup process).
@rboyes - I should have mentioned this earlier: we will add you as a package contributor (#83) unless you let us know otherwise. Thanks again.